Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden states, and to provide a link between Monte Carlo and temporal-difference meth...
Doina Precup, Richard S. Sutton, Satinder P. Singh
We present a unified framework for learning link prediction and edge weight prediction functions in large networks, based on the transformation of a graph's algebraic spectru...
This paper examines the notion of symmetry in Markov decision processes (MDPs). We define symmetry for an MDP and show how it can be exploited for more effective learning in singl...
We consider an online learning setting where at each time step the decision maker has to choose how to distribute the future loss between k alternatives, and then observes the los...
Assessing the reliability of a software system has always been an elusive target. A program may work very well for a number of years and this same program may suddenly become quit...