Group utility functions are an extension of the common team utility function for providing multiple agents with a common reinforcement learning signal for learning cooperat...
In this theoretical contribution we provide mathematical proof that two of the most important classes of network learning - correlation-based differential Hebbian learning and rew...
Christoph Kolodziejski, Bernd Porr, Minija Tamosiu...
tion Learning about Temporally Abstract Actions. Richard S. Sutton, Department of Computer Science, University of Massachusetts, Amherst, MA 01003-4610, rich@cs.umass.edu; Doina Precup, D...
Richard S. Sutton, Doina Precup, Satinder P. Singh
An open problem in reinforcement learning is discovering hierarchical structure. HEXQ, an algorithm which automatically attempts to decompose and solve a model-free factored MDP h...
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...