We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begi...
In several agent-oriented scenarios in the real world, an autonomous agent that is situated in an unknown environment must learn through a process of trial and error to take actio...
Many multiagent problems comprise subtasks which can be considered as reinforcement learning (RL) problems. In addition to classical temporal difference methods, evolutionary algo...
Jan Hendrik Metzen, Mark Edgington, Yohannes Kassa...
As the reach of multiagent reinforcement learning extends to more and more complex tasks, it is likely that the diverse challenges posed by some of these tasks can only be address...