In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...
Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...
This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(), LSTD()...
The main aim of this paper is to extend the single-agent policy gradient method for multiagent domains where all agents share the same utility function. We formulate these team pro...
In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of...