Differential Eligibility Vectors for Advantage Updating and Gradient Methods

In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of each action to the TD-error at each state. Specifically, we use DEV in TD-Q(λ) to more accurately learn the relative value of the actions, rather than their absolute value. We identify conditions that ensure convergence with probability 1 of TD-Q(λ) with DEV and show that this algorithm can also be used to directly approximate the advantage function associated with a given policy, without the need to compute an auxiliary function – something that, to the best of our knowledge, was not previously known to be possible. Finally, we discuss the integration of DEV in LSTDQ and actor-critic algorithms.
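For orientation, the sketch below shows a standard linear TD-Q(λ) evaluation with ordinary accumulating eligibility traces, followed by a naive advantage estimate computed from the learned Q-function. The paper's differential eligibility vectors replace the trace update so that the advantage can be learned directly, without this detour through Q; that construction is not reproduced here. The `env`, `features`, and uniform-random evaluation policy are hypothetical placeholders, not taken from the paper.

```python
# Minimal sketch of linear TD-Q(lambda) with standard accumulating eligibility
# traces (background only).  The paper's DEV replace the trace update below
# with one that isolates each action's contribution to the TD-error.
import numpy as np

def td_q_lambda(env, features, num_actions, num_features,
                alpha=0.05, gamma=0.99, lam=0.8, episodes=200):
    """Evaluate a fixed (here: uniform random) policy with TD-Q(lambda).

    `env` is assumed to expose reset() -> state and step(action) ->
    (next_state, reward, done); `features(s, a)` returns a length-
    `num_features` feature vector.  Both are hypothetical placeholders.
    """
    theta = np.zeros(num_features)              # linear weights for Q(s, a)
    for _ in range(episodes):
        state = env.reset()
        action = np.random.randint(num_actions)
        z = np.zeros(num_features)              # eligibility vector
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            next_action = np.random.randint(num_actions)  # policy being evaluated
            phi = features(state, action)
            phi_next = features(next_state, next_action)
            q = theta @ phi
            q_next = 0.0 if done else theta @ phi_next
            delta = reward + gamma * q_next - q           # TD-error
            z = gamma * lam * z + phi                     # accumulating trace
            theta += alpha * delta * z
            state, action = next_state, next_action
    return theta

def advantage(theta, features, state, num_actions):
    """Naive advantage estimate A(s, a) = Q(s, a) - mean_b Q(s, b) for a
    uniform policy; the paper learns this quantity directly instead of
    first approximating Q."""
    q = np.array([theta @ features(state, a) for a in range(num_actions)])
    return q - q.mean()
```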
Type Conference paper
Year 2011
Where AAAI
Authors Francisco S. Melo