Differential Eligibility Vectors for Advantage Updating and Gradient Methods

In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of each action to the TD-error at each state. Specifically, we use DEV in TD-Q(λ) to more accurately learn the relative value of the actions, rather than their absolute value. We identify conditions that ensure convergence with probability 1 of TD-Q(λ) with DEV and show that this algorithm can also be used to directly approximate the advantage function associated with a given policy, without the need to compute an auxiliary function – something that, to the best of our knowledge, was not previously known to be possible. Finally, we discuss the integration of DEV in LSTDQ and actor-critic algorithms.
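For orientation, the sketch below shows a standard linear TD-Q(λ) evaluation with ordinary accumulating eligibility traces, followed by a naive advantage estimate computed from the learned Q-function. The paper's differential eligibility vectors replace the trace update so that the advantage can be learned directly, without this detour through Q; that construction is not reproduced here. The `env`, `features`, and uniform-random evaluation policy are hypothetical placeholders, not taken from the paper.

```python
# Minimal sketch of linear TD-Q(lambda) with standard accumulating eligibility
# traces (background only).  The paper's DEV replace the trace update below
# with one that isolates each action's contribution to the TD-error.
import numpy as np

def td_q_lambda(env, features, num_actions, num_features,
                alpha=0.05, gamma=0.99, lam=0.8, episodes=200):
    """Evaluate a fixed (here: uniform random) policy with TD-Q(lambda).

    `env` is assumed to expose reset() -> state and step(action) ->
    (next_state, reward, done); `features(s, a)` returns a length-
    `num_features` feature vector.  Both are hypothetical placeholders.
    """
    theta = np.zeros(num_features)              # linear weights for Q(s, a)
    for _ in range(episodes):
        state = env.reset()
        action = np.random.randint(num_actions)
        z = np.zeros(num_features)              # eligibility vector
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            next_action = np.random.randint(num_actions)  # policy being evaluated
            phi = features(state, action)
            phi_next = features(next_state, next_action)
            q = theta @ phi
            q_next = 0.0 if done else theta @ phi_next
            delta = reward + gamma * q_next - q           # TD-error
            z = gamma * lam * z + phi                     # accumulating trace
            theta += alpha * delta * z
            state, action = next_state, next_action
    return theta

def advantage(theta, features, state, num_actions):
    """Naive advantage estimate A(s, a) = Q(s, a) - mean_b Q(s, b) for a
    uniform policy; the paper learns this quantity directly instead of
    first approximating Q."""
    q = np.array([theta @ features(state, a) for a in range(num_actions)])
    return q - q.mean()
```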
Type Conference paper
Year 2011
Where AAAI
Authors Francisco S. Melo