Adaptive Step-size Policy Gradients with Average Reward Metric



In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. We define a new metric for policy gradients that measures the effect of changes in the policy parameters on the average reward. Because the metric directly measures the effect on the average reward, the resulting policy gradient learning employs an adaptive step-size strategy that can effectively avoid stagnant phases caused by the complex structure of the average reward function with respect to the policy parameters. Using this metric, we derive two algorithms as variants of the ordinary and natural policy gradients. We compare their properties with those of previously proposed policy gradients through numerical experiments on simple but non-trivial 3-state Markov Decision Processes (MDPs). We also show performance improvements over previous methods in on-line learning with more challenging 20-state MDPs.
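The abstract does not give the metric itself, so what follows is only a minimal sketch of the general idea, built on explicit assumptions: a hypothetical 3-state, 2-action MDP (not the paper's benchmark), a tabular softmax policy, an exact average reward computed from the stationary distribution, a finite-difference gradient in place of sampled gradient estimates, and a simple stand-in step rule, dtheta = eps * grad / (|grad|^2 + lam), which scales each step so that the first-order predicted change in average reward is about eps. This stand-in is not the authors' derived metric or either of their two algorithms; the names average_reward, grad, ordinary_step and adaptive_step, and the parameters eps and lam, are illustrative.

```python
import math

# Hypothetical 3-state, 2-action MDP made up for illustration (NOT the MDPs
# from the paper). P[s][a] is the next-state distribution, R[s][a] the reward.
# Every transition probability is positive, so each policy's chain is ergodic.
P = [
    [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
    [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
    [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],
]
R = [[0.0, 0.0], [0.0, 0.1], [0.0, 1.0]]
N_S, N_A = 3, 2


def policy(theta, s):
    """Tabular softmax policy pi(.|s); theta is a flat list of N_S * N_A preferences."""
    prefs = theta[s * N_A:(s + 1) * N_A]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def average_reward(theta, iters=500):
    """Exact average reward eta(theta) = sum_s d(s) sum_a pi(a|s) R[s][a]."""
    pis = [policy(theta, s) for s in range(N_S)]
    # State-to-state transition matrix under the current policy.
    p_pi = [[sum(pis[s][a] * P[s][a][t] for a in range(N_A)) for t in range(N_S)]
            for s in range(N_S)]
    # Stationary distribution d by power iteration.
    d = [1.0 / N_S] * N_S
    for _ in range(iters):
        d = [sum(d[s] * p_pi[s][t] for s in range(N_S)) for t in range(N_S)]
    return sum(d[s] * pis[s][a] * R[s][a] for s in range(N_S) for a in range(N_A))


def grad(theta, h=1e-5):
    """Central finite-difference estimate of d eta / d theta."""
    g = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        g.append((average_reward(up) - average_reward(down)) / (2 * h))
    return g


def ordinary_step(theta, alpha):
    """Plain gradient ascent with a fixed step size alpha."""
    g = grad(theta)
    return [t + alpha * gi for t, gi in zip(theta, g)]


def adaptive_step(theta, eps, lam=1e-4):
    """Stand-in adaptive rule (an assumption, not the paper's metric):
    dtheta = eps * g / (|g|^2 + lam), so the first-order predicted change
    in average reward, g . dtheta, stays close to eps while |g|^2 >> lam."""
    g = grad(theta)
    sq = sum(gi * gi for gi in g)
    scale = eps / (sq + lam)
    return [t + scale * gi for t, gi in zip(theta, g)]


if __name__ == "__main__":
    theta = [0.0] * (N_S * N_A)
    print("start eta:", round(average_reward(theta), 4))
    for _ in range(30):
        theta = adaptive_step(theta, eps=0.01)
    print("eta after 30 adaptive steps:", round(average_reward(theta), 4))
```

Design note: an ordinary step alpha * grad predicts a gain of alpha * |grad|^2, which shrinks quadratically on plateaus where the gradient is small, whereas the stand-in rule keeps the predicted gain near eps until |grad|^2 falls to the order of lam; lam also bounds the step length by eps / (2 * sqrt(lam)).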

Takamitsu Matsubara, Tetsuro Morimura, Jun Morimoto


Gradients | JMLR 2010 | Policy | Policy Gradients |


Post Info

Added	19 May 2011
Updated	19 May 2011
Type	Journal
Year	2010
Where	JMLR
Authors	Takamitsu Matsubara, Tetsuro Morimura, Jun Morimoto


Sciweavers
