Parameter-exploring policy gradients

We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower-variance gradient estimates than those obtained by regular policy gradient methods. We show that for several complex control tasks, including robust standing with a humanoid robot, this method outperforms well-known algorithms from the fields of standard policy gradients, finite difference methods and population-based heuristics. We also show that the improvement is largest when the parameter samples are drawn symmetrically. Lastly, we analyse the importance of the individual components of our method by incrementally incorporating them into the other algorithms and measuring the gain in performance after each step.
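
The idea of sampling in parameter space with symmetric samples can be illustrated with a small sketch. This is not the authors' implementation: the toy reward, the learning rates, the moving-average baseline, the fixed seed, and the names `toy_reward` and `pgpe_symmetric` are all assumptions chosen for illustration, and the reward normalisation a full method might use is left out. Each iteration draws one perturbation, evaluates the reward at the mean plus and minus that perturbation, and uses the difference to move the mean and the average to adapt the exploration widths.

```python
import random


def toy_reward(theta, target=(1.0, -2.0)):
    """Hypothetical stand-in for an episode return: negative squared distance to a target."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def pgpe_symmetric(reward_fn, dim, iterations=2000,
                   lr_mu=0.05, lr_sigma=0.001, seed=0):
    """Sketch of parameter-space exploration with symmetric sampling.

    Parameters are drawn from independent Gaussians N(mu_i, sigma_i^2).
    All hyperparameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    mu = [0.0] * dim       # mean of the parameter distribution
    sigma = [1.0] * dim    # per-parameter exploration width
    baseline = None        # moving average of the mean sampled reward

    for _ in range(iterations):
        # One perturbation, evaluated on both sides of the current mean.
        eps = [rng.gauss(0.0, s) for s in sigma]
        r_plus = reward_fn([m + e for m, e in zip(mu, eps)])
        r_minus = reward_fn([m - e for m, e in zip(mu, eps)])

        r_diff = 0.5 * (r_plus - r_minus)   # drives the mean update
        r_mean = 0.5 * (r_plus + r_minus)   # drives the width update
        if baseline is None:
            baseline = r_mean

        for i in range(dim):
            mu[i] += lr_mu * eps[i] * r_diff
            grad_sigma = (eps[i] ** 2 - sigma[i] ** 2) / sigma[i]
            sigma[i] += lr_sigma * grad_sigma * (r_mean - baseline)
            sigma[i] = max(sigma[i], 1e-3)  # safeguard: keep widths positive

        baseline = 0.9 * baseline + 0.1 * r_mean

    return mu, sigma


if __name__ == "__main__":
    mu, sigma = pgpe_symmetric(toy_reward, dim=2)
    print("mean:", mu)
    print("widths:", sigma)
```

On this toy problem the mean update needs no baseline, because the symmetric pair cancels the common part of the two rewards; the baseline only enters the width update. A real control task would replace `toy_reward` with a full episode rollout under the sampled parameters.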

Frank Sehnke, Christian Osendorfer, Thomas Rückstieß, Alex Graves, Jan Peters, Jürgen Schmidhuber

Gradients | Neural Networks | NN 2010 | Observable Markov Decision | Standard Policy Gradients

» Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Lear...

» Policy Gradient Planning for Environmental Decision Making with Existing Simulators

» Multimodal Parameter-exploring Policy Gradients

» Recurrent policy gradients

» Adaptive Stepsize Policy Gradients with Average Reward Metric

» A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD

» Policy Gradient Method for Team Markov Games

» Differential Eligibility Vectors for Advantage Updating and Gradient Methods

Post Info

Added	29 Jan 2011
Updated	29 Jan 2011
Type	Journal
Year	2010
Where	NN
Authors	Frank Sehnke, Christian Osendorfer, Thomas Rückstieß, Alex Graves, Jan Peters, Jürgen Schmidhuber
