Solving Deep Memory POMDPs with Recurrent Policy Gradients

Abstract. This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach involves approximating a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a “Long Short-Term Memory” architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.
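The gradient estimator the abstract describes can be illustrated with a generic REINFORCE-style update backpropagated through an unrolled recurrent network. Below is a minimal sketch in PyTorch, assuming a toy "recall the first observation" task; the environment, the plain LSTM-plus-softmax policy, the moving-average baseline, and all hyperparameters are illustrative assumptions, not the paper's benchmarks or exact algorithm.

```python
# Minimal sketch of a recurrent policy gradient: REINFORCE through an
# unrolled LSTM. The toy memory task, network sizes, baseline, and
# learning rate are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    def __init__(self, obs_dim=2, hidden=16, n_actions=2):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden)      # memory over past observations
        self.head = nn.Linear(hidden, n_actions)  # action logits

    def forward(self, obs, state=None):
        out, state = self.lstm(obs.view(1, 1, -1), state)
        return self.head(out.view(-1)), state

def episode(policy, T=10):
    """Cue is visible at t=0 only; reward depends on the final action."""
    cue = torch.randint(2, (1,)).item()
    log_probs, state = [], None
    for t in range(T):
        obs = torch.zeros(2)
        if t == 0:
            obs[cue] = 1.0                        # observation shown exactly once
        logits, state = policy(obs, state)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
    reward = 1.0 if action.item() == cue else 0.0  # solvable only with memory
    return torch.stack(log_probs), reward

policy = LSTMPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
baseline = 0.0
for it in range(2000):
    log_probs, R = episode(policy)
    baseline = 0.99 * baseline + 0.01 * R          # moving-average baseline
    # Return-weighted log-likelihood gradients of the taken actions,
    # propagated back through time by the unrolled LSTM.
    loss = -(R - baseline) * log_probs.sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the whole episode is kept in one computation graph, the single backward() call performs the backpropagation through time that carries credit from the terminal reward back to the step where the cue was observed.
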
Type Conference
Year 2007
Where ICANN (Springer)
Authors Daan Wierstra, Alexander Förster, Jan Peters, Jürgen Schmidhuber