Search Sciweavers | Sciweavers

27 search results - page 2 / 6

» Policy Gradient Method for Team Markov Games

click to vote

ICANN
2007
Springer

95views Neural Networks» more ICANN 2007»

Solving Deep Memory POMDPs with Recurrent Policy Gradients

13 years 11 months ago

Download www.idsia.ch

Abstract. This paper presents Recurrent Policy Gradients, a modelfree reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov...

Daan Wierstra, Alexander Förster, Jan Peters,...

claim paper

Read More »

click to vote

NIPS
2008

116views Information Technology» more NIPS 2008»

Particle Filter-based Policy Gradient in POMDPs

13 years 6 months ago

Download eprints.pascal-network.org

Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces. Decisions are based on a Particle Filter for estimating the bel...

Pierre-Arnaud Coquelin, Romain Deguest, Rém...

claim paper

Read More »

click to vote

NN
2010
Springer

125views Neural Networks» more NN 2010»

Parameter-exploring policy gradients

13 years 3 months ago

Download www.kyb.mpg.de

We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in paramet...

Frank Sehnke, Christian Osendorfer, Thomas Rü...

claim paper

Read More »

click to vote

ICML
2009
IEEE

148views Machine Learning» more ICML 2009»

Predictive representations for policy gradient in POMDPs

14 years 5 months ago

Download damas.ift.ulaval.ca

We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive ...

Abdeslam Boularias, Brahim Chaib-draa

claim paper

Read More »

click to vote

ECML
2007
Springer

192views Machine Learning» more ECML 2007»

Policy Gradient Critics

13 years 11 months ago

Download www.idsia.ch

We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...

Daan Wierstra, Jürgen Schmidhuber

claim paper

Read More »

« Prev « First page 2 / 6 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Document Tools

Image Tools

Sciweavers