Sciweavers

95 search results - page 5 / 19
» Policy Gradients for Cryptanalysis
ECML
2007
Springer
15 years 5 months ago
Policy Gradient Critics
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...
Daan Wierstra, Jürgen Schmidhuber
NN
2010
Springer
14 years 10 months ago
Parameter-exploring policy gradients
We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in paramet...
Frank Sehnke, Christian Osendorfer, Thomas Rü...
ICMLA
2010
14 years 9 months ago
Multimodal Parameter-exploring Policy Gradients
Policy Gradients with Parameter-based Exploration (PGPE) is a novel model-free reinforcement learning method that alleviates the problem of high-variance gradient estima...
Frank Sehnke, Alex Graves, Christian Osendorfer, J...
ICML
2003
IEEE
16 years 14 days ago
Hierarchical Policy Gradient Algorithms
Hierarchical reinforcement learning is a general framework which attempts to accelerate policy learning in large domains. On the other hand, policy gradient reinforcement learning...
Mohammad Ghavamzadeh, Sridhar Mahadevan
IROS
2006
IEEE
15 years 5 months ago
Policy Gradient Methods for Robotics
The acquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots should ever leave precisely pre-struc...
Jan Peters, Stefan Schaal