Sciweavers

827 search results - page 2 / 166
» Variational methods for Reinforcement Learning
NIPS
2001
13 years 6 months ago
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
RAS
2006
13 years 5 months ago
Reinforcement learning for quasi-passive dynamic walking of an unstable biped robot
A class of biped locomotion called Passive Dynamic Walking (PDW) has been recognized as energy-efficient and as a key to understanding human walking. Although PDW is s...
Kentarou Hitomi, Tomohiro Shibata, Yutaka Nakamura...
IJON
2006
13 years 5 months ago
Reinforcement learning of a simple control task using the spike response model
In this work, we propose a variation of a direct reinforcement learning algorithm, suitable for use with spiking neurons based on the spike response model (SRM). The SRM is a bi...
Murilo Saraiva de Queiroz, Roberto Coelho de Berr...
NCI
2004
Neural Networks
13 years 6 months ago
Hierarchical reinforcement learning with subpolicies specializing for learned subgoals
This paper describes a method for hierarchical reinforcement learning in which high-level policies automatically discover subgoals, and low-level policies learn to specialize for ...
Bram Bakker, Jürgen Schmidhuber
ESANN
2006
13 years 6 months ago
Reducing policy degradation in neuro-dynamic programming
We focus on neuro-dynamic programming methods to learn state-action value functions and outline some of the inherent problems to be faced when performing reinforcement learning in...
Thomas Gabel, Martin Riedmiller