ESANN
2006

Reducing policy degradation in neuro-dynamic programming

We focus on neuro-dynamic programming methods for learning state-action value functions and outline some of the inherent problems that arise when reinforcement learning is combined with function approximation. To overcome some of these problems, we develop a reinforcement learning method that monitors the learning process, lets the learner judge whether it would be better to cease learning, and thereby obtains more stable learning results.
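
The abstract does not spell out the monitoring mechanism, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a toy chain environment, Q-learning with a small linear approximator, and a simple rule: evaluate the greedy policy periodically, keep a snapshot of the best weights seen so far, and stop once several evaluations in a row bring no improvement. All names and parameters (`step`, `features`, `evaluate`, `train`, `patience`, `eval_every`) are hypothetical.

```python
import random

N_STATES = 6        # chain states 0..5; state 5 is the terminal goal (assumed toy task)
ACTIONS = (-1, 1)   # move left or move right
GAMMA = 0.95

def step(state, action):
    """Deterministic chain dynamics with a small step cost and a goal reward."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else -0.01), done

def features(state, action):
    """Coarse features, so the approximator generalises across states."""
    x = state / (N_STATES - 1)
    a = 1.0 if action == 1 else 0.0
    return (1.0, x, a, x * a)

def q_value(w, state, action):
    return sum(wi * fi for wi, fi in zip(w, features(state, action)))

def greedy(w, state):
    return max(ACTIONS, key=lambda a: q_value(w, state, a))

def evaluate(w, max_steps=50):
    """Discounted return of the greedy policy started in state 0."""
    state, total, discount = 0, 0.0, 1.0
    for _ in range(max_steps):
        state, reward, done = step(state, greedy(w, state))
        total += discount * reward
        discount *= GAMMA
        if done:
            break
    return total

def train(episodes=300, alpha=0.05, epsilon=0.2,
          eval_every=10, patience=5, seed=0):
    """Q-learning that monitors policy quality and may stop early."""
    rng = random.Random(seed)
    w = [0.0] * 4
    best_w, best_score, stale = list(w), evaluate(w), 0
    for episode in range(1, episodes + 1):
        state = rng.randrange(N_STATES - 1)
        for _ in range(50):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(w, state)
            nxt, reward, done = step(state, action)
            target = reward
            if not done:
                target += GAMMA * max(q_value(w, nxt, a) for a in ACTIONS)
            td_error = target - q_value(w, state, action)
            f = features(state, action)
            w = [wi + alpha * td_error * fi for wi, fi in zip(w, f)]
            state = nxt
            if done:
                break
        if episode % eval_every == 0:
            # Monitoring step: keep the best policy, stop when progress stalls.
            score = evaluate(w)
            if score > best_score:
                best_w, best_score, stale = list(w), score, 0
            else:
                stale += 1
                if stale >= patience:
                    break
    return best_w, best_score
```

Because `train` returns the snapshot rather than the final weights, later updates that degrade the greedy policy cannot make the returned result worse than the best one observed during monitoring.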
Thomas Gabel, Martin Riedmiller
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2006
Where ESANN