
NECO
2010

Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning

Most conventional Policy Gradient Reinforcement Learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the gradient of the average reward with respect to the policy parameter. That term involves the derivative of the stationary state distribution, which captures the sensitivity of that distribution to changes in the policy parameter. Although the bias introduced by this omission can be reduced by setting the forgetting rate γ for the value functions close to 1, these algorithms do not permit γ to be set exactly at 1.
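A rough sketch of where that term arises (the notation below is assumed for illustration, not quoted from the paper): writing the average reward as an expectation over the stationary state distribution d^{π_θ} and differentiating by the product rule gives

    \eta(\theta) = \sum_{s} d^{\pi_\theta}(s) \sum_{a} \pi_\theta(a \mid s)\, r(s,a),

    \nabla_\theta \eta(\theta)
      = \sum_{s} d^{\pi_\theta}(s)\, \nabla_\theta \log d^{\pi_\theta}(s) \sum_{a} \pi_\theta(a \mid s)\, r(s,a)
      \; + \; \sum_{s} d^{\pi_\theta}(s) \sum_{a} \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)\, r(s,a).

The first term carries the derivative of the log stationary distribution, \nabla_\theta \log d^{\pi_\theta}(s), i.e., the sensitivity of the state distribution to the policy parameter; this is the component that conventional PGRL algorithms do not estimate explicitly.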
Type: Journal
Year: 2010
Where: NECO
Authors: Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Jan Peters, Kenji Doya