Sciweavers

2 search results - page 1 / 1
» Derivatives of Logarithmic Stationary Distributions for Poli...
Sort
View
NECO
2010
97views more  NECO 2010»
13 years 3 months ago
Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
Most conventional Policy Gradient Reinforcement Learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the pol...
Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto...
ICML
2010
IEEE
13 years 5 months ago
Finite-Sample Analysis of LSTD
In this paper we consider the problem of policy evaluation in reinforcement learning, i.e., learning the value function of a fixed policy, using the least-squares temporal-differe...
Alessandro Lazaric, Mohammad Ghavamzadeh, Ré...