Sciweavers

6 search results (page 1 of 2) for "Hyperbolically Discounted Temporal Difference Learning"
Hyperbolically Discounted Temporal Difference Learning
William H. Alexander, Joshua W. Brown
NECO, 2010
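The title above refers to hyperbolic discounting, in which a reward t steps in the future is weighted by 1/(1 + k·t) rather than the exponential gamma^t used in standard TD learning. A minimal sketch of the two discount curves follows; the parameter values are illustrative, not taken from the paper:

```python
# Illustration: hyperbolic vs. exponential discount weights.
# k and gamma are arbitrary illustrative parameters, not from the paper.

def hyperbolic_weight(t, k=0.5):
    """Hyperbolic discount: 1 / (1 + k*t)."""
    return 1.0 / (1.0 + k * t)

def exponential_weight(t, gamma=0.9):
    """Exponential discount: gamma**t, as in standard discounted TD learning."""
    return gamma ** t

# Hyperbolic discounting falls off faster at short horizons but has a
# heavier tail at long horizons, which is the qualitative distinction
# studied in this line of work.
for t in [0, 1, 10, 100]:
    print(t, round(hyperbolic_weight(t), 4), round(exponential_weight(t), 4))
```

The crossover between the two curves (steep early decay, heavy late tail) is the behavioral signature usually associated with hyperbolic discounting.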
On Average Versus Discounted Reward Temporal-Difference Learning
John N. Tsitsiklis, Benjamin Van Roy
ML (Machine Learning), 2002, ACM
We provide an analytical comparison between discounted and average reward temporal-difference (TD) learning with linearly parameterized approximations. We first consider the asympt...
Convergence of Least Squares Temporal Difference Methods Under General Conditions
Huizhen Yu
ICML, 2010, IEEE
We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) in the off-policy learning context and with the simulation-based least square...
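For context, here is a minimal on-policy sketch of the least-squares temporal difference (LSTD) family analyzed above; the toy chain, one-hot features, and ridge term are illustrative assumptions, not the paper's off-policy setting:

```python
import numpy as np

# Sketch of least-squares temporal difference (LSTD) policy evaluation:
# solve A w = b, where A accumulates phi(s)(phi(s) - gamma*phi(s'))^T
# and b accumulates phi(s)*r over observed transitions.

def lstd(transitions, phi, n_features, gamma=0.9, reg=1e-6):
    A = reg * np.eye(n_features)  # small ridge term keeps A invertible
    b = np.zeros(n_features)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += f * r
    return np.linalg.solve(A, b)

# Same tiny 2-state cycle as a sanity check.
phi = lambda s: np.eye(2)[s]
data = [(0, 0.0, 1), (1, 1.0, 0)] * 50
w = lstd(data, phi, 2)
```

Unlike incremental TD(0), LSTD solves for the fixed point of the projected Bellman equation in one linear solve, so on this deterministic chain it recovers the discounted values essentially exactly.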
Reinforcement learning with Gaussian processes
Yaakov Engel, Shie Mannor, Ron Meir
ICML, 2005, IEEE
Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framew...
Bayesian actor-critic algorithms
Mohammad Ghavamzadeh, Yaakov Engel
ICML, 2007, IEEE
We present a new actor-critic learning model in which a Bayesian class of non-parametric critics, based on Gaussian process temporal difference learning, is used. Such critics model ...