Sciweavers

6 search results - page 1 / 2
Search query: Hyperbolically Discounted Temporal Difference Learning
NECO 2010
Hyperbolically Discounted Temporal Difference Learning
William H. Alexander, Joshua W. Brown
ML 2002
On Average Versus Discounted Reward Temporal-Difference Learning
We provide an analytical comparison between discounted and average reward temporal-difference (TD) learning with linearly parameterized approximations. We first consider the asympt...
John N. Tsitsiklis, Benjamin Van Roy
ICML 2010
Convergence of Least Squares Temporal Difference Methods Under General Conditions
We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) in the off-policy learning context and with the simulation-based least square...
Huizhen Yu
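The paper analyzes least-squares TD in the off-policy setting; as a structural illustration only, here is the basic on-policy batch LSTD(0) computation on the same kind of made-up two-state chain (all numbers are invented, and none of the paper's off-policy machinery appears here).

```python
import numpy as np

# Hypothetical two-state chain standing in for an MDP under a fixed policy.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
phi = np.eye(2)                 # tabular linear features
gamma = 0.95

rng = np.random.default_rng(1)
A = np.zeros((2, 2))
b = np.zeros(2)
s = 0
for _ in range(50000):
    s_next = rng.choice(2, p=P[s])
    # Accumulate the LSTD(0) statistics
    #   A ~ E[phi(s) (phi(s) - gamma phi(s'))^T],  b ~ E[phi(s) r(s)]
    A += np.outer(phi[s], phi[s] - gamma * phi[s_next])
    b += phi[s] * r[s]
    s = s_next

# Solve A theta = b once, instead of taking many small TD steps
theta = np.linalg.solve(A, b)
```

Unlike incremental TD, all the simulation data is compressed into the statistics A and b, and the value weights come from a single linear solve.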
ICML 2005
Reinforcement learning with Gaussian processes
Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framew...
Yaakov Engel, Shie Mannor, Ron Meir
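GPTD's generative model relates observed rewards to the latent value function through one-step temporal differences, r_i = V(x_i) - gamma V(x_{i+1}) + noise, so the posterior over values is plain GP regression through a linear operator. The batch form of that posterior mean can be sketched as follows; the trajectory, kernel length-scale, and rewards are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
gamma = 0.9
sigma2 = 0.1                        # observation-noise variance (illustrative)
X = np.linspace(0.0, 1.0, 8)        # visited 1-d states x_0 .. x_7
rewards = np.cos(np.pi * X[:-1])    # one made-up reward per transition

def kernel(a, b, ell=0.3):
    # Squared-exponential prior covariance over value functions
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

K = kernel(X, X)                    # prior covariance at visited states
T = len(X) - 1
H = np.zeros((T, T + 1))            # temporal-difference operator: r = H V + noise
for i in range(T):
    H[i, i], H[i, i + 1] = 1.0, -gamma

# GP posterior mean of V at the visited states:
#   E[V | r] = K H^T (H K H^T + sigma2 I)^{-1} r
G = H @ K @ H.T + sigma2 * np.eye(T)
v_mean = K @ H.T @ np.linalg.solve(G, rewards)
```

Because the rewards along this toy trajectory decay from positive to negative, the posterior value estimate is highest at the start of the trajectory.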
ICML 2007
Bayesian actor-critic algorithms
We present a new actor-critic learning model in which the critic is drawn from a Bayesian class of non-parametric critics based on Gaussian process temporal difference learning. Such critics model ...
Mohammad Ghavamzadeh, Yaakov Engel
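The paper's critic is the Gaussian-process TD model above; as a rough sketch of the surrounding actor-critic loop only, here is a conventional version with an ordinary TD(0) critic and a softmax policy-gradient actor on a toy two-state MDP. The MDP, step sizes, and episode length are all invented, and nothing Bayesian survives in this simplification.

```python
import numpy as np

# Toy MDP: reward 1 for occupying state 0, 0 otherwise.
# Action 0 = stay in the current state, action 1 = switch states.
gamma, alpha_v, alpha_pi = 0.9, 0.1, 0.1
v = np.zeros(2)                 # critic: state-value estimates
theta = np.zeros((2, 2))        # actor: softmax policy logits per state

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(3)
s = 0
for _ in range(20000):
    probs = softmax(theta[s])
    a = rng.choice(2, p=probs)
    s_next = s if a == 0 else 1 - s
    reward = 1.0 if s == 0 else 0.0

    # Critic: TD(0) error, which also serves as the policy-gradient signal
    delta = reward + gamma * v[s_next] - v[s]
    v[s] += alpha_v * delta

    # Actor: score-function update, grad log pi(a|s) = onehot(a) - probs
    grad = -probs
    grad[a] += 1.0
    theta[s] += alpha_pi * delta * grad

    s = s_next
```

The learned policy should prefer staying in the rewarding state 0 and switching out of state 1; the critic's TD error plays the role of an advantage estimate for the actor.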