Sciweavers

162 search results (page 6 of 33) for "Off-Policy Temporal Difference Learning with Function Approx..."
ICML 2010 (IEEE)
Convergence of Least Squares Temporal Difference Methods Under General Conditions
We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) in the off-policy learning context and with the simulation-based least square...
Huizhen Yu
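
For context on the least squares temporal difference (LSTD) method this abstract builds on, here is a minimal sketch of LSTD(0) policy evaluation with a linear value function. The feature map phi, the transition format, and the ridge term are illustrative assumptions; the paper's contribution concerns the off-policy setting, which this sketch does not handle.

```python
import numpy as np

def lstd0(transitions, phi, gamma=0.99, reg=1e-6):
    """Minimal LSTD(0) sketch: estimate weights w with V(s) ~ w @ phi(s).

    transitions: list of (state, reward, next_state, done) tuples (assumed format)
    phi: feature map, state -> 1-D numpy array (illustrative assumption)
    """
    d = len(phi(transitions[0][0]))
    A = reg * np.eye(d)            # small ridge term keeps A invertible
    b = np.zeros(d)
    for s, r, s_next, done in transitions:
        f = phi(s)
        f_next = np.zeros(d) if done else phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)   # w = A^{-1} b
```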
NIPS 1993
Temporal Difference Learning of Position Evaluation in the Game of Go
The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation e...
Nicol N. Schraudolph, Peter Dayan, Terrence J. Sej...
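
As a rough illustration of temporal-difference position evaluation in a game, below is a hedged TD(0) sketch that learns a linear evaluator from completed games. The play_game and features interfaces, the step size, and the linear model are placeholders, not the authors' actual evaluation function or training setup.

```python
import numpy as np

def td0_position_evaluation(play_game, features, n_dims,
                            episodes=1000, alpha=0.01, gamma=1.0):
    """TD(0) sketch for learning a linear position evaluator.

    play_game(): returns (positions, outcome), a sequence of positions plus a
                 terminal reward, e.g. +1 win / -1 loss (hypothetical interface)
    features(pos): maps a position to an n_dims feature vector (assumption)
    """
    w = np.zeros(n_dims)
    value = lambda pos: float(w @ features(pos))
    for _ in range(episodes):
        positions, outcome = play_game()
        for t in range(len(positions) - 1):
            # bootstrap the target from the next position's estimated value
            target = gamma * value(positions[t + 1])
            w += alpha * (target - value(positions[t])) * features(positions[t])
        # final step: back up the observed game outcome into the last position
        w += alpha * (outcome - value(positions[-1])) * features(positions[-1])
    return w
```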
EWRL 2008
Bayesian Reward Filtering
A wide variety of function approximation schemes have been applied to reinforcement learning. However, Bayesian filtering approaches, which have been shown efficient in other field...
Matthieu Geist, Olivier Pietquin, Gabriel Fricout
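
To make the Bayesian-filtering idea concrete, here is a hedged sketch of a Kalman-filter-style update for a linear value function, treating the observed reward as a noisy measurement of the one-step temporal difference. The observation model, noise variances, and feature vectors are assumptions for illustration, not the authors' filter.

```python
import numpy as np

def kalman_value_update(theta, P, phi_s, phi_next, reward,
                        gamma=0.99, obs_var=1.0, process_var=1e-4):
    """One Kalman-style update of a linear value model V(s) = theta @ phi(s).

    Assumed measurement model: reward ~ theta @ (phi_s - gamma * phi_next) + noise.
    theta: current mean weight estimate; P: its covariance matrix.
    """
    P = P + process_var * np.eye(len(theta))   # diffuse the prior between steps
    h = phi_s - gamma * phi_next               # linear observation vector
    innovation = reward - h @ theta            # TD-style prediction error
    s = h @ P @ h + obs_var                    # innovation variance (scalar)
    k = P @ h / s                              # Kalman gain
    theta = theta + k * innovation             # posterior mean
    P = P - np.outer(k, h @ P)                 # posterior covariance
    return theta, P
```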
NIPS 2007
Incremental Natural Actor-Critic Algorithms
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning m...
Shalabh Bhatnagar, Richard S. Sutton, Mohammad Gha...
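
As background for the actor-critic family, the following is a minimal sketch of one incremental actor-critic update with a softmax policy and a TD(0) critic. The feature representations, interfaces, and step sizes are placeholders, and it uses the ordinary policy gradient; the natural-gradient variants the paper studies are not shown.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def actor_critic_step(theta, w, x_s, x_next, action_feats, action, reward,
                      gamma=0.99, alpha_actor=0.01, alpha_critic=0.1):
    """One incremental actor-critic update (plain gradient, not natural gradient).

    x_s, x_next: critic feature vectors for current and next state (assumption)
    action_feats: per-action actor features for the current state, shape (n_actions, d)
    """
    td_error = reward + gamma * (w @ x_next) - (w @ x_s)    # critic's TD(0) error
    w = w + alpha_critic * td_error * x_s                   # critic update
    probs = softmax(action_feats @ theta)                   # softmax policy over actions
    grad_log_pi = action_feats[action] - probs @ action_feats
    theta = theta + alpha_actor * td_error * grad_log_pi    # actor update
    return theta, w
```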
COLT 2000 (Springer)
Bias-Variance Error Bounds for Temporal Difference Updates
We give the first rigorous upper bounds on the error of temporal difference (TD) algorithms for policy evaluation as a function of the amount of experience. These upper bounds pr...
Michael J. Kearns, Satinder P. Singh
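
For reference, the kind of update these bounds concern is sketched below: standard tabular TD(lambda) policy evaluation with accumulating eligibility traces. The env_step interface and all constants are assumptions; the paper's phased setting and the bounds themselves are not reproduced here.

```python
import numpy as np

def td_lambda_episode(env_step, n_states, start_state, V=None,
                      alpha=0.1, gamma=0.99, lam=0.9, max_steps=1000):
    """Tabular TD(lambda) policy evaluation with accumulating traces.

    env_step(s): returns (reward, next_state, done) under the fixed policy
                 being evaluated (hypothetical interface).
    """
    V = np.zeros(n_states) if V is None else V
    e = np.zeros(n_states)            # eligibility traces
    s = start_state
    for _ in range(max_steps):
        r, s_next, done = env_step(s)
        target = r if done else r + gamma * V[s_next]
        delta = target - V[s]         # TD error
        e[s] += 1.0                   # accumulating trace for the visited state
        V += alpha * delta * e        # credit recent states via their traces
        e *= gamma * lam              # decay all traces
        if done:
            break
        s = s_next
    return V
```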