Convergence of Least Squares Temporal Difference Methods Under General Conditions

We consider approximate policy evaluation for finite state and action Markov decision processes (MDPs) in the off-policy learning context, using the simulation-based least squares temporal difference algorithm LSTD(λ). We establish for the discounted cost criterion that off-policy LSTD(λ) converges almost surely under mild, minimal conditions. We also analyze other convergence and boundedness properties of the iterates involved in the algorithm and, based on them, suggest a modification in its practical implementation. Our analysis uses theories of both finite-space Markov chains and Markov chains on topological spaces.
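For readers unfamiliar with the algorithm, below is a minimal sketch of one common formulation of off-policy LSTD(λ), in which transitions generated by a behavior policy μ are reweighted with per-decision importance-sampling ratios ρ_t = π(a_t|s_t) / μ(a_t|s_t) to evaluate a target policy π. It is illustrative only and does not reproduce the paper's exact iterates or its suggested implementation change; the names `phi` and `rho`, the trajectory format, and the ridge term `reg` are assumptions of the sketch.

```python
import numpy as np

def off_policy_lstd_lambda(transitions, phi, rho, gamma=0.9, lam=0.5, reg=1e-6):
    """Estimate theta with V(s) ~ phi(s) @ theta from one trajectory
    generated by the behavior policy.

    transitions : list of (s, a, r, s_next) tuples
    phi         : s -> feature vector (numpy array of length d)
    rho         : (s, a) -> importance ratio pi(a|s) / mu(a|s)
    """
    d = len(phi(transitions[0][0]))
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)       # eligibility trace
    rho_prev = 1.0        # ratio of the previous transition
    for s, a, r, s_next in transitions:
        # the trace carries the product of past ratios, discounted by gamma*lambda
        z = gamma * lam * rho_prev * z + phi(s)
        rho_t = rho(s, a)
        # importance-weight the next-state term and the reward
        A += np.outer(z, phi(s) - gamma * rho_t * phi(s_next))
        b += rho_t * r * z
        rho_prev = rho_t
    # small ridge term guards against a (near-)singular A on short runs
    theta = np.linalg.solve(A + reg * np.eye(d), b)
    return theta
```

The ridge term is only a pragmatic guard against an ill-conditioned matrix on short trajectories; the boundedness issues the paper analyzes motivate its own, different modification to the practical implementation, which the abstract does not spell out.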
Type: Conference
Year: 2010
Where: ICML
Authors: Huizhen Yu