Sciweavers

81
Voted
ICML
2010
IEEE
15 years 3 hour ago
Convergence of Least Squares Temporal Difference Methods Under General Conditions
We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) in the off-policy learning context and with the simulation-based least square...
Huizhen Yu