Sciweavers

CDC
2010
IEEE
160views Control Systems» more  CDC 2010»
12 years 12 months ago
Adaptive bases for Q-learning
Abstract-- We consider reinforcement learning, and in particular, the Q-learning algorithm in large state and action spaces. In order to cope with the size of the spaces, a functio...
Dotan Di Castro, Shie Mannor
ICML
2001
IEEE
14 years 5 months ago
Off-Policy Temporal Difference Learning with Function Approximation
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
Doina Precup, Richard S. Sutton, Sanjoy Dasgupta
ICML
2008
IEEE
14 years 5 months ago
Sample-based learning and search with permanent and transient memories
We present a reinforcement learning architecture, Dyna-2, that encompasses both samplebased learning and sample-based search, and that generalises across states during both learni...
David Silver, Martin Müller 0003, Richard S. ...
ICML
2008
IEEE
14 years 5 months ago
A worst-case comparison between temporal difference and residual gradient with linear function approximation
Residual gradient (RG) was proposed as an alternative to TD(0) for policy evaluation when function approximation is used, but there exists little formal analysis comparing them ex...
Lihong Li
ICML
2007
IEEE
14 years 5 months ago
Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation
Reinforcement learning algorithms can become unstable when combined with linear function approximation. Algorithms that minimize the mean-square Bellman error are guaranteed to co...
Chee Wee Phua, Robert Fitch