Sample-based learning and search with permanent and transient memories



We present a reinforcement learning architecture, Dyna-2, that encompasses both sample-based learning and sample-based search, and that generalises across states during both learning and search. We apply Dyna-2 to high-performance Computer Go. In this domain the most successful planning methods are based on sample-based search algorithms, such as UCT, in which states are treated individually, and the most successful learning methods are based on temporal-difference learning algorithms, such as Sarsa, in which linear function approximation is used. In both cases an estimate of the value function is formed, but in the first case it is transient, computed and then discarded after each move, whereas in the second case it is more permanent, slowly accumulating over many moves and games. The idea of Dyna-2 is for the transient planning memory and the permanent learning memory to remain separate, but for both to be based on linear function approximation and both to be updated by Sarsa. To ap...
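The abstract states that both memories are linear and both are updated by Sarsa, but it is cut off before the details. The sketch below illustrates that idea under several assumptions that are not taken from the text: it uses one-step Sarsa(0) rather than any eligibility-trace variant, it treats the transient estimate as an additive correction to the permanent one, it expects the caller to supply feature vectors, and every name in it (`Dyna2Memories`, `learn`, `search_update`, `reset_transient`) is hypothetical. It also omits the search procedure itself, meaning how simulated games are generated and how actions are chosen.

```python
from typing import List, Optional

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length feature/weight vectors."""
    return sum(a * b for a, b in zip(u, v))


class Dyna2Memories:
    """Toy sketch (hypothetical names) of two separate linear memories, both trained by Sarsa(0).

    theta:     permanent memory, meant to accumulate over many moves and games.
    theta_bar: transient memory, meant to be discarded and rebuilt by search.
    """

    def __init__(self, n_features: int, n_transient_features: int,
                 alpha: float, alpha_bar: float, gamma: float = 1.0):
        self.theta: Vector = [0.0] * n_features
        self.theta_bar: Vector = [0.0] * n_transient_features
        self.alpha = alpha          # step size for the permanent memory
        self.alpha_bar = alpha_bar  # step size for the transient memory
        self.gamma = gamma          # discount factor

    def q_permanent(self, phi: Vector) -> float:
        """Linear estimate Q(s, a) = phi(s, a) . theta from the permanent memory alone."""
        return dot(phi, self.theta)

    def q_combined(self, phi: Vector, phi_bar: Vector) -> float:
        # Assumption: the transient estimate is added to the permanent one as a correction.
        return self.q_permanent(phi) + dot(phi_bar, self.theta_bar)

    def learn(self, phi: Vector, reward: float, phi_next: Optional[Vector]) -> float:
        """Sarsa(0) update of the permanent memory from real experience.

        phi_next holds the features of the next (state, action) pair, or None at game end.
        Returns the TD error.
        """
        q_next = 0.0 if phi_next is None else self.q_permanent(phi_next)
        delta = reward + self.gamma * q_next - self.q_permanent(phi)
        self.theta = [w + self.alpha * delta * f for w, f in zip(self.theta, phi)]
        return delta

    def search_update(self, phi: Vector, phi_bar: Vector, reward: float,
                      phi_next: Optional[Vector], phi_bar_next: Optional[Vector]) -> float:
        """Sarsa(0) update of the transient memory from simulated experience.

        The TD error uses the combined estimate, but only theta_bar is changed.
        """
        if phi_next is None or phi_bar_next is None:
            q_next = 0.0  # simulated game has ended
        else:
            q_next = self.q_combined(phi_next, phi_bar_next)
        delta = reward + self.gamma * q_next - self.q_combined(phi, phi_bar)
        self.theta_bar = [w + self.alpha_bar * delta * f
                          for w, f in zip(self.theta_bar, phi_bar)]
        return delta

    def reset_transient(self) -> None:
        """Discard the transient memory (when to do this is left to the caller)."""
        self.theta_bar = [0.0] * len(self.theta_bar)
```

For example, after one real update `learn([1.0, 0.0], 1.0, None)` with `alpha=0.5`, the permanent estimate for that feature vector is 0.5. A subsequent simulated update adjusts only `theta_bar`, and `reset_transient()` returns the combined estimate to the permanent value. Keeping the two weight vectors in separate fields is what lets the search-time estimate be thrown away without disturbing what has been learned across games.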

David Silver, Martin Müller, Richard S. Sutton


ICML 2008 | Linear Function Approximation | Machine Learning | Permanent Learning Memory | Successful Learning Methods


Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2008
Where	ICML
Authors	David Silver, Martin Müller, Richard S. Sutton

Sciweavers
