
Convergence of Stochastic Iterative Dynamic Programming Algorithms

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.

Copyright © Massachusetts Institute of Technology, 1993. This report describes research done at the Dept. of Brain and Cognitive Sciences, the Center for Biological and Computational Learning, and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for CBCL is provided in part by a grant from the NSF...
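The class of algorithms covered by the theorem performs updates of the form Q ← Q + α(target − Q), where the target is a noisy sample of the Bellman backup and the step sizes α satisfy the usual stochastic-approximation conditions (Σα = ∞, Σα² < ∞). Below is a minimal tabular Q-learning sketch in Python illustrating that form; the env object with reset(), step(), and actions() methods is a hypothetical interface for illustration, not something defined in the paper.

    import random

    def q_learning(env, episodes=500, gamma=0.95, epsilon=0.1):
        # Q[(s, a)] holds the running value estimate; n[(s, a)] counts
        # updates so the step size alpha = 1/n satisfies sum(alpha) = inf
        # and sum(alpha^2) < inf, as required by the convergence theorem.
        Q, n = {}, {}
        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                acts = env.actions(s)
                # epsilon-greedy exploration keeps every (s, a) pair visited
                if random.random() < epsilon:
                    a = random.choice(acts)
                else:
                    a = max(acts, key=lambda x: Q.get((s, x), 0.0))
                s2, r, done = env.step(a)  # assumed to return (state, reward, done)
                n[(s, a)] = n.get((s, a), 0) + 1
                alpha = 1.0 / n[(s, a)]
                target = r if done else r + gamma * max(
                    Q.get((s2, x), 0.0) for x in env.actions(s2))
                # stochastic-approximation step toward the noisy Bellman target
                Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
                s = s2
        return Q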
Type: Conference
Year: 1993
Where: NIPS
Authors: Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh