

Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms

In this paper, we address two issues of long-standing interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based indirect approaches, which use experience to estimate next-state distributions for off-line value iteration? We first show that both Q-learning and the indirect approach enjoy rather rapid convergence to the optimal policy as a function of the number of state transitions observed. In particular, on the order of only (N log(1/ε)/ε²)(log N + log log(1/ε)) transitions are sufficient for both algorithms to come within ε of the optimal policy, in an idealized model that assumes the observed transitions are "well-mixed" throughout an N-state MDP. Thus, the two approaches have roughly the same sample complexity. Perhaps surprisingly, this sample complexity is far less than what is required for the model-based ...
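To make the contrast in the abstract concrete, here is a minimal sketch of the two approaches it compares: Q-learning, which updates action values directly from observed transitions, and the indirect (model-based) approach, which estimates the next-state distributions from counts and then runs off-line value iteration. The small random MDP, the uniform "well-mixed" sampling scheme, the step size, and the use of the expected reward are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch: direct (Q-learning) vs. indirect (model-based) estimation on a toy MDP.
# All problem parameters below are assumptions chosen for illustration only.
import numpy as np

rng = np.random.default_rng(0)
N, A, gamma = 5, 2, 0.9                      # states, actions, discount factor
P = rng.dirichlet(np.ones(N), size=(N, A))   # true transition kernel P[s, a, s']
R = rng.uniform(size=(N, A))                 # true expected rewards

def sample_transitions(T):
    """Idealized 'well-mixed' experience: (s, a) pairs drawn uniformly at random."""
    s = rng.integers(N, size=T)
    a = rng.integers(A, size=T)
    s2 = np.array([rng.choice(N, p=P[si, ai]) for si, ai in zip(s, a)])
    return s, a, s2

def q_learning(s, a, s2, alpha=0.1):
    """Direct approach: incremental Q updates from the observed transitions.
    Uses the expected reward R[s, a] for simplicity rather than sampled rewards."""
    Q = np.zeros((N, A))
    for si, ai, s2i in zip(s, a, s2):
        target = R[si, ai] + gamma * Q[s2i].max()
        Q[si, ai] += alpha * (target - Q[si, ai])
    return Q

def indirect(s, a, s2, iters=200):
    """Indirect approach: estimate next-state distributions from counts,
    then run off-line value iteration on the estimated model."""
    counts = np.zeros((N, A, N))
    for si, ai, s2i in zip(s, a, s2):
        counts[si, ai, s2i] += 1
    P_hat = counts / np.maximum(counts.sum(axis=2, keepdims=True), 1)
    Q = np.zeros((N, A))
    for _ in range(iters):
        Q = R + gamma * P_hat @ Q.max(axis=1)
    return Q

s, a, s2 = sample_transitions(20_000)
agree = (q_learning(s, a, s2).argmax(1) == indirect(s, a, s2).argmax(1)).sum()
print(f"greedy policies agree on {agree} of {N} states")
```

With enough transitions both greedy policies typically coincide with the optimal one, which is the qualitative point of the paper's finite-sample bounds; the sketch does not reproduce the bounds themselves.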
Michael J. Kearns, Satinder P. Singh
Added: 01 Nov 2010
Updated: 01 Nov 2010
Type: Conference
Year: 1998
Where: NIPS
Authors: Michael J. Kearns, Satinder P. Singh