JMLR
2002

On the Convergence of Optimistic Policy Iteration

We consider a finite-state Markov decision problem and establish the convergence of a special case of optimistic policy iteration that involves Monte Carlo estimation of Q-values, in conjunction with greedy policy selection. We provide convergence results for a number of algorithmic variations, including one that involves temporal difference learning (bootstrapping) instead of Monte Carlo estimation. We also indicate some extensions that either fail or are unlikely to go through.
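
To make the abstract's central idea concrete, here is a minimal Python sketch of optimistic policy iteration with Monte Carlo estimation of Q-values and greedy policy selection. It is an illustration under stated assumptions, not the paper's exact algorithm or setting: the discounted criterion, the synchronous update of every state-action pair, the 1/k step size, the truncated rollout horizon, the toy two-state problem `TOY_MDP`, and all function names are hypothetical choices made for this example.

```python
import random

# Hypothetical toy MDP (not from the paper): P[s][a] is a list of
# (probability, next_state, reward) outcomes.
TOY_MDP = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: stay (reward 0) or move to 1 (reward 1)
    [[(1.0, 0, 0.0)], [(1.0, 1, 2.0)]],  # state 1: move to 0 (reward 0) or stay (reward 2)
]

def greedy(Q):
    """Greedy policy: for each state, the first action with maximal Q-value."""
    return [max(range(len(row)), key=row.__getitem__) for row in Q]

def sample(outcomes, rng):
    """Draw (next_state, reward) from a list of (prob, next_state, reward)."""
    u, cum = rng.random(), 0.0
    for prob, nxt, reward in outcomes:
        cum += prob
        if u < cum:
            return nxt, reward
    return outcomes[-1][1], outcomes[-1][2]  # guard against float rounding

def mc_return(P, policy, s, a, gamma, horizon, rng):
    """Discounted return of one rollout: take action a in s, then follow policy."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        s, r = sample(P[s][a], rng)
        total += discount * r
        discount *= gamma
        a = policy[s]
    return total

def optimistic_policy_iteration(P, gamma=0.9, iterations=50, horizon=200, seed=0):
    """Alternate greedy policy selection with a single Monte Carlo update per pair."""
    rng = random.Random(seed)
    Q = [[0.0 for _ in actions] for actions in P]
    for k in range(1, iterations + 1):
        policy = greedy(Q)   # improve the policy using the current, inexact estimate
        step = 1.0 / k       # assumed diminishing step size
        for s in range(len(P)):
            for a in range(len(P[s])):
                g = mc_return(P, policy, s, a, gamma, horizon, rng)
                # Move Q only part of the way toward the sampled return,
                # rather than evaluating the policy to convergence.
                Q[s][a] += step * (g - Q[s][a])
    return Q, greedy(Q)

Q, policy = optimistic_policy_iteration(TOY_MDP)
```

The "optimistic" part is that the policy is re-selected greedily after every partial update of `Q`, instead of waiting for a full evaluation of the current policy as in classical policy iteration. Replacing the sampled return `g` with a one-step bootstrapped target would give a temporal-difference variant in the spirit of the one the abstract mentions.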
John N. Tsitsiklis
Added: 22 Dec 2010
Updated: 22 Dec 2010
Type: Journal
Year: 2002
Where: JMLR
Authors: John N. Tsitsiklis