CDC
2010
IEEE
Q-learning and enhanced policy iteration in discounted dynamic programming
We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal state costs or Q-facto...
Dimitri P. Bertsekas, Huizhen Yu