Sciweavers

38 search results - page 1 / 8
JMLR
2002
13 years 4 months ago
On the Convergence of Optimistic Policy Iteration
We consider a finite-state Markov decision problem and establish the convergence of a special case of optimistic policy iteration that involves Monte Carlo estimation of Q-values,...
John N. Tsitsiklis
ICML
2009
IEEE
14 years 5 months ago
Model-free reinforcement learning as mixture learning
We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizo...
Nikos Vlassis, Marc Toussaint
AAAI
2007
13 years 7 months ago
Point-Based Policy Iteration
We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen’s policy iteration with point...
Shihao Ji, Ronald Parr, Hui Li, Xuejun Liao, Lawre...
DAS
2008
Springer
13 years 6 months ago
The Convergence of Iterated Classification
We report an improved methodology for training a sequence of classifiers for document image content extraction, that is, the location and segmentation of regions containing handwr...
Chang An, Henry S. Baird
CDC
2010
IEEE
12 years 12 months ago
Q-learning and enhanced policy iteration in discounted dynamic programming
We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal state costs or Q-facto...
Dimitri P. Bertsekas, Huizhen Yu