Sciweavers

Near-Optimal Reinforcement Learning in Polynomial Time

We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes. After observing that the number of actions required to approach the optimal return is lower bounded by the mixing time T of the optimal policy in the undiscounted case or by the horizon time T in the discounted case, we then give algorithms requiring a number of actions and total computation time that are only polynomial in T and the number of states, for both the undiscounted and discounted cases. An interesting aspect of our algorithms is their explicit handling of the exploration-exploitation trade-off.
Michael J. Kearns, Satinder P. Singh
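
As a rough illustration of the explicit exploration-exploitation handling described in the abstract, the sketch below keeps per-state-action visit counts, treats a state as "known" once every action in it has been tried a threshold number of times, explores by balanced wandering in unknown states, and exploits an empirical model in known states. The class name, the known_threshold parameter, and the simplified exploit step are assumptions made for this sketch; it is not a faithful reconstruction of the paper's algorithm.

import numpy as np

# Illustrative sketch only: a simplified explore-or-exploit agent in the
# spirit of the abstract's exploration-exploitation discussion, not the
# authors' actual algorithm. All names and thresholds are assumptions.
class ExploreOrExploitAgent:
    def __init__(self, n_states, n_actions, known_threshold=100):
        self.n_states = n_states
        self.n_actions = n_actions
        self.known_threshold = known_threshold  # visits needed before (s, a) counts as "known"
        self.counts = np.zeros((n_states, n_actions), dtype=int)
        self.transitions = np.zeros((n_states, n_actions, n_states), dtype=int)
        self.reward_sums = np.zeros((n_states, n_actions))

    def is_known(self, state):
        # A state is "known" once every action in it has been tried enough
        # times to estimate its transition probabilities and mean reward.
        return bool(np.all(self.counts[state] >= self.known_threshold))

    def update(self, state, action, reward, next_state):
        # Record one observed transition for the empirical model.
        self.counts[state, action] += 1
        self.transitions[state, action, next_state] += 1
        self.reward_sums[state, action] += reward

    def act(self, state):
        if not self.is_known(state):
            # Explore: try the least-visited action so the current state
            # becomes known as quickly as possible (balanced wandering).
            return int(np.argmin(self.counts[state]))
        # Exploit: here, greedily pick the action with the best empirical
        # mean reward; a fuller treatment would plan (e.g. value iteration)
        # in the estimated MDP restricted to known states.
        mean_rewards = self.reward_sums[state] / np.maximum(self.counts[state], 1)
        return int(np.argmax(mean_rewards))

In an environment loop one would call act(state) to choose an action and update(state, action, reward, next_state) after each step; the known_threshold value controls how much data is gathered before a state is trusted for exploitation.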
Added 22 Dec 2010
Updated 22 Dec 2010
Type Journal
Year 2002
Where Machine Learning (ML)
Authors Michael J. Kearns, Satinder P. Singh