Sciweavers

ML
2008
ACM
152views Machine Learning» more  ML 2008»
13 years 4 months ago
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
Abstract. We consider batch reinforcement learning problems in continuous space, expected total discounted-reward Markovian Decision Problems. As opposed to previous theoretical wo...
András Antos, Csaba Szepesvári, R&ea...
COLT
2000
Springer
13 years 8 months ago
Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning
We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process (  ¢¡¤£¦¥§  ), and focus on gradient ascent approache...
Peter L. Bartlett, Jonathan Baxter
CDC
2009
IEEE
147views Control Systems» more  CDC 2009»
13 years 9 months ago
A simulation-based method for aggregating Markov chains
— This paper addresses model reduction for a Markov chain on a large state space. A simulation-based framework is introduced to perform state aggregation of the Markov chain base...
Kun Deng, Prashant G. Mehta, Sean P. Meyn
ICML
2000
IEEE
14 years 5 months ago
Reinforcement Learning in POMDP's via Direct Gradient Ascent
This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled ??? ?s. We introduce ??? ?, a...
Jonathan Baxter, Peter L. Bartlett