Sciweavers

ML
2008
ACM
14 years 9 months ago
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
We consider batch reinforcement learning problems in continuous space, expected total discounted-reward Markovian Decision Problems. As opposed to previous theoretical wo...
András Antos, Csaba Szepesvári, Ré...
COLT
2000
Springer
15 years 1 months ago
Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning
We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process (POMDP), and focus on gradient ascent approache...
Peter L. Bartlett, Jonathan Baxter
CDC
2009
IEEE
15 years 2 months ago
A simulation-based method for aggregating Markov chains
This paper addresses model reduction for a Markov chain on a large state space. A simulation-based framework is introduced to perform state aggregation of the Markov chain base...
Kun Deng, Prashant G. Mehta, Sean P. Meyn
ICML
2000
IEEE
15 years 10 months ago
Reinforcement Learning in POMDP's via Direct Gradient Ascent
This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled POMDPs. We introduce GPOMDP, a...
Jonathan Baxter, Peter L. Bartlett