Sciweavers

Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration
Abstract: Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervis...
Christos Dimitrakakis, Michail G. Lagoudakis
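
The abstract above frames policy learning as a supervised classification problem. A minimal sketch of that idea, assuming a simple rollout-based labeling scheme: estimate each action's value at a state by averaging Monte Carlo rollouts of the current policy, label the state with the best action, and treat the labels as the next policy. The 5-state chain, the function names, and the lookup table standing in for a real classifier are hypothetical illustrations, not the authors' algorithm; the sampling strategies and bounds the paper analyzes are not modeled.

```python
import random
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical toy problem: a 5-state chain; action 0 moves left, action 1 moves
# right, and reward 1.0 is paid whenever the agent lands in the rightmost state.
N_STATES = 5
ACTIONS = (0, 1)

def chain_step(s: int, a: int, rng: random.Random) -> Tuple[int, float]:
    # Deterministic here; rng is accepted so a stochastic model could be swapped in.
    ns = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return ns, (1.0 if ns == N_STATES - 1 else 0.0)

def rollout_return(step, policy: Callable[[int], int], state: int, first_action: int,
                   horizon: int, gamma: float, rng: random.Random) -> float:
    """Discounted return of taking first_action in state, then following policy."""
    total, discount, s, a = 0.0, 1.0, state, first_action
    for _ in range(horizon):
        s, r = step(s, a, rng)
        total += discount * r
        discount *= gamma
        a = policy(s)
    return total

def label_states(step, policy, states: Iterable[int], n_rollouts: int,
                 horizon: int, gamma: float, rng: random.Random) -> Dict[int, int]:
    """Label each state with the action whose rollout-averaged return is highest."""
    labels = {}
    for s in states:
        q = {a: sum(rollout_return(step, policy, s, a, horizon, gamma, rng)
                    for _ in range(n_rollouts)) / n_rollouts
             for a in ACTIONS}
        labels[s] = max(q, key=q.get)  # ties go to the first action in ACTIONS
    return labels

def rollout_policy_iteration(n_iters: int = 4, n_rollouts: int = 3, horizon: int = 10,
                             gamma: float = 0.9, seed: int = 0) -> Dict[int, int]:
    rng = random.Random(seed)
    table = {s: 0 for s in range(N_STATES)}  # initial policy: always move left
    for _ in range(n_iters):
        current = dict(table)
        # A lookup table stands in for the trained classifier in this sketch.
        table = label_states(chain_step, lambda s: current[s], range(N_STATES),
                             n_rollouts, horizon, gamma, rng)
    return table

print(rollout_policy_iteration())  # -> {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}
```

Starting from the always-left policy, each iteration flips one more state (working back from the rewarding end) to "move right", so four iterations reach the all-right policy on this chain.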

Control of a re-entrant line manufacturing model with a reinforcement learning approach
This paper presents the application of a reinforcement learning (RL) approach for the near-optimal control of a re-entrant line manufacturing (RLM) model. The RL approach utilizes...
José A. Ramírez-Hernández, Em...
ICMLA 2007

Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence
We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. Optimism is usually implemented by carryin...
Sarah Filippi, Olivier Cappé, Aurelien Gari...
CORR 2010, Springer
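
Optimistic model-based strategies choose, among transition models consistent with the data, one under which the current value estimates look best. A rough sketch of how a Kullback-Leibler confidence set could enter that choice, assuming a brute-force search over a hypothetical finite list of candidate models and a hand-picked radius; the paper's actual confidence region and optimization procedure are not visible in this snippet and are not reproduced here.

```python
import math
from typing import Sequence

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                return math.inf  # q rules out an outcome that p allows
            total += pi * math.log(pi / qi)
    return total

def optimistic_expectation(p_hat, candidates, values, radius):
    """Largest expected next-state value over candidates within the KL ball.

    The empirical model p_hat is always feasible (KL = 0), so the max is defined.
    """
    feasible = [list(p_hat)] + [q for q in candidates
                                if kl_divergence(p_hat, q) <= radius]
    return max(sum(qi * v for qi, v in zip(q, values)) for q in feasible)

if __name__ == "__main__":
    p_hat = [0.5, 0.5, 0.0]          # empirical next-state frequencies
    values = [0.0, 0.0, 10.0]        # hypothetical current value estimates
    candidates = [[0.4, 0.4, 0.2],   # KL ~ 0.223, inside a 0.3 ball
                  [0.25, 0.25, 0.5]] # KL ~ 0.693, outside
    print(optimistic_expectation(p_hat, candidates, values, radius=0.3))
```

Terms with p̂_i = 0 contribute nothing to KL(p̂ || q), so a candidate can shift some mass onto an unobserved, high-value state and still fall inside the ball, as [0.4, 0.4, 0.2] does here.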

Reinforcement Learning in POMDP's via Direct Gradient Ascent
This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled POMDPs. We introduce GPOMDP, a...
Jonathan Baxter, Peter L. Bartlett
ICML 2000, IEEE
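
The abstract above concerns estimating the gradient of policy performance and ascending it directly. A minimal sketch of a REINFORCE-style estimator with a discounted eligibility trace, assuming a hypothetical single-observation environment and a softmax policy; the function names, the trace-decay parameter `beta`, and the toy rewards are illustrative and not taken from the paper.

```python
import math
import random
from typing import List

def softmax(theta: List[float]) -> List[float]:
    m = max(theta)
    exps = [math.exp(x - m) for x in theta]
    z = sum(exps)
    return [e / z for e in exps]

def toy_reward(action: int) -> float:
    # Hypothetical single-observation environment: action 1 pays 1, action 0 pays 0.
    return 1.0 if action == 1 else 0.0

def estimate_gradient(theta: List[float], beta: float, n_steps: int,
                      rng: random.Random) -> List[float]:
    """Eligibility-trace estimate of d(average reward)/d(theta)."""
    k = len(theta)
    trace = [0.0] * k
    grad = [0.0] * k
    for t in range(n_steps):
        probs = softmax(theta)
        a = rng.choices(range(k), weights=probs)[0]
        # Score of a softmax policy: d log pi(a) / d theta_j = 1[j == a] - pi(j).
        score = [(1.0 if j == a else 0.0) - probs[j] for j in range(k)]
        trace = [beta * z + s for z, s in zip(trace, score)]
        r = toy_reward(a)
        # Running average of reward times the current trace.
        grad = [g + (r * z - g) / (t + 1) for g, z in zip(grad, trace)]
    return grad

def gradient_ascent(n_iters: int, step_size: float, beta: float, n_steps: int,
                    seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(n_iters):
        grad = estimate_gradient(theta, beta, n_steps, rng)
        theta = [w + step_size * g for w, g in zip(theta, grad)]
    return theta

if __name__ == "__main__":
    est = estimate_gradient([0.0, 0.0], beta=0.5, n_steps=20000, rng=random.Random(1))
    print(est)  # should be near [-0.25, 0.25]
    theta = gradient_ascent(n_iters=20, step_size=2.0, beta=0.5, n_steps=1000)
    print(softmax(theta))  # probability of action 1 should approach 1
```

In this toy, rewards do not depend on earlier actions, so the older trace terms average out and the estimate at θ = [0, 0] should be close to the exact gradient [-0.25, 0.25]. In problems with delayed rewards, a `beta` below 1 generally trades some bias for lower variance.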

Off-Policy Temporal Difference Learning with Function Approximation
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
Doina Precup, Richard S. Sutton, Sanjoy Dasgupta
ICML 2001, IEEE
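
The abstract above concerns off-policy TD learning with linear function approximation. For orientation, a sketch of the plain per-decision importance-sampling correction to linear TD(0). This is a standard baseline, not the paper's algorithm; with general linear features it is not guaranteed to be stable, which is the problem the abstract says the paper addresses. The two-state problem, the one-hot features, and both policies below are hypothetical.

```python
import random
from typing import Dict, List

GAMMA = 0.5
# Hypothetical two-state problem: action a moves the agent to state a,
# and action 1 pays reward 1.0 while action 0 pays 0.0.
FEATURES: Dict[int, List[float]] = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # one-hot

def target_prob(a: int, s: int) -> float:
    return 1.0 if a == 1 else 0.0  # target policy: always take action 1

def behavior_prob(a: int, s: int) -> float:
    return 0.5  # behavior policy: uniform over {0, 1}

def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def off_policy_td0(n_steps: int = 20000, alpha: float = 0.05,
                   seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    w = [0.0, 0.0]
    s = 0
    for _ in range(n_steps):
        a = rng.randrange(2)  # sample an action from the behavior policy
        s_next, r = a, (1.0 if a == 1 else 0.0)
        rho = target_prob(a, s) / behavior_prob(a, s)  # importance ratio
        delta = r + GAMMA * dot(w, FEATURES[s_next]) - dot(w, FEATURES[s])
        w = [wi + alpha * rho * delta * xi for wi, xi in zip(w, FEATURES[s])]
        s = s_next
    return w

print(off_policy_td0())  # both weights should be very close to 2.0
```

With one-hot features this reduces to the tabular case, and both weights approach v_π = 1/(1 − γ) = 2 for the always-take-action-1 target policy, even though every sample comes from the uniform behavior policy.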