Search Sciweavers | Sciweavers

162 search results - page 1 / 33

» Off-Policy Temporal Difference Learning with Function Approx...

140

click to vote

ICML
2001
IEEE

185views Machine Learning» more ICML 2001»

Off-Policy Temporal Difference Learning with Function Approximation

16 years 4 months ago

Download www.cs.ualberta.ca

We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...

Doina Precup, Richard S. Sutton, Sanjoy Dasgupta

claim paper

Read More »

157

Voted

ICML
2000
IEEE

153views Machine Learning» more ICML 2000»

Eligibility Traces for Off-Policy Policy Evaluation

16 years 4 months ago

Download www.cs.ualberta.ca

Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden states, and to provide a link between Monte Carlo and temporal-difference meth...

Doina Precup, Richard S. Sutton, Satinder P. Singh

claim paper

Read More »

135

click to vote

ICML
2010
IEEE

231views Machine Learning» more ICML 2010»

Toward Off-Policy Learning Control with Function Approximation

15 years 4 months ago

Download www.sztaki.hu

We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...

Hamid Reza Maei, Csaba Szepesvári, Shalabh ...

claim paper

Read More »

147

click to vote

NIPS
2001

206views Information Technology» more NIPS 2001»

Model-Free Least-Squares Policy Iteration

15 years 4 months ago

Download www.cs.duke.edu

We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off policy. ...

Michail G. Lagoudakis, Ronald Parr

claim paper

Read More »

131

click to vote

ML
2002
ACM

154views Machine Learning» more ML 2002»

Technical Update: Least-Squares Temporal Difference Learning

15 years 3 months ago

Download www.research.rutgers.edu

TD() is a popular family of algorithms for approximate policy evaluation in large MDPs. TD() works by incrementally updating the value function after each observed transition. It h...

Justin A. Boyan

claim paper

Read More »

« Prev « First page 1 / 33 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Sciweavers