Search Sciweavers | Sciweavers

162 search results - page 2 / 33

» Off-Policy Temporal Difference Learning with Function Approx...

click to vote

ML
2002
ACM

168views Machine Learning» more ML 2002»

On Average Versus Discounted Reward Temporal-Difference Learning

13 years 4 months ago

Download web.mit.edu

We provide an analytical comparison between discounted and average reward temporal-difference (TD) learning with linearly parameterized approximations. We first consider the asympt...

John N. Tsitsiklis, Benjamin Van Roy

claim paper

Read More »

click to vote

ICML
2008
IEEE

165views Machine Learning» more ICML 2008»

A worst-case comparison between temporal difference and residual gradient with linear function approximation

14 years 5 months ago

Download www.research.rutgers.edu

Residual gradient (RG) was proposed as an alternative to TD(0) for policy evaluation when function approximation is used, but there exists little formal analysis comparing them ex...

Lihong Li

claim paper

Read More »

click to vote

CORR
2010
Springer

204views Education» more CORR 2010»

Predictive State Temporal Difference Learning

13 years 3 months ago

Download www.cs.cmu.edu

We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identiﬁcation. In practical applications...

Byron Boots, Geoffrey J. Gordon

claim paper

Read More »

click to vote

ICML
2003
IEEE

150views Machine Learning» more ICML 2003»

The Significance of Temporal-Difference Learning in Self-Play Training TD-Rummy versus EVO-rummy

13 years 10 months ago

Download www.hpl.hp.com

Reinforcement learning has been used for training game playing agents. The value function for a complex game must be approximated with a continuous function because the number of ...

Clifford Kotnik, Jugal K. Kalita

claim paper

Read More »

click to vote

ICML
1999
IEEE

168views Machine Learning» more ICML 1999»

Least-Squares Temporal Difference Learning

14 years 5 months ago

Download www.research.rutgers.edu

Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-...

Justin A. Boyan

claim paper

Read More »

« Prev « First page 2 / 33 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Document Tools

Image Tools

Sciweavers