Search Sciweavers | Sciweavers

162 search results - page 6 / 33

» Off-Policy Temporal Difference Learning with Function Approx...

Voted

ICML
2010
IEEE

219views Machine Learning» more ICML 2010»

Convergence of Least Squares Temporal Difference Methods Under General Conditions

15 years 20 days ago

Download www.cs.helsinki.fi

We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) in the off-policy learning context and with the simulation-based least square...

Huizhen Yu

claim paper

Read More »

click to vote

NIPS
1993

123views Information Technology» more NIPS 1993»

Temporal Difference Learning of Position Evaluation in the Game of Go

15 years 27 days ago

Download www.gatsby.ucl.ac.uk

The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation e...

Nicol N. Schraudolph, Peter Dayan, Terrence J. Sej...

claim paper

Read More »

143

click to vote

EWRL
2008

191views Machine Learning» more EWRL 2008»

Bayesian Reward Filtering

15 years 1 months ago

Download www.metz.supelec.fr

A wide variety of function approximation schemes have been applied to reinforcement learning. However, Bayesian filtering approaches, which have been shown efficient in other field...

Matthieu Geist, Olivier Pietquin, Gabriel Fricout

claim paper

Read More »

click to vote

NIPS
2007

164views Information Technology» more NIPS 2007»

Incremental Natural Actor-Critic Algorithms

15 years 1 months ago

Download books.nips.cc

We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning m...

Shalabh Bhatnagar, Richard S. Sutton, Mohammad Gha...

claim paper

Read More »

click to vote

COLT
2000
Springer

121views Machine Learning» more COLT 2000»

Bias-Variance Error Bounds for Temporal Difference Updates

15 years 4 months ago

Download www.cis.upenn.edu

We give the ﬁrst rigorous upper bounds on the error of temporal difference (td) algorithms for policy evaluation as a function of the amount of experience. These upper bounds pr...

Michael J. Kearns, Satinder P. Singh

claim paper

Read More »

« Prev « First page 6 / 33 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Document Tools

Image Tools

Sciweavers