
ICML
2010
IEEE


Finite-Sample Analysis of LSTD



Download hal.inria.fr

In this paper we consider the problem of policy evaluation in reinforcement learning, i.e., learning the value function of a fixed policy, using the least-squares temporal-difference (LSTD) learning algorithm. We report a finite-sample analysis of LSTD. We first derive a bound on the performance of the LSTD solution evaluated at the states generated by the Markov chain and used by the algorithm to learn an estimate of the value function. This result is general in the sense that no assumption is made on the existence of a stationary distribution for the Markov chain. We then derive generalization bounds in the case when the Markov chain possesses a stationary distribution and is β-mixing.
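For readers unfamiliar with the algorithm the abstract analyzes, the following is a minimal sketch of the standard LSTD estimator with linear features, written in plain Python. It is not taken from the paper: the function names (`solve_linear`, `lstd`, `cycle_trajectory`, `one_hot`), the three-state cyclic chain, the reward placement, and the discount factor 0.9 are all illustrative assumptions. The sketch accumulates A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s) r along a single trajectory and solves Aθ = b.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("LSTD matrix is singular for these samples")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lstd(transitions, phi, d, gamma):
    """Return the weights theta solving A theta = b, where
    A = sum phi(s) (phi(s) - gamma * phi(s_next))^T and b = sum phi(s) * r."""
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        for i in range(d):
            b[i] += f[i] * r
            for j in range(d):
                A[i][j] += f[i] * (f[j] - gamma * f_next[j])
    return solve_linear(A, b)


def cycle_trajectory(n_steps):
    """Hypothetical chain 0 -> 1 -> 2 -> 0 with reward 1 when leaving state 2."""
    s, samples = 0, []
    for _ in range(n_steps):
        s_next = (s + 1) % 3
        reward = 1.0 if s == 2 else 0.0
        samples.append((s, reward, s_next))
        s = s_next
    return samples


def one_hot(s):
    """Tabular (one-hot) features, an assumption chosen to keep the example exact."""
    return [1.0 if i == s else 0.0 for i in range(3)]


gamma = 0.9
theta = lstd(cycle_trajectory(30), one_hot, 3, gamma)
print(theta)  # estimated values of states 0, 1, 2
```

With one-hot features and deterministic transitions, the estimate coincides with the true value function once every state has been visited; for this chain the value of state 2 is 1 / (1 − γ³). With fewer features than states or with noisy transitions, the estimate is only approximate, which is the regime the finite-sample bounds in the abstract address.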

Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos


ICML 2010 | Machine Learning | Markov Chain | Stationary Distribution | Value Function



Post Info

Added	09 Nov 2010
Updated	09 Nov 2010
Type	Conference
Year	2010
Where	ICML
Authors	Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos
