Sciweavers

» Rollout Sampling Approximate Policy Iteration
JMLR
2006
Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation
We study a sequential variance reduction technique for Monte Carlo estimation of functionals in Markov Chains. The method is based on designing sequential control variates using s...
Rémi Munos
CORR
2010
Springer
Global Optimization for Value Function Approximation
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bili...
Marek Petrik, Shlomo Zilberstein
CORR
2006
Springer
A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD
This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(λ), LSTD(λ)...
Manuel Loth, Philippe Preux