Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda

Temporal difference (TD) algorithms are attractive for reinforcement learning due to their ease of implementation and use of "bootstrapped" return estimates to make efficient use of sampled data. In particular, TD(λ) methods comprise a family of reinforcement learning algorithms that often yield fast convergence by averaging multiple estimators of the expected return. However, TD(λ) chooses a very specific way of averaging these estimators based on the fixed parameter λ, which may not lead to optimal convergence rates in all settings. In this paper, we derive an automated Bayesian approach to setting λ that we call temporal difference Bayesian model averaging (TD-BMA). Empirically, TD-BMA always performs as well as, and often much better than, the best fixed λ for TD(λ) (even when performance for different values of λ varies across problems) without requiring that λ or any analogous parameter be manually tuned.
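For context, the fixed-λ baseline the abstract contrasts against is standard TD(λ) with eligibility traces, where λ controls how the n-step return estimators are averaged. The sketch below shows that baseline only, not the paper's TD-BMA algorithm; the `env_step` hook, the start state, and all parameter values are hypothetical placeholders for illustration.

```python
import numpy as np

def td_lambda(env_step, n_states, episodes, alpha=0.1, gamma=0.99, lam=0.9):
    """Tabular TD(lambda) with accumulating eligibility traces.

    `env_step(state)` is a hypothetical environment hook returning
    (next_state, reward, done); it is not part of the paper.
    """
    V = np.zeros(n_states)
    for _ in range(episodes):
        e = np.zeros(n_states)        # eligibility traces
        s, done = 0, False            # assume state 0 is the start state
        while not done:
            s_next, r, done = env_step(s)
            # one-step TD error (bootstrapped return minus current estimate)
            delta = r + gamma * V[s_next] * (not done) - V[s]
            e[s] += 1.0               # accumulate trace for the current state
            V += alpha * delta * e    # credit all recently visited states
            e *= gamma * lam          # decay traces by gamma * lambda
            s = s_next
    return V
```

The fixed `lam` in the trace decay is exactly the parameter the paper targets: TD-BMA instead weights the return estimators by a Bayesian model-averaging posterior, so no λ needs to be hand-tuned.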
Carlton Downey, Scott Sanner
Type Conference
Year 2010
Where ICML
Authors Carlton Downey, Scott Sanner