Near-Bayesian exploration in polynomial time

12 years 6 months ago
Near-Bayesian exploration in polynomial time
We consider the exploration/exploitation problem in reinforcement learning (RL). The Bayesian approach to model-based RL offers an elegant solution to this problem, by considering a distribution over possible models and acting to maximize expected reward; unfortunately, the Bayesian solution is intractable for all but very restricted cases. In this paper we present a simple algorithm, and prove that with high probability it is able to perform -close to the true (intractable) optimal Bayesian policy after some small (polynomial in quantities describing the system) number of time steps. The algorithm and analysis are motivated by the so-called PACMDP approach, and extend such results into the setting of Bayesian RL. In this setting, we show that we can achieve lower sample complexity bounds than existing algorithms, while using an exploration strategy that is much greedier than the (extremely cautious) exploration of PAC-MDP algorithms.
J. Zico Kolter, Andrew Y. Ng
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2009
Where ICML
Authors J. Zico Kolter, Andrew Y. Ng
Comments (0)