Bandit Based Monte-Carlo Planning

Abstract. For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. For finite-horizon or discounted MDPs the algorithm is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains UCT is significantly more efficient than its alternatives.
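The core idea the abstract describes is treating the action choice at each node of a Monte-Carlo search tree as a multi-armed bandit problem scored with UCB1. Below is a minimal illustrative sketch of that selection and update rule in Python; the Node class, the exploration constant c, and the toy reward loop are assumptions made for exposition here, not the authors' reference implementation.

```python
import math
import random

class Node:
    """One search-tree node whose action choice is treated as a bandit."""

    def __init__(self, actions):
        self.visits = 0                           # N(s): visits to this node
        self.counts = {a: 0 for a in actions}     # N(s,a): times action tried
        self.values = {a: 0.0 for a in actions}   # Q(s,a): mean sampled return

    def select_action(self, c=1.4):
        """UCB1: try each action once, then trade off mean value
        against an exploration bonus that shrinks with visit count."""
        for a, n in self.counts.items():
            if n == 0:
                return a  # untried actions have an effectively infinite score
        return max(
            self.counts,
            key=lambda a: self.values[a]
            + c * math.sqrt(math.log(self.visits) / self.counts[a]),
        )

    def update(self, action, reward):
        """Back up one sampled return, keeping an incremental mean."""
        self.visits += 1
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

# Toy usage: a two-armed bandit standing in for the rollout returns
# a node would receive during Monte-Carlo planning.
node = Node(actions=["left", "right"])
for _ in range(1000):
    a = node.select_action()
    reward = random.random() * (0.8 if a == "left" else 0.5)  # hypothetical rewards
    node.update(a, reward)
```

In full UCT this rule is applied recursively down the tree, with each sampled episode's return backed up through every node it visited; the sketch above shows only the single-node bandit step.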
Type Conference
Year 2006
Where ECML
Publisher Springer
Authors Levente Kocsis, Csaba Szepesvári