Bandit Algorithms for Tree Search

13 years 9 months ago


Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [6]. Their efficient exploration of the tree makes it possible to return a good value quickly and to improve precision if more time is provided. The UCT algorithm [8], a tree search method based on Upper Confidence Bounds (UCB) [2], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is “over-optimistic” in some sense, leading to a worst-case regret that may be very poor. We propose alternative bandit algorithms for tree search. First, a modification of UCT using a confidence sequence that scales exponentially in the horizon depth is analyzed. We then consider Flat-UCB performed on the leaves and provide a finite regret bound with high probability. Then, we introduce and analyze a Bandit Algorithm for Smooth Trees (BAST) which takes into account the actual smoothness of the rewards for performing efficient “cuts” of sub-optima...
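As a rough illustration of the Upper Confidence Bound idea that UCT applies at each node, the sketch below runs the standard UCB1 rule on a flat set of arms with Bernoulli rewards. It is not the algorithm analyzed in the paper: the exploration term sqrt(2 ln t / n), the Bernoulli reward model, and the names ucb1_index and run_ucb1 are assumptions chosen for illustration, and the abstract's modified confidence sequences, Flat-UCB and BAST are not reproduced here.

```python
import math
import random


def ucb1_index(mean, pulls, total_pulls):
    """Optimistic score for one arm: empirical mean plus an exploration bonus.

    Assumes the classic UCB1 bonus sqrt(2 ln t / n); an unpulled arm gets an
    infinite score so that every arm is tried at least once.
    """
    if pulls == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def run_ucb1(arm_means, rounds, seed=0):
    """Play `rounds` pulls on Bernoulli arms (hypothetical reward model).

    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative reward per arm

    for t in range(1, rounds + 1):
        scores = [
            ucb1_index(sums[i] / counts[i] if counts[i] else 0.0, counts[i], t)
            for i in range(k)
        ]
        # Pull the arm with the highest optimistic score.
        arm = max(range(k), key=scores.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts


if __name__ == "__main__":
    print(run_ucb1([0.2, 0.8], rounds=2000))
```

In a tree search setting, UCT applies a rule of this kind recursively, treating the children of each node as arms. The abstract's point is that this optimism can be misplaced in the worst case, which motivates the alternative confidence sequences it proposes.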

Pierre-Arnaud Coquelin, Rémi Munos
