Sciweavers

74 search results - page 6 / 15
» Regret Bounds for Gaussian Process Bandit Problems
CORR
2004
Springer
14 years 11 months ago
Online convex optimization in the bandit setting: gradient descent without a gradient
We study a general online convex optimization problem. We have a convex set S and an unknown sequence of cost functions c1, c2, . . . , and in each period, we choose a feasible po...
Abraham Flaxman, Adam Tauman Kalai, H. Brendan McM...
CORR
2006
Springer
14 years 11 months ago
How to Beat the Adaptive Multi-Armed Bandit
The multi-armed bandit is a concise model for the problem of iterated decision-making under uncertainty. In each round, a gambler must pull one of K arms of a slot machine, withou...
Varsha Dani, Thomas P. Hayes
CORR
2007
Springer
14 years 11 months ago
Bandit Algorithms for Tree Search
Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [6]. Their efficient exploration of the tree enables to ret...
Pierre-Arnaud Coquelin, Rémi Munos
SIGMOD
2012
ACM
13 years 2 months ago
Interactive regret minimization
We study the notion of regret ratio proposed in [19] to deal with multi-criteria decision making in database systems. The regret minimization query proposed in [19] was shown to h...
Danupon Nanongkai, Ashwin Lall, Atish Das Sarma, K...
TSP
2010
14 years 6 months ago
Distributed learning in multi-armed bandit with multiple players
We formulate and study a decentralized multi-armed bandit (MAB) problem. There are distributed players competing for independent arms. Each arm, when played, offers i.i.d. reward a...
Keqin Liu, Qing Zhao