ALT
2011
Springer

Deviations of Stochastic Bandit Regret

This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009) exhibit a policy such that with probability at least 1 − 1/n, the regret of the policy is of order log n. They have also shown that such a property is not shared by the popular UCB1 policy of Auer et al. (2002). This work first answers an open question: it extends this negative result to any anytime policy. The second contribution of this paper is to design anytime robust policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms.
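
As a rough illustration of what "deviations of the regret" refers to, the sketch below simulates a common textbook form of the UCB1 index policy of Auer et al. (2002) and collects the pseudo-regret over independent runs, so that its spread (not only its mean) can be inspected. The Bernoulli arms, the arm means, and the function names `ucb1_pseudo_regret` and `regret_samples` are assumptions made for illustration; this is not one of the robust policies designed in the paper.

```python
import math
import random


def ucb1_pseudo_regret(means, n, seed=0):
    """Play UCB1 for n rounds on Bernoulli arms (assumed setup).

    Returns the pseudo-regret: the sum over rounds of
    (best mean - mean of the arm that was played).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls of each arm
    sums = [0.0] * k   # total reward collected from each arm
    best = max(means)
    regret = 0.0
    for t in range(1, n + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            # Index = empirical mean + exploration bonus, where t - 1 is
            # the number of plays made so far.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t - 1) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def regret_samples(means, n, runs):
    """Sorted pseudo-regrets over `runs` independent simulations."""
    return sorted(ucb1_pseudo_regret(means, n, seed=s) for s in range(runs))


if __name__ == "__main__":
    samples = regret_samples([0.5, 0.6], n=2000, runs=200)
    mean = sum(samples) / len(samples)
    # Compare the average with the worst observed run to see the upper tail.
    print("mean:", mean, "max:", samples[-1])
```

Comparing the mean with the largest values in `samples` gives an informal view of the upper tail of the regret, which is the quantity the abstract's high-probability statements concern.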

Antoine Salomon, Jean-Yves Audibert

ALT 2011 | Bandit Problems | Machine Learning | Open Question

Related Content

» Pure Exploration in Multi-Armed Bandits Problems

» Tuning Bandit Algorithms in Stochastic Environments

» The Best of Both Worlds: Stochastic and Adversarial Bandits

» Regret Bounds and Minimax Policies under Partial Monitoring

» Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Pro...

» Improved Rates for the Stochastic Continuum-Armed Bandit Problem

» Regret Bounds for Sleeping Experts and Bandits

» An Asymptotically Optimal Bandit Algorithm for Bounded Support Models

» Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear R...

Post Info

Added	12 Dec 2011
Updated	12 Dec 2011
Type	Journal
Year	2011
Where	ALT
Authors	Antoine Salomon, Jean-Yves Audibert