Sciweavers

16 search results - page 1 / 4
ALT
2011
Springer
12 years 4 months ago
Deviations of Stochastic Bandit Regret
This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009...
Antoine Salomon, Jean-Yves Audibert
ALT
2009
Springer
14 years 1 month ago
Pure Exploration in Multi-armed Bandits Problems
We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that explore sequentially the arms. The stra...
Sébastien Bubeck, Rémi Munos, Gilles...
ALT
2007
Springer
14 years 1 month ago
Tuning Bandit Algorithms in Stochastic Environments
Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this p...
Jean-Yves Audibert, Rémi Munos, Csaba Szepe...
CORR
2012
Springer
12 years 1 day ago
The best of both worlds: stochastic and adversarial bandits
We present a bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is, essentially, optimal both for adversarial rewards and for stochastic rewards. Specifical...
Sébastien Bubeck, Aleksandrs Slivkins
JMLR
2010
12 years 11 months ago
Regret Bounds and Minimax Policies under Partial Monitoring
This work deals with four classical prediction settings, namely full information, bandit, label efficient and bandit label efficient as well as four different notions of regret: p...
Jean-Yves Audibert, Sébastien Bubeck