Sciweavers

CORR
2012
Springer
The best of both worlds: stochastic and adversarial bandits
We present a bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is, essentially, optimal both for adversarial rewards and for stochastic rewards. Specifical...
Sébastien Bubeck, Aleksandrs Slivkins