JMLR
2010

Regret Bounds and Minimax Policies under Partial Monitoring

This work deals with four classical prediction settings, namely full information, bandit, label efficient, and bandit label efficient, as well as four different notions of regret: pseudo-regret, expected regret, high probability regret, and tracking the best expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster), based on an arbitrary function $\psi$, for which we propose a unified analysis of its pseudo-regret in the four games we consider. In particular, for $\psi(x) = \exp(\eta x) + \gamma/K$, INF reduces to the classical exponentially weighted average forecaster, and our analysis of the pseudo-regret recovers known results, while for the expected regret we slightly tighten the bounds. On the other hand, with $\psi(x) = (\eta/(-x))^q + \gamma/K$, which defines a new forecaster, we are able to remove the extraneous logarithmic factor in the pseudo-regret bounds for bandit games, and thus fill a long-open gap in the characterization of the minimax rate for the pseudo-regret in the bandit game. We also p...
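The "implicit normalization" behind INF can be illustrated with a minimal Python sketch. It assumes the rule, as we understand it from the paper rather than from the abstract itself, that each arm i with estimated cumulative gain V_i is played with probability p_i = ψ(V_i − C), where the constant C is chosen so that the probabilities sum to 1. The function names, the bisection solver, and the parameter values (eta, gamma, q) are illustrative assumptions, not the authors' implementation.

```python
import math

def psi_exp(eta, gamma, K):
    # psi(x) = exp(eta * x) + gamma / K  (exponential-weights case)
    return lambda x: math.exp(eta * x) + gamma / K

def psi_poly(eta, gamma, K, q):
    # psi(x) = (eta / (-x))^q + gamma / K, only defined for x < 0
    return lambda x: (eta / (-x)) ** q + gamma / K

def inf_probabilities(gains, psi, iters=200):
    """Return p_i = psi(V_i - C), with C found by bisection so that sum(p) is 1.

    Assumes psi is increasing on the negative reals and gamma < 1, so the
    sum is decreasing in C and a root exists above max(gains).
    """
    def total(c):
        return sum(psi(v - c) for v in gains)

    lo = max(gains)              # keep every argument V_i - C negative
    step = 1.0
    while total(lo + step) > 1.0:
        step *= 2.0              # grow the bracket until the sum drops below 1
    hi = lo + step
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return [psi(v - hi) for v in gains]

if __name__ == "__main__":
    gains = [1.0, 2.0, 0.5]      # hypothetical estimated cumulative gains
    print(inf_probabilities(gains, psi_exp(0.5, 0.1, 3)))
    print(inf_probabilities(gains, psi_poly(1.0, 0.1, 3, 2.0)))
```

Under these assumptions, the exponential choice of ψ gives the same probabilities as exponential weights mixed with a uniform distribution, p_i = (1 − γ) exp(η V_i) / Σ_j exp(η V_j) + γ/K, which is consistent with the abstract's remark that INF reduces to the exponentially weighted average forecaster.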
Jean-Yves Audibert, Sébastien Bubeck
Added 19 May 2011
Updated 19 May 2011
Type Journal
Year 2010
Where JMLR
Authors Jean-Yves Audibert, Sébastien Bubeck