Regret Bounds and Minimax Policies under Partial Monitoring

This work deals with four classical prediction settings, namely full information, bandit, label efficient, and bandit label efficient, as well as four different notions of regret: pseudo-regret, expected regret, high-probability regret, and tracking-the-best-expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster), based on an arbitrary function $\psi$, for which we propose a unified analysis of its pseudo-regret in the four games we consider. In particular, for $\psi(x) = \exp(\eta x) + \gamma/K$, INF reduces to the classical exponentially weighted average forecaster, and our analysis of the pseudo-regret recovers known results, while for the expected regret we slightly tighten the bounds. On the other hand, with $\psi(x) = (\eta/(-x))^q + \gamma/K$, which defines a new forecaster, we are able to remove the extraneous logarithmic factor in the pseudo-regret bounds for bandit games, and thus fill in a long-open gap in the characterization of the minimax rate for the pseudo-regret in the bandit game. We also p...
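The "implicit normalization" in INF can be illustrated with a small sketch. It assumes that the forecaster plays arm i with probability p_i = psi(V_i - C), where V_i is the cumulative estimated gain of arm i and C is the constant that makes the probabilities sum to one. The function names, the parameters eta, gamma and q, the example gains, and the use of bisection to find C are all illustrative assumptions, not details given in the abstract.

```python
import math

def inf_weights(gains, psi, lo, hi, iters=200):
    """Return p_i = psi(gains[i] - c), where c is found by bisection so that sum(p) == 1.

    Assumes psi is increasing (so the sum decreases in c) and that the sum is
    above 1 at c = lo and below 1 at c = hi.
    """
    def total(c):
        return sum(psi(g - c) for g in gains)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if total(mid) > 1:
            lo = mid   # weights too large: increase c
        else:
            hi = mid   # weights too small: decrease c
    c = (lo + hi) / 2
    return [psi(g - c) for g in gains]

def psi_exp(eta, gamma, K):
    """Exponential choice: exp(eta * x) + gamma / K."""
    return lambda x: math.exp(eta * x) + gamma / K

def psi_poly(eta, gamma, K, q):
    """Polynomial choice: (eta / (-x))**q + gamma / K, defined for x < 0."""
    return lambda x: (eta / (-x)) ** q + gamma / K

# Hypothetical cumulative gain estimates for K = 3 arms.
gains = [1.0, 2.0, 0.5]
K = len(gains)

p_exp = inf_weights(gains, psi_exp(0.5, 0.1, K), lo=-50.0, hi=50.0)

# The polynomial psi needs c > max(gains) so that every argument is negative.
top = max(gains)
p_poly = inf_weights(gains, psi_poly(1.0, 0.1, K, 2.0), lo=top + 1e-9, hi=top + 1e6)
```

With the exponential choice, the normalization has a closed form and the weights coincide with exponentially weighted averaging mixed with a uniform distribution of total mass gamma. With the polynomial choice, no closed form is available in general, which is why this sketch solves for C numerically.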

Jean-Yves Audibert, Sébastien Bubeck

Bandit | Bandit Game | Forecaster | JMLR 2010 |

Added	19 May 2011
Updated	19 May 2011
Type	Journal
Year	2010
Where	JMLR
Authors	Jean-Yves Audibert, Sébastien Bubeck
