Sciweavers

5 search results - page 1 / 1
» Regret Bounds for Sleeping Experts and Bandits
COLT
2008
Springer
13 years 6 months ago
Regret Bounds for Sleeping Experts and Bandits
We study on-line decision problems where the set of actions available to the decision algorithm varies over time. With a few notable exceptions, such problems remained larg...
Robert D. Kleinberg, Alexandru Niculescu-Mizil, Yo...
COLT
2005
Springer
13 years 6 months ago
From External to Internal Regret
External regret compares the performance of an online algorithm, selecting among N actions, to the performance of the best of those actions in hindsight. Internal regret compares ...
Avrim Blum, Yishay Mansour
CORR
2011
Springer
12 years 12 months ago
Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems
The analysis of online least squares estimation is at the heart of many stochastic sequential decision-making problems. We employ tools from self-normalized processes to provi...
Yasin Abbasi-Yadkori, Dávid Pál, Csa...
JMLR
2010
12 years 11 months ago
Regret Bounds and Minimax Policies under Partial Monitoring
This work deals with four classical prediction settings, namely full information, bandit, label efficient and bandit label efficient as well as four different notions of regret: p...
Jean-Yves Audibert, Sébastien Bubeck
LION
2010
Springer
13 years 8 months ago
Algorithm Selection as a Bandit Problem with Unbounded Losses
Algorithm selection is typically based on models of algorithm performance learned during a separate offline training sequence, which can be prohibitively expensive. In r...
Matteo Gagliolo, Jürgen Schmidhuber