Sciweavers

» Regret Bounds for Prediction Problems
JMLR
2012
11 years 7 months ago
Contextual Bandit Learning with Predictable Rewards
Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...
Alekh Agarwal, Miroslav Dudík, Satyen Kale,...
COCOON
2006
Springer
13 years 8 months ago
Approximating Min-Max (Regret) Versions of Some Polynomial Problems
Abstract. While the complexity of min-max and min-max regret versions of most classical combinatorial optimization problems has been thoroughly investigated, there are very few stu...
Hassene Aissi, Cristina Bazgan, Daniel Vanderpoote...
ALT
2007
Springer
14 years 1 month ago
Tuning Bandit Algorithms in Stochastic Environments
Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this p...
Jean-Yves Audibert, Rémi Munos, Csaba Szepe...
JMLR
2010
12 years 11 months ago
Regret Bounds for Gaussian Process Bandit Problems
Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. ...
Steffen Grünewälder, Jean-Yves Audibert,...
ECCC
2010
13 years 4 months ago
Regret Minimization for Online Buffering Problems Using the Weighted Majority Algorithm
Suppose a decision maker has to purchase a commodity over time with varying prices and demands. In particular, the price per unit might depend on the amount purchased and this pri...
Melanie Winkler, Berthold Vöcking, Sascha Geu...