Sciweavers

227 search results - page 2 / 46
» Linearly Parameterized Bandits
COLT
2008
Springer
13 years 7 months ago
High-Probability Regret Bounds for Bandit Online Linear Optimization
We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O ( ...
Peter L. Bartlett, Varsha Dani, Thomas P. Hayes, S...
CORR
2012
Springer
12 years 29 days ago
Towards minimax policies for online linear optimization with bandit feedback
We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of...
Sébastien Bubeck, Nicolò Cesa-Bianch...
CORR
2010
Springer
13 years 6 days ago
Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards
In the classic multi-armed bandits problem, the goal is to have a policy for dynamically operating arms that each yield stochastic rewards with unknown means. The key metric of int...
Yi Gai, Bhaskar Krishnamachari, Rahul Jain
COLT
2008
Springer
13 years 7 months ago
Stochastic Linear Optimization under Bandit Feedback
Varsha Dani, Thomas P. Hayes, Sham M. Kakade
ICML
2001
IEEE
14 years 6 months ago
Expectation Maximization for Weakly Labeled Data
We call data weakly labeled if it has no exact label but rather a numerical indication of correctness of the label "guessed" by the learning algorithm - a situation comm...
Yuri A. Ivanov, Bruce Blumberg, Alex Pentland