Sciweavers

73 search results for "Stochastic Linear Optimization under Bandit Feedback" - page 1 / 15
COLT
2008
Springer
13 years 6 months ago
Stochastic Linear Optimization under Bandit Feedback
Varsha Dani, Thomas P. Hayes, Sham M. Kakade
CORR
2012
Springer
12 years 20 days ago
Towards minimax policies for online linear optimization with bandit feedback
We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of...
Sébastien Bubeck, Nicolò Cesa-Bianch...
FOCS
2007
IEEE
13 years 11 months ago
Approximation Algorithms for Partial-Information Based Stochastic Control with Markovian Rewards
We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...
Sudipto Guha, Kamesh Munagala
JMLR
2010
12 years 11 months ago
Regret Bounds and Minimax Policies under Partial Monitoring
This work deals with four classical prediction settings, namely full information, bandit, label efficient and bandit label efficient as well as four different notions of regret: p...
Jean-Yves Audibert, Sébastien Bubeck
JMLR
2012
11 years 7 months ago
PAC-Bayes-Bernstein Inequality for Martingales and its Application to Multiarmed Bandits
We develop a new tool for data-dependent analysis of the exploration-exploitation trade-off in learning under limited feedback. Our tool is based on two main ingredients. The fi...
Yevgeny Seldin, Nicolò Cesa-Bianchi, Peter ...