Sciweavers

12 search results
Search query: Finite-time Analysis of the Multiarmed Bandit Problem
JMLR 2012
PAC-Bayes-Bernstein Inequality for Martingales and its Application to Multiarmed Bandits
We develop a new tool for data-dependent analysis of the exploration-exploitation trade-off in learning under limited feedback. Our tool is based on two main ingredients. The fi...
Yevgeny Seldin, Nicolò Cesa-Bianchi, Peter ...
ML 2002, ACM
Finite-time Analysis of the Multiarmed Bandit Problem
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while t...
Peter Auer, Nicolò Cesa-Bianchi, Paul Fisch...
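The UCB1 index policy analyzed in this paper plays each arm once and then always pulls the arm with the largest empirical mean plus a confidence bonus that shrinks as the arm is played more often. A minimal Python sketch of that policy; the Bernoulli reward simulation and the horizon are illustrative assumptions, not taken from the listing:

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """UCB1 index policy: empirical mean + sqrt(2 ln t / n_j).

    `pull(arm)` must return a reward in [0, 1]."""
    counts = [0] * n_arms            # times each arm was played
    means = [0.0] * n_arms           # empirical mean reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1              # play every arm once first
        else:
            # optimistic index: rarely played arms get a large bonus
            arm = max(range(n_arms),
                      key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
    return means, counts

# Illustrative use with Bernoulli arms (assumed, not from the paper).
probs = [0.3, 0.5, 0.7]
means, counts = ucb1(lambda a: 1.0 if random.random() < probs[a] else 0.0,
                     n_arms=len(probs), horizon=10000)
print(counts)   # the 0.7 arm should receive most of the pulls
```

The bonus term grows logarithmically in time but decays with the pull count, so under-explored arms are eventually retried, which is what the finite-time regret bounds in the paper control.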
CORR 2011, Springer
Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems
The analysis of online least squares estimation is at the heart of many stochastic sequential decision-making problems. We employ tools from the self-normalized processes to provi...
Yasin Abbasi-Yadkori, Dávid Pál, Csa...
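Self-normalized bounds of this kind are typically used to build confidence sets around a ridge-regression estimate in linear bandits. A rough LinUCB/OFUL-style sketch under that reading; the constant exploration width `beta` and the reward model are stand-in assumptions, not the paper's data-dependent bound:

```python
import numpy as np

def linucb(contexts, pull, horizon, lam=1.0, beta=2.0):
    """Optimistic linear bandit with a regularized least-squares estimate.

    `contexts` is a (K, d) array of arm feature vectors; `pull(a)` returns
    a scalar reward. `beta` stands in for a confidence width (assumption)."""
    d = contexts.shape[1]
    V = lam * np.eye(d)              # regularized design matrix
    b = np.zeros(d)                  # sum of reward-weighted contexts
    theta_hat = np.zeros(d)
    for _ in range(horizon):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b        # ridge least-squares estimate
        # optimistic index: predicted reward + beta * ||x||_{V^{-1}}
        ucb = contexts @ theta_hat + beta * np.sqrt(
            np.einsum('ij,jk,ik->i', contexts, V_inv, contexts))
        a = int(np.argmax(ucb))
        x, r = contexts[a], pull(a)
        V += np.outer(x, x)
        b += r * x
    return theta_hat

# Illustrative use: 5 arms, 3-dim features, hidden parameter (all assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
theta_true = np.array([1.0, -0.5, 0.2])
est = linucb(X, lambda a: float(X[a] @ theta_true + rng.normal(scale=0.1)),
             horizon=2000)
print(est)
```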
COLT 2010, Springer
Best Arm Identification in Multi-Armed Bandits
We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optim...
Jean-Yves Audibert, Sébastien Bubeck, Ré...
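The regret notion in this entry is the simple regret: the gap between the best arm's mean reward and the mean reward of the arm ultimately recommended after the exploration budget is spent. A minimal sketch of that evaluation using a uniform-allocation baseline; the baseline and the Bernoulli rewards are assumptions for illustration, not the paper's algorithm:

```python
import random

def uniform_best_arm(means, budget):
    """Pure-exploration baseline: sample arms round-robin, recommend the
    empirical best, and report the simple regret (gap to the best mean)."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(budget):
        arm = t % k                                   # uniform allocation
        sums[arm] += 1.0 if random.random() < means[arm] else 0.0
        counts[arm] += 1
    recommended = max(range(k), key=lambda j: sums[j] / counts[j])
    simple_regret = max(means) - means[recommended]   # gap of the chosen arm
    return recommended, simple_regret

print(uniform_best_arm([0.3, 0.5, 0.7], budget=3000))
```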
COLT 2010, Springer
An Asymptotically Optimal Bandit Algorithm for Bounded Support Models
The multiarmed bandit problem is a typical example of the dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playi...
Junya Honda, Akimichi Takemura