
JMLR, 2012
PAC-Bayes-Bernstein Inequality for Martingales and its Application to Multiarmed Bandits
We develop a new tool for data-dependent analysis of the exploration-exploitation trade-off in learning under limited feedback. Our tool is based on two main ingredients. The fi...
Yevgeny Seldin, Nicolò Cesa-Bianchi, Peter ...
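For background, a classical Bernstein-type inequality for martingales (Freedman's inequality) is sketched below. This is standard material, not the paper's PAC-Bayes-Bernstein result, which, roughly, extends such bounds to hold simultaneously over posterior distributions on a set of martingales.

\[
\Pr\bigl( S_n \ge \varepsilon \ \text{and}\ V_n \le v \bigr)
\;\le\; \exp\!\left( -\frac{\varepsilon^2}{2\,(v + b\varepsilon/3)} \right),
\]
where \(S_n = \sum_{t=1}^{n} X_t\) is a martingale whose differences satisfy \(X_t \le b\) almost surely and \(V_n = \sum_{t=1}^{n} \mathbb{E}\bigl[X_t^2 \mid \mathcal{F}_{t-1}\bigr]\) is its predictable quadratic variation.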
AMAI, 2011, Springer
Multi-armed bandits with episode context
A multi-armed bandit episode consists of n trials, each allowing selection of one of K arms, resulting in payoff from a distribution over [0, 1] associated with that arm. We assum...
Christopher D. Rosin
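The episode setup in the abstract (n trials, K arms, payoffs drawn from a distribution over [0, 1] per arm) can be made concrete with a minimal simulation. The sketch below uses Bernoulli payoffs and a generic UCB1 selection rule purely for illustration; it is not the episode-context algorithm proposed in the paper, and the arm means and seed are hypothetical.

import math
import random

def run_episode(n, means, seed=0):
    # Simulate one bandit episode: n trials over K arms, each arm paying out
    # from a fixed distribution over [0, 1] (Bernoulli here, for simplicity).
    # The UCB1 index below is a generic selection rule, used only to make the
    # episode concrete.
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K      # number of pulls per arm
    sums = [0.0] * K      # accumulated payoff per arm
    total_payoff = 0.0
    for t in range(1, n + 1):
        if t <= K:
            arm = t - 1   # pull each arm once to initialize
        else:
            arm = max(range(K),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        payoff = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += payoff
        total_payoff += payoff
    return total_payoff

print(run_episode(n=1000, means=[0.2, 0.5, 0.8]))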
CoRR, 2011, Springer
Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems
The analysis of online least squares estimation is at the heart of many stochastic sequential decision-making problems. We employ tools from the theory of self-normalized processes to provi...
Yasin Abbasi-Yadkori, Dávid Pál, Csa...
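As a minimal sketch of the online (regularized) least squares estimator that such analyses concern: maintain the regularized Gram matrix and the response-weighted feature sum, and re-solve after each observation. The ridge parameter lam and the toy data below are assumptions for illustration; the self-normalized confidence sets derived in the paper are not reproduced.

import numpy as np

def online_least_squares(xs, ys, lam=1.0):
    # Regularized online least squares: after each observation (x_t, y_t),
    # update V_t = lam * I + sum_s x_s x_s^T and b_t = sum_s y_s x_s, then
    # re-solve for the estimate theta_t = V_t^{-1} b_t.
    d = xs.shape[1]
    V = lam * np.eye(d)   # regularized Gram matrix
    b = np.zeros(d)       # response-weighted sum of features
    estimates = []
    for x, y in zip(xs, ys):
        V += np.outer(x, x)
        b += y * x
        estimates.append(np.linalg.solve(V, b))
    return estimates

# Toy usage with a hypothetical linear model y = <theta_star, x> + noise.
rng = np.random.default_rng(0)
theta_star = np.array([0.3, -0.7])
xs = rng.normal(size=(100, 2))
ys = xs @ theta_star + 0.1 * rng.normal(size=100)
print(online_least_squares(xs, ys)[-1])   # final estimate approaches theta_star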