Bandit algorithms are concerned with trading off exploration against exploitation when a number of options are available but we can only learn their quality by experimenting with them. ...
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is ...
Niranjan Srinivas, Andreas Krause, Sham Kakade, Ma...
We motivate and analyse a new Tree Search algorithm, based on recent advances in the use of Gaussian Processes for bandit problems. We assume that the function to maximise on the ...
The analysis of online least squares estimation is at the heart of many stochastic sequential decision-making problems. We employ tools from the self-normalized processes to provi...
We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that explore sequentially the arms. The stra...