Sciweavers

74 search results, page 2 of 15
Search query: Regret Bounds for Gaussian Process Bandit Problems
ICASSP 2011 (IEEE)
Logarithmic weak regret of non-Bayesian restless multi-armed bandit
Abstract—We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics. At each time, a player chooses K out of N (N > K) arms to play. The state of each ar...
Haoyang Liu, Keqin Liu, Qing Zhao
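The abstract above describes a player choosing K out of N arms at each round. As an illustrative sketch only (not the paper's RMAB policy; the function name, deterministic reward model, and UCB-style index rule are assumptions made here), a round-by-round top-K index policy might look like:

```python
import math

def top_k_ucb(pull, n_arms, k, horizon, c=2.0):
    """Illustrative K-of-N index policy: each round, play the k arms with
    the largest UCB-style indices and observe each played arm's reward.
    This sketches only the selection interface, not restless-state dynamics."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    history = []
    for t in range(1, horizon + 1):
        def index(i):
            if counts[i] == 0:
                return float("inf")  # force at least one pull of every arm
            return sums[i] / counts[i] + math.sqrt(c * math.log(t) / counts[i])
        chosen = sorted(range(n_arms), key=index, reverse=True)[:k]
        for arm in chosen:
            r = pull(arm)
            counts[arm] += 1
            sums[arm] += r
        history.append(tuple(sorted(chosen)))
    return history
```

With deterministic rewards the policy quickly settles on the k best arms while still paying a small exploration cost on the others.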
ICML 2007 (IEEE)
Multi-armed bandit problems with dependent arms
We provide a framework to exploit dependencies among arms in multi-armed bandit problems, when the dependencies are in the form of a generative model on clusters of arms. We find ...
Sandeep Pandey, Deepayan Chakrabarti, Deepak Agarw...
ALT 2007 (Springer)
Tuning Bandit Algorithms in Stochastic Environments
Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this p...
Jean-Yves Audibert, Rémi Munos, Csaba Szepe...
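The abstract above concerns upper-confidence-bound algorithms whose exploration term can be tuned. As a minimal sketch of the general idea (a UCB1-style rule with an exploration coefficient `c`; the function name and reward interface are assumptions, and this is not the specific algorithm analyzed in the paper):

```python
import math

def ucb1(pull, n_arms, horizon, c=2.0):
    """Minimal UCB1-style sketch: after one pull of each arm, play the arm
    maximizing empirical mean + sqrt(c * ln(t) / pulls). The coefficient c
    controls the exploration/exploitation balance the abstract refers to."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    history = []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(c * math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        history.append(arm)
    return history
```

Smaller `c` explores less aggressively; tuning it against the observed reward variance is the kind of refinement the paper studies.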
ALT 2008 (Springer)
Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
Abstract. We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the onl...
Ronald Ortner
COLT 2008 (Springer)
Regret Bounds for Sleeping Experts and Bandits
We study on-line decision problems where the set of actions that are available to the decision algorithm vary over time. With a few notable exceptions, such problems remained larg...
Robert D. Kleinberg, Alexandru Niculescu-Mizil, Yo...
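The sleeping-bandits abstract above considers action sets that vary over time. A hedged sketch of that setting (the function name, the availability callback, and the choice of a UCB-style rule restricted to currently available arms are all assumptions made here, not the paper's algorithm):

```python
import math

def sleeping_ucb(pull, n_arms, available, horizon, c=2.0):
    """Sketch of the 'sleeping' setting: at round t only available(t) arms
    may be played. Among them, pull any untried arm first, otherwise the
    arm with the largest UCB-style index."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    history = []
    for t in range(1, horizon + 1):
        avail = available(t)          # arms awake this round
        untried = [i for i in avail if counts[i] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(avail,
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(c * math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        history.append(arm)
    return history
```

When every arm is always available this reduces to an ordinary UCB rule; the interest in the sleeping setting is that the best available arm changes with the awake set.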