Regret Bounds for Gaussian Process Bandit Problems

Bandit algorithms are concerned with trading off exploration against exploitation when a number of options are available but their quality can only be learned by experimenting with them. We consider the scenario in which the reward distribution for arms is modelled by a Gaussian process and there is no noise in the observed reward. Our main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout, based on benign assumptions about the covariance function defining the Gaussian process. We further complement these upper bounds with corresponding lower bounds for particular covariance functions, demonstrating that in general there is at most a logarithmic looseness in our upper bounds.
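The setting in the abstract can be illustrated with a minimal sketch. It assumes a finite set of arms on [0, 1], a squared-exponential covariance, and an upper-confidence-style selection rule with a hypothetical weight `beta`. None of these choices is taken from the paper, and the selection rule is a stand-in, not the authors' algorithm. All function names are hypothetical, and a fixed smooth function stands in for a reward function drawn from the Gaussian process. Regret is measured against always playing the best arm.

```python
import math


def sq_exp_kernel(x, y, length_scale=0.1):
    """Squared-exponential covariance; an assumed choice for illustration."""
    return math.exp(-((x - y) ** 2) / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def gp_posterior(arms, observed, jitter=1e-9):
    """Posterior mean and variance at every arm given noise-free rewards.

    `observed` maps each distinct arm played so far to its exact reward.
    The small jitter only stabilises the linear solve.
    """
    if not observed:
        # Zero-mean, unit-variance prior before any arm has been played.
        return [0.0] * len(arms), [1.0] * len(arms)
    xs = list(observed)
    ys = [observed[x] for x in xs]
    K = [[sq_exp_kernel(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    means, variances = [], []
    for x in arms:
        k = [sq_exp_kernel(x, a) for a in xs]
        v = solve(K, k)
        means.append(sum(ki * ai for ki, ai in zip(k, alpha)))
        variances.append(max(0.0, 1.0 - sum(ki * vi for ki, vi in zip(k, v))))
    return means, variances


def run_bandit(reward, arms, rounds, beta=2.0):
    """Play `rounds` arms and return the cumulative regret after each round."""
    best = max(reward(x) for x in arms)  # reward of the best arm in hindsight
    observed, total, curve = {}, 0.0, []
    for _ in range(rounds):
        means, variances = gp_posterior(arms, observed)
        # Illustrative optimistic score: posterior mean plus scaled std. dev.
        scores = [m + beta * math.sqrt(v) for m, v in zip(means, variances)]
        x = arms[max(range(len(arms)), key=scores.__getitem__)]
        observed[x] = reward(x)  # noise-free observation
        total += best - reward(x)
        curve.append(total)
    return curve


if __name__ == "__main__":
    arms = [i / 10 for i in range(11)]
    # sin(3x) is a hypothetical stand-in for a sampled reward function.
    curve = run_bandit(lambda x: math.sin(3.0 * x), arms, rounds=15)
    print(curve[-1])
```

Because observations are noise-free, the posterior variance at any arm already played collapses to roughly zero, so the sketch stores one exact reward per distinct arm instead of repeated samples.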

Steffen Grünewälder, Jean-Yves Audibert, Manfred Opper, John Shawe-Taylor

Covariance Functions | Gaussian Process | JMLR 2010 | Upper Bounds

» Gaussian Process Bandits for Tree Search

» Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Pro...

» Pure Exploration in Multi-armed Bandits Problems

» Logarithmic weak regret of non-Bayesian restless multi-armed bandit

» Multi-armed bandit problems with dependent arms

» Tuning Bandit Algorithms in Stochastic Environments

» Online Regret Bounds for Markov Decision Processes with Deterministic Transitions

» Regret Bounds for Sleeping Experts and Bandits

Post Info

Added	19 May 2011
Updated	19 May 2011
Type	Journal
Year	2010
Where	JMLR
Authors	Steffen Grünewälder, Jean-Yves Audibert, Manfred Opper, John Shawe-Taylor
