Sciweavers

103 search results - page 1 / 21
» An Asymptotically Optimal Bandit Algorithm for Bounded Suppo...
Sort
View
COLT
2010
Springer
13 years 2 months ago
An Asymptotically Optimal Bandit Algorithm for Bounded Support Models
Multiarmed bandit problem is a typical example of a dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playi...
Junya Honda, Akimichi Takemura
ALT
2008
Springer
14 years 1 months ago
Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
Abstract. We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the onl...
Ronald Ortner
NIPS
2004
13 years 5 months ago
Nearly Tight Bounds for the Continuum-Armed Bandit Problem
In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. Wh...
Robert D. Kleinberg
LION
2010
Springer
190views Optimization» more  LION 2010»
13 years 8 months ago
Algorithm Selection as a Bandit Problem with Unbounded Losses
Abstract. Algorithm selection is typically based on models of algorithm performance learned during a separate offline training sequence, which can be prohibitively expensive. In r...
Matteo Gagliolo, Jürgen Schmidhuber
JMLR
2010
125views more  JMLR 2010»
12 years 11 months ago
Regret Bounds for Gaussian Process Bandit Problems
Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. ...
Steffen Grünewälder, Jean-Yves Audibert,...