Sciweavers

74 search results - page 4 / 15
» Regret Bounds for Gaussian Process Bandit Problems
ML
2002
ACM
13 years 5 months ago
Finite-time Analysis of the Multiarmed Bandit Problem
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while t...
Peter Auer, Nicolò Cesa-Bianchi, Paul Fisch...
COLT
2007
Springer
13 years 12 months ago
Improved Rates for the Stochastic Continuum-Armed Bandit Problem
Abstract. Considering one-dimensional continuum-armed bandit problems, we propose an improvement of an algorithm of Kleinberg and a new set of conditions which give rise to improve...
Peter Auer, Ronald Ortner, Csaba Szepesvári
NIPS
2007
13 years 7 months ago
The Price of Bandit Information for Online Optimization
In the online linear optimization problem, a learner must choose, in each round, a decision from a set D ⊂ R^n in order to minimize an (unknown and changing) linear cost function...
Varsha Dani, Thomas P. Hayes, Sham Kakade
CORR
2008
Springer
13 years 5 months ago
Linearly Parameterized Bandits
We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an r-dimensional random vect...
Paat Rusmevichientong, John N. Tsitsiklis
LION
2010
Springer
13 years 9 months ago
Algorithm Selection as a Bandit Problem with Unbounded Losses
Abstract. Algorithm selection is typically based on models of algorithm performance learned during a separate offline training sequence, which can be prohibitively expensive. In r...
Matteo Gagliolo, Jürgen Schmidhuber