Search Sciweavers | Sciweavers

74 search results - page 4 / 15

» Regret Bounds for Gaussian Process Bandit Problems

ML 2002, ACM

Finite-time Analysis of the Multiarmed Bandit Problem

13 years 5 months ago

Download homes.dsi.unimi.it

Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while t...

Peter Auer, Nicolò Cesa-Bianchi, Paul Fisch...

COLT 2007, Springer

Improved Rates for the Stochastic Continuum-Armed Bandit Problem

13 years 12 months ago

Download www.sztaki.hu

Abstract. Considering one-dimensional continuum-armed bandit problems, we propose an improvement of an algorithm of Kleinberg and a new set of conditions which give rise to improve...

Peter Auer, Ronald Ortner, Csaba Szepesvári

NIPS 2007

The Price of Bandit Information for Online Optimization

13 years 7 months ago

Download books.nips.cc

In the online linear optimization problem, a learner must choose, in each round, a decision from a set D ⊂ R^n in order to minimize an (unknown and changing) linear cost function...

Varsha Dani, Thomas P. Hayes, Sham Kakade

CORR 2008, Springer

Linearly Parameterized Bandits

13 years 5 months ago

Download legacy.orie.cornell.edu

We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an r-dimensional random vect...

Paat Rusmevichientong, John N. Tsitsiklis

LION 2010, Springer

Algorithm Selection as a Bandit Problem with Unbounded Losses

13 years 9 months ago

Download como.vub.ac.be

Abstract. Algorithm selection is typically based on models of algorithm performance learned during a separate offline training sequence, which can be prohibitively expensive. In r...

Matteo Gagliolo, Jürgen Schmidhuber
