Sciweavers

16 search results - page 2 / 4
» Deviations of Stochastic Bandit Regret
Sort
View
CORR
2011
Springer
202views Education» more  CORR 2011»
12 years 11 months ago
Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems
The analysis of online least squares estimation is at the heart of many stochastic sequential decision-making problems. We employ tools from the self-normalized processes to provi...
Yasin Abbasi-Yadkori, Dávid Pál, Csa...
COLT
2007
Springer
13 years 11 months ago
Improved Rates for the Stochastic Continuum-Armed Bandit Problem
Abstract. Considering one-dimensional continuum-armed bandit problems, we propose an improvement of an algorithm of Kleinberg and a new set of conditions which give rise to improve...
Peter Auer, Ronald Ortner, Csaba Szepesvári
COLT
2008
Springer
13 years 6 months ago
Regret Bounds for Sleeping Experts and Bandits
We study on-line decision problems where the set of actions that are available to the decision algorithm vary over time. With a few notable exceptions, such problems remained larg...
Robert D. Kleinberg, Alexandru Niculescu-Mizil, Yo...
COLT
2010
Springer
13 years 2 months ago
An Asymptotically Optimal Bandit Algorithm for Bounded Support Models
Multiarmed bandit problem is a typical example of a dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playi...
Junya Honda, Akimichi Takemura
CORR
2010
Springer
152views Education» more  CORR 2010»
12 years 11 months ago
Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards
In the classic multi-armed bandits problem, the goal is to have a policy for dynamically operating arms that each yield stochastic rewards with unknown means. The key metric of int...
Yi Gai, Bhaskar Krishnamachari, Rahul Jain