Sciweavers

16 search results - page 2 / 4
» Deviations of Stochastic Bandit Regret
Sort
View
CORR
2011
Springer
202views Education» more  CORR 2011»
14 years 4 months ago
Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems
The analysis of online least squares estimation is at the heart of many stochastic sequential decision-making problems. We employ tools from the self-normalized processes to provi...
Yasin Abbasi-Yadkori, Dávid Pál, Csa...
COLT
2007
Springer
15 years 3 months ago
Improved Rates for the Stochastic Continuum-Armed Bandit Problem
Abstract. Considering one-dimensional continuum-armed bandit problems, we propose an improvement of an algorithm of Kleinberg and a new set of conditions which give rise to improve...
Peter Auer, Ronald Ortner, Csaba Szepesvári
COLT
2008
Springer
14 years 11 months ago
Regret Bounds for Sleeping Experts and Bandits
We study on-line decision problems where the set of actions that are available to the decision algorithm vary over time. With a few notable exceptions, such problems remained larg...
Robert D. Kleinberg, Alexandru Niculescu-Mizil, Yo...
COLT
2010
Springer
14 years 7 months ago
An Asymptotically Optimal Bandit Algorithm for Bounded Support Models
Multiarmed bandit problem is a typical example of a dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playi...
Junya Honda, Akimichi Takemura
89
Voted
CORR
2010
Springer
152views Education» more  CORR 2010»
14 years 4 months ago
Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards
In the classic multi-armed bandits problem, the goal is to have a policy for dynamically operating arms that each yield stochastic rewards with unknown means. The key metric of int...
Yi Gai, Bhaskar Krishnamachari, Rahul Jain