Sciweavers

23 search results - page 4 / 5
» Online Optimization in X-Armed Bandits
COLT
2004
Springer
13 years 10 months ago
Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary
We give an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala [1], for the case of an adaptive adversary. In this proble...
H. Brendan McMahan, Avrim Blum
COLT
2008
Springer
13 years 6 months ago
Adapting to a Changing Environment: the Brownian Restless Bandits
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are ini...
Aleksandrs Slivkins, Eli Upfal
ICML
2009
IEEE
14 years 5 months ago
Interactively optimizing information retrieval systems as a dueling bandits problem
We present an on-line learning framework tailored towards real-time learning from observed user behavior in search engines and other information retrieval systems. In particular, ...
Yisong Yue, Thorsten Joachims
STOC
2007
ACM
14 years 5 months ago
Playing games with approximation algorithms
In an online linear optimization problem, on each period t, an online algorithm chooses s_t ∈ S from a fixed (possibly infinite) set S of feasible decisions. Nature (who may be adve...
Sham M. Kakade, Adam Tauman Kalai, Katrina Ligett
ALT
2008
Springer
14 years 2 months ago
Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the onl...
Ronald Ortner