Sciweavers

74 search results - page 5 / 15
» Regret Bounds for Gaussian Process Bandit Problems
ICML
2009
IEEE
16 years 2 months ago
Piecewise-stationary bandit problems with side observations
We consider a sequential decision problem where the rewards are generated by a piecewise-stationary distribution. However, the different reward distributions are unknown and may c...
Jia Yuan Yu, Shie Mannor
CORR
2010
Springer
15 years 1 month ago
Learning in A Changing World: Non-Bayesian Restless Multi-Armed Bandit
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics. In this problem, at each time, a player chooses K out of N (N > K) arms to play. The state of ...
Haoyang Liu, Keqin Liu, Qing Zhao
COLT
2010
Springer
14 years 11 months ago
Nonparametric Bandits with Covariates
We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random cov...
Philippe Rigollet, Assaf Zeevi
COLT
2005
Springer
15 years 3 months ago
From External to Internal Regret
External regret compares the performance of an online algorithm, selecting among N actions, to the performance of the best of those actions in hindsight. Internal regret compares ...
Avrim Blum, Yishay Mansour
JMLR
2012
13 years 4 months ago
Contextual Bandit Learning with Predictable Rewards
Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...
Alekh Agarwal, Miroslav Dudík, Satyen Kale,...