Sciweavers

22 search results - page 5 / 5
Contextual Multi-Armed Bandits
ICML
2008
IEEE
14 years 6 months ago
Exploration scavenging
We examine the problem of evaluating a policy in the contextual bandit setting using only observations collected during the execution of another policy. We show that policy evalua...
John Langford, Alexander L. Strehl, Jennifer Wortm...
CORR
2011
Springer
12 years 9 months ago
Doubly Robust Policy Evaluation and Learning
We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as...
Miroslav Dudík, John Langford, Lihong Li