Sciweavers

JMLR, 2012
Contextual Bandit Learning with Predictable Rewards
Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...
Alekh Agarwal, Miroslav Dudík, Satyen Kale,...

CoRR, 2011
Doubly Robust Policy Evaluation and Learning
We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as...
Miroslav Dudík, John Langford, Lihong Li