Multi-armed bandit problems with dependent arms

We provide a framework to exploit dependencies among arms in multi-armed bandit problems, when the dependencies are in the form of a generative model on clusters of arms. We find an optimal MDP-based policy for the discounted reward case, and also give an approximation of it with a formal error guarantee. We discuss lower bounds on regret in the undiscounted reward scenario, and propose a general two-level bandit policy for it. We propose three different instantiations of our general policy and provide theoretical justifications of how the regret of the instantiated policies depends on the characteristics of the clusters. Finally, we empirically demonstrate the efficacy of our policies on large-scale real-world and synthetic data, and show that they significantly outperform classical policies designed for bandits with independent arms.
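The abstract does not spell out the two-level policy, so the following is only an illustrative sketch of one plausible reading: choose a cluster first, then choose an arm inside it. Everything in it is an assumption rather than the authors' method: UCB1 is used at both levels, each cluster is scored by pooling the pulls and rewards of its arms, rewards are Bernoulli, and the names `ucb1_pick` and `two_level_bandit` are hypothetical.

```python
import math
import random


def ucb1_pick(counts, sums, total):
    """Return the index with the highest UCB1 score; untried indices go first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [s / n + math.sqrt(2.0 * math.log(total) / n)
              for n, s in zip(counts, sums)]
    return max(range(len(counts)), key=scores.__getitem__)


def two_level_bandit(clusters, horizon, seed=0):
    """Run a two-level UCB1 sketch on simulated Bernoulli arms.

    clusters: list of non-empty lists of success probabilities
              (hypothetical data, one inner list per cluster).
    Returns (total_reward, per-arm pull counts).
    """
    rng = random.Random(seed)
    arm_counts = [[0] * len(c) for c in clusters]
    arm_sums = [[0.0] * len(c) for c in clusters]
    total_reward = 0.0
    for t in range(1, horizon + 1):
        # Level 1 (assumption): treat each cluster as a single pooled arm.
        c_counts = [sum(n) for n in arm_counts]
        c_sums = [sum(s) for s in arm_sums]
        c = ucb1_pick(c_counts, c_sums, t)
        # Level 2 (assumption): UCB1 among the arms of the chosen cluster.
        a = ucb1_pick(arm_counts[c], arm_sums[c], c_counts[c] + 1)
        reward = 1.0 if rng.random() < clusters[c][a] else 0.0
        arm_counts[c][a] += 1
        arm_sums[c][a] += reward
        total_reward += reward
    return total_reward, arm_counts


if __name__ == "__main__":
    reward, pulls = two_level_bandit([[0.1, 0.2], [0.8, 0.9]], horizon=2000)
    print(reward, pulls)
```

Pooling a cluster's statistics is only one way to summarize it. The abstract mentions three instantiations of the general policy, and this sketch should not be taken as any one of them.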
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2007
Where ICML
Authors Sandeep Pandey, Deepayan Chakrabarti, Deepak Agarwal