Mortal Multi-Armed Bandits

11 years 1 months ago
Mortal Multi-Armed Bandits
We formulate and study a new variant of the k-armed bandit problem, motivated by e-commerce applications. In our model, arms have (stochastic) lifetime after which they expire. In this setting an algorithm needs to continuously explore new arms, in contrast to the standard k-armed bandit model in which arms are available indefinitely and exploration is reduced once an optimal arm is identified with nearcertainty. The main motivation for our setting is online-advertising, where ads have limited lifetime due to, for example, the nature of their content and their campaign budgets. An algorithm needs to choose among a large collection of ads, more than can be fully explored within the typical ad lifetime. We present an optimal algorithm for the state-aware (deterministic reward function) case, and build on this technique to obtain an algorithm for the state-oblivious (stochastic reward function) case. Empirical studies on various reward distributions, including one derived from a real-wor...
Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski,
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2008
Where NIPS
Authors Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal
Comments (0)