Adapting to a Changing Environment: the Brownian Restless Bandits

In the multi-armed bandit (MAB) problem there are k distributions, one for the reward of playing each of k strategies (slot machine arms). The reward distributions are initially unknown to the player. In each round the player plays one strategy, observes the associated reward, and decides which strategy to play in the next round. The goal is to maximize the total reward by balancing exploitation (using the information acquired so far) with exploration (learning new information). We introduce and study a dynamic MAB problem in which the reward functions change stochastically and gradually over time. Specifically, the expected reward of each arm follows a Brownian motion, a discrete random walk, or a similar process. In this setting the player has to keep exploring continuously in order to adapt to the changing environment. Our formulation is (roughly) a special case of the notoriously intractable restless MAB problem. Our goal here is to characterize the cost of learning and adapting to t...
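To make the setting concrete, here is a minimal Python simulation sketch of a bandit whose expected rewards drift over time. Several details are assumptions of this sketch, not taken from the paper. Each arm's expected reward starts uniform in [0, 1] and takes an independent Gaussian step of standard deviation sigma every round, reflected at 0 and 1. Rewards are Bernoulli with the arm's current mean. The player is a generic epsilon-greedy policy with a discounted average estimate, a standard baseline rather than the authors' algorithm. The names `simulate`, `reflect`, `sigma`, `epsilon`, and `discount` are hypothetical. Regret is measured against an oracle that always plays the arm with the highest current mean.

```python
import random


def reflect(x: float) -> float:
    """Fold x back into [0, 1] (reflecting boundaries; an assumption of this sketch)."""
    x = x % 2.0  # Python's % returns a value in [0, 2) for a positive modulus
    return 2.0 - x if x > 1.0 else x


def simulate(k=5, rounds=10_000, sigma=0.01, epsilon=0.1, discount=0.9, seed=0):
    """Return (expected reward collected, expected reward of the oracle).

    Hypothetical parameters: sigma is the per-round drift of each mean,
    epsilon the exploration rate, discount the weight kept on old estimates.
    """
    rng = random.Random(seed)
    means = [rng.random() for _ in range(k)]  # hidden expected rewards
    estimates = [0.5] * k                     # the player's current beliefs
    gained = 0.0
    oracle = 0.0
    for _ in range(rounds):
        # Explore a random arm with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda i: estimates[i])
        # Account in expectation: the chosen arm's mean vs. the best current mean.
        gained += means[arm]
        oracle += max(means)
        # The player only sees a Bernoulli sample of the chosen arm.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        # Discounted update, so recent observations outweigh stale ones.
        estimates[arm] = discount * estimates[arm] + (1.0 - discount) * reward
        # Restless dynamics: every arm's mean drifts, whether played or not.
        means = [reflect(m + rng.gauss(0.0, sigma)) for m in means]
    return gained, oracle


if __name__ == "__main__":
    rounds = 10_000
    gained, oracle = simulate(rounds=rounds)
    print(f"average regret per round: {(oracle - gained) / rounds:.3f}")
```

Because the means never stop moving, setting `epsilon=0` lets the player lock onto an arm that later becomes suboptimal without ever noticing, which is the sense in which a player must keep exploring. The discount plays the complementary role of forgetting observations that no longer reflect an arm's current mean.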

Aleksandrs Slivkins, Eli Upfal

COLT 2008 | Dynamic MAB Problem | MAB Problem | Machine Learning | Restless MAB Problem

Added	18 Oct 2010
Updated	18 Oct 2010
Type	Conference
Year	2008
Where	COLT
Authors	Aleksandrs Slivkins, Eli Upfal

Sciweavers