The Queue Method: Handling Delay, Heuristics, Prior Data, and Evaluation in Bandits

Current algorithms for the standard multi-armed bandit problem have good empirical performance and optimal regret bounds. However, real-world problems often differ from the standard formulation in several ways. First, feedback may be delayed instead of arriving immediately. Second, the real world often contains structure which suggests heuristics, which we wish to incorporate while retaining strong theoretical guarantees. Third, we may wish to make use of an arbitrary prior dataset without negatively impacting performance. Fourth, we may wish to efficiently evaluate algorithms using a previously collected dataset. Surprisingly, these seemingly disparate problems can be addressed using algorithms inspired by a recently developed queueing technique. We present the Stochastic Delayed Bandits (SDB) algorithm as a solution to these four problems, which takes black-box bandit algorithms (including heuristic approaches) as input while achieving good theoretical guarantees. We present empiri...
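As a rough illustration of the fourth problem (evaluating an algorithm on a previously collected dataset), one way to picture a queue-based approach is to sort logged rewards into one FIFO queue per arm and let a black-box bandit algorithm draw from those queues. The sketch below is an assumption about the general idea only, not the SDB algorithm from the paper; `EpsilonGreedy`, `queue_replay`, and the `select`/`update` interface are hypothetical names chosen for this example.

```python
import random
from collections import deque


class EpsilonGreedy:
    """Hypothetical black-box bandit: any object with select() and update()."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        # Try each arm once, then explore with probability epsilon.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.means[a])

    def update(self, arm, reward):
        # Incremental update of the running mean reward for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def queue_replay(algorithm, logged_data, n_arms):
    """Replay logged (arm, reward) pairs through per-arm FIFO queues.

    Stops the first time the algorithm requests an arm whose queue is empty.
    Returns the list of (arm, reward) pairs the algorithm consumed.
    (Illustrative assumption, not the paper's SDB procedure.)
    """
    queues = [deque() for _ in range(n_arms)]
    for arm, reward in logged_data:
        queues[arm].append(reward)

    history = []
    while True:
        arm = algorithm.select()
        if not queues[arm]:
            break  # no logged sample left for the requested arm
        reward = queues[arm].popleft()
        algorithm.update(arm, reward)
        history.append((arm, reward))
    return history
```

Calling `queue_replay(EpsilonGreedy(2, epsilon=0.0), logged, 2)` with a list of logged `(arm, reward)` pairs returns the sequence of samples the algorithm consumed before it requested an arm with no remaining data. Because each step removes one logged sample, the loop always terminates on a finite dataset.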

Travis Mandel, Yun-En Liu, Emma Brunskill, Zoran Popovic

AAAI 2015 | Intelligent Agents

Added	27 Mar 2016
Updated	27 Mar 2016
Type	Journal
Year	2015
Where	AAAI
Authors	Travis Mandel, Yun-En Liu, Emma Brunskill, Zoran Popovic
