
COLT 2008, Springer

Regret Bounds for Sleeping Experts and Bandits

We study on-line decision problems where the set of actions available to the decision algorithm varies over time. With a few notable exceptions, such problems have remained largely unaddressed in the literature, despite their applicability to a large number of practical problems. Departing from previous work on this "Sleeping Experts" problem, we compare algorithms against the payoff obtained by the best ordering of the actions, a natural benchmark for this type of problem. We study both the full-information (best expert) and partial-information (multi-armed bandit) settings, and consider both stochastic and adaptive adversaries. For all settings we give algorithms achieving (almost) information-theoretically optimal regret bounds (up to a constant or a sublogarithmic factor) with respect to the best-ordering benchmark.
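To make the best-ordering benchmark concrete, here is a minimal, hypothetical sketch (not taken from the paper) of how its payoff can be computed by brute force on a tiny toy instance: a fixed ordering of all actions is chosen in hindsight, and in each round the benchmark plays the highest-ranked action among those currently awake. The data structures and function name are illustrative assumptions, and the exhaustive search is only feasible for very small action sets.

```python
from itertools import permutations

def best_ordering_payoff(rounds):
    """Brute-force the best-ordering benchmark on a small toy instance.

    `rounds` is a list of (awake, rewards) pairs: `awake` is the set of
    action indices available in that round, and `rewards` maps each awake
    action to its payoff.  The benchmark fixes one ordering of all actions
    and, in every round, plays the highest-ranked awake action.
    """
    actions = sorted({a for awake, _ in rounds for a in awake})
    best = float("-inf")
    for order in permutations(actions):  # exhaustive; only for tiny instances
        rank = {a: i for i, a in enumerate(order)}
        total = 0.0
        for awake, rewards in rounds:
            chosen = min(awake, key=rank.__getitem__)  # top-ranked awake action
            total += rewards[chosen]
        best = max(best, total)
    return best

# Toy usage: three actions, some asleep in some rounds.
rounds = [
    ({0, 1}, {0: 0.3, 1: 0.9}),
    ({1, 2}, {1: 0.2, 2: 0.7}),
    ({0, 2}, {0: 0.8, 2: 0.1}),
]
print(best_ordering_payoff(rounds))  # payoff of the best fixed ordering
```

The regret of an algorithm is then the gap between this benchmark payoff and the payoff the algorithm actually collects; the paper's algorithms bound this gap without, of course, enumerating orderings.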
Type Conference
Year 2008
Where COLT
Authors Robert D. Kleinberg, Alexandru Niculescu-Mizil, Yogeshwer Sharma