Sciweavers

COLT
2010
Springer

Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback

13 years 2 months ago
Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback
Bandit convex optimization is a special case of online convex optimization with partial information. In this setting, a player attempts to minimize a sequence of adversarially generated convex loss functions, while only observing the value of each function at a single point. In some cases, the minimax regret of these problems is known to be strictly worse than the minimax regret in the corresponding full information setting. We introduce the multi-point bandit setting, in which the player can query each loss function at multiple points. When the player is allowed to query each function at two points, we prove regret bounds that closely resemble bounds for the full information case. This suggests that knowing the value of each loss function at two points is almost as useful as knowing the value of each function everywhere. When the player is allowed to query each function at d + 1 points (d being the dimension of the space), we prove regret bounds that are exactly equivalent to full in...
Alekh Agarwal, Ofer Dekel, Lin Xiao
Added 10 Feb 2011
Updated 10 Feb 2011
Type Journal
Year 2010
Where COLT
Authors Alekh Agarwal, Ofer Dekel, Lin Xiao
Comments (0)