Sciweavers



CORR
2004
Springer


Online convex optimization in the bandit setting: gradient descent without a gradient


Download www.cs.cmu.edu

We study a general online convex optimization problem. We have a convex set $S$ and an unknown sequence of cost functions $c_1, c_2, \ldots$, and in each period we choose a feasible point $x_t \in S$ and learn the cost $c_t(x_t)$. If the function $c_t$ is also revealed after each period then, as Zinkevich shows in [25], gradient descent can be used on these functions to get regret bounds of $O(\sqrt{n})$. That is, after $n$ rounds, the total cost incurred will be $O(\sqrt{n})$ more than the cost of the best single feasible decision chosen with the benefit of hindsight, $\min_x \sum_t c_t(x)$. We extend this to the "bandit" setting, where, in each period, only the cost $c_t(x_t)$ is revealed, and bound the expected regret as $O(n^{3/4})$. Our approach uses a simple approximation of the gradient that is computed from evaluating $c_t$ at a single (random) point. We show that this biased estimate is sufficient to approximate gradient descent on the sequence of functions. In other words, it is possible to use gradient descent withou...
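The single-point gradient approximation described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes that $S$ is the Euclidean unit ball, that the estimate takes the form $(d/\delta)\, c_t(x_t)\, u_t$ for a uniformly random unit vector $u_t$, and that the iterate is projected onto a ball shrunk by $\delta$ so that every played point stays feasible. The names `bandit_gradient_descent`, `delta`, and `step` are hypothetical and chosen only for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere via normalized Gaussians."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 0.0:
            return [a / norm for a in v]


def project_to_ball(y, radius):
    """Euclidean projection onto the origin-centered ball of the given radius."""
    norm = math.sqrt(sum(a * a for a in y))
    if norm <= radius:
        return list(y)
    return [a * radius / norm for a in y]


def bandit_gradient_descent(costs, dim, delta, step, seed=0):
    """Play one point per round, observing only the scalar cost at that point.

    Assumption: the feasible set S is the unit ball, and 0 < delta < 1.
    Returns the played points and the costs incurred.
    """
    rng = random.Random(seed)
    y = [0.0] * dim          # internal iterate, kept inside the shrunk ball
    radius = 1.0 - delta     # guarantees y + delta * u stays in the unit ball
    played, incurred = [], []
    for cost in costs:
        u = random_unit_vector(dim, rng)
        x = [yi + delta * ui for yi, ui in zip(y, u)]   # point actually played
        value = cost(x)                                  # the only feedback
        # One-point estimate of the gradient of a smoothed version of the cost.
        grad_est = [(dim / delta) * value * ui for ui in u]
        y = project_to_ball(
            [yi - step * gi for yi, gi in zip(y, grad_est)], radius
        )
        played.append(x)
        incurred.append(value)
    return played, incurred


# Hypothetical usage: the same quadratic cost repeated for 200 rounds.
costs = [lambda x: sum((a - 0.5) ** 2 for a in x)] * 200
played, incurred = bandit_gradient_descent(costs, dim=2, delta=0.1, step=0.01)
```

The values of `delta` and `step` above are arbitrary. The regret bound quoted in the abstract depends on how such parameters are tuned as a function of $n$, which this sketch does not attempt.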

Abraham Flaxman, Adam Tauman Kalai, H. Brendan McM


Convex Optimization Problem | CORR 2004 | Cost Ct | Education | Optimization Problem |


Related Content

» Composite Objective Mirror Descent

» Adaptive Online Gradient Descent

» Mind the Duality Gap Logarithmic regret algorithms for online optimization

» Applying Online Gradient Descent Search to Genetic Programming for Object Recognition

» Logarithmic Regret Algorithms for Online Convex Optimization

» Using Gradient Descent Optimization for Acoustics Training from Heterogeneous Data

» The LCCP for Optimizing Kernel Parameters for SVM

» Online Convex Programming and Generalized Infinitesimal Gradient Ascent

» Exponentiated Gradient Algorithms for Conditional Random Fields and MaxMargin Markov Netwo...

Post Info

Added	17 Dec 2010
Updated	17 Dec 2010
Type	Journal
Year	2004
Where	CORR
Authors	Abraham Flaxman, Adam Tauman Kalai, H. Brendan McMahan
