CORR

2004

Springer

We study a general online convex optimization problem. We have a convex set $S$ and an unknown sequence of cost functions $c_1, c_2, \ldots$, and in each period we choose a feasible point $x_t$ in $S$ and learn the cost $c_t(x_t)$. If the function $c_t$ is also revealed after each period then, as Zinkevich shows in [25], gradient descent can be used on these functions to get regret bounds of $O(\sqrt{n})$. That is, after $n$ rounds, the total cost incurred will be $O(\sqrt{n})$ more than the cost of the best single feasible decision chosen with the benefit of hindsight, $\min_x \sum_t c_t(x)$. We extend this to the "bandit" setting, where, in each period, only the cost $c_t(x_t)$ is revealed, and bound the expected regret as $O(n^{3/4})$. Our approach uses a simple approximation of the gradient that is computed from evaluating $c_t$ at a single (random) point. We show that this biased estimate is sufficient to approximate gradient descent on the sequence of functions. In other words, it is possible to use gradient descent withou...
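The single-point gradient approximation mentioned in the abstract can be illustrated with a minimal Python sketch. It uses one standard form of such an estimator, $(d/\delta)\, c_t(y_t + \delta u_t)\, u_t$ with $u_t$ drawn uniformly from the unit sphere, and makes several assumptions that are not stated in the abstract: $S$ is taken to be the Euclidean unit ball, the function names (`bandit_gradient_descent`, `random_unit_vector`, `project_to_ball`) are hypothetical, and the constants `delta` and `eta` are illustrative rather than the tuned values behind the $O(n^{3/4})$ bound.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def project_to_ball(y, radius):
    """Euclidean projection onto the ball of the given radius centred at 0."""
    norm = math.sqrt(sum(v * v for v in y))
    if norm <= radius:
        return list(y)
    return [v * radius / norm for v in y]


def bandit_gradient_descent(costs, d, delta=0.1, eta=0.01, seed=0):
    """Play one point per round, observing only the scalar cost at that point.

    Assumption: S is the unit ball in R^d. delta (exploration radius) and
    eta (step size) are illustrative constants, not tuned values.
    """
    rng = random.Random(seed)
    y = [0.0] * d                      # centre point kept inside (1 - delta) * S
    played, total = [], 0.0
    for c in costs:
        u = random_unit_vector(d, rng)
        x = [yi + delta * ui for yi, ui in zip(y, u)]   # point actually played
        loss = c(x)                                     # the only feedback seen
        total += loss
        played.append(x)
        # One-point estimate: (d / delta) * c(x) * u.
        g = [(d / delta) * loss * ui for ui in u]
        # Gradient step, then project so the next played point stays in S.
        y = project_to_ball([yi - eta * gi for yi, gi in zip(y, g)], 1.0 - delta)
    return played, total


if __name__ == "__main__":
    # Hypothetical cost sequence: the same quadratic in every round.
    costs = [lambda x: (x[0] - 0.5) ** 2 + (x[1] + 0.25) ** 2] * 2000
    played, total = bandit_gradient_descent(costs, d=2)
    print("average cost per round:", total / len(costs))
```

Projecting onto the shrunken ball of radius `1 - delta` is a design choice that keeps every played point `x` feasible, since `x` lies within distance `delta` of `y`. The estimate is biased as a gradient of $c_t$ itself, but its expectation is the gradient of a smoothed version of $c_t$ (the average of $c_t$ over a ball of radius $\delta$ around $y_t$), which is why descent on it can still track the original functions.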

Added: 17 Dec 2010
Updated: 17 Dec 2010
Type: Journal
Year: 2004
Where: CORR
Authors: Abraham Flaxman, Adam Tauman Kalai, H. Brendan McMahan
