Learning when to stop thinking and do something!

An anytime algorithm can return a response to its task at essentially any time; typically, the quality of the response improves as more time is allowed. Here, we consider the challenge of learning when to terminate such an algorithm on each of a sequence of i.i.d. tasks so as to optimize the expected average reward per unit time. We provide a system for addressing this challenge that combines the cross-entropy method, a global optimizer, with local gradient ascent. This paper theoretically investigates how far the estimated gradient is from the true gradient, then empirically demonstrates that the system is effective by applying it to a toy problem and to a real-world face detection task.
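To make the setup concrete, here is a minimal Python sketch of the general idea: a global cross-entropy search followed by local gradient ascent on an empirical reward rate. It is an illustration under stated assumptions, not the authors' implementation. The toy task model (reward 1 - exp(-a * t)), the single fixed-time stopping threshold `theta`, the per-task `overhead`, and all function names are hypothetical. The gradient is approximated with central finite differences, which stands in for whatever gradient estimator the paper actually analyzes.

```python
import math
import random


def make_tasks(n, rng):
    # Hypothetical toy tasks: each task has a rate a, and stopping it
    # at time t yields reward 1 - exp(-a * t).
    return [rng.uniform(0.5, 2.0) for _ in range(n)]


def reward_rate(theta, tasks, overhead=1.0):
    # Empirical average reward per unit time for the rule
    # "stop every task at time theta": mean reward / mean time per task.
    # The fixed per-task overhead is an assumption of this toy model.
    theta = max(theta, 0.0)
    mean_reward = sum(1.0 - math.exp(-a * theta) for a in tasks) / len(tasks)
    return mean_reward / (theta + overhead)


def cross_entropy_search(objective, rng, mean=5.0, std=5.0,
                         n_samples=50, n_elite=10, n_iters=20):
    # Global stage: sample candidate thresholds from a Gaussian,
    # keep the best ones, and refit the Gaussian to them.
    for _ in range(n_iters):
        samples = [abs(rng.gauss(mean, std)) for _ in range(n_samples)]
        elite = sorted(samples, key=objective, reverse=True)[:n_elite]
        mean = sum(elite) / n_elite
        var = sum((x - mean) ** 2 for x in elite) / n_elite
        std = math.sqrt(var) + 1e-3  # small floor keeps sampling alive
    return mean


def gradient_ascent(objective, theta, step=0.5, eps=1e-4, n_iters=200):
    # Local stage: refine the threshold using a central
    # finite-difference estimate of the gradient.
    for _ in range(n_iters):
        grad = (objective(theta + eps) - objective(theta - eps)) / (2 * eps)
        theta = max(theta + step * grad, 0.0)
    return theta


rng = random.Random(0)
tasks = make_tasks(200, rng)


def objective(theta):
    return reward_rate(theta, tasks)


theta_ce = cross_entropy_search(objective, rng)
theta_star = gradient_ascent(objective, theta_ce)
print("cross-entropy estimate:", theta_ce, "rate:", objective(theta_ce))
print("after gradient ascent: ", theta_star, "rate:", objective(theta_star))
```

In this toy model the objective has a single peak, so the global stage is not strictly needed; the two-stage structure is shown because a stopping rule with more parameters can have several local optima, where a purely local method may get stuck.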
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2009
Where ICML
Authors Barnabás Póczos, Csaba Szepesvári, Nathan R. Sturtevant, Russell Greiner, Yasin Abbasi-Yadkori