Sciweavers

590 search results - page 68 / 118
» Can We Learn to Beat the Best Stock
JMLR
2010
14 years 4 months ago
Regret Bounds for Gaussian Process Bandit Problems
Bandit algorithms are concerned with trading exploration with exploitation where a number of options are available but we can only learn their quality by experimenting with them. ...
Steffen Grünewälder, Jean-Yves Audibert,...
ICML
1999
IEEE
15 years 10 months ago
Distributed Value Functions
Many interesting problems, such as power grids, network switches, and traffic flow, that are candidates for solving with reinforcement learning (RL), also have properties that make distri...
Jeff G. Schneider, Weng-Keen Wong, Andrew W. Moore...
PAKDD
2010
ACM
15 years 2 months ago
BASSET: Scalable Gateway Finder in Large Graphs
Given a social network, who is the best person to introduce you to, say, Chris Ferguson, the poker champion? Or, given a network of people and skills, who is the best person to he...
Hanghang Tong, Spiros Papadimitriou, Christos Falo...
MLMI
2007
Springer
15 years 3 months ago
Meeting State Recognition from Visual and Aural Labels
In this paper we present a meeting state recognizer based on a combination of multi-modal sensor data in a smart room. Our approach is based on the training of a statistical model ...
Jan Curín, Pascal Fleury, Jan Kleindienst, ...
UAI
2008
14 years 11 months ago
Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available afte...
Richard S. Sutton, Csaba Szepesvári, Alborz...