We introduce the ALeRT (Action-dependent Learning Rates with Trends) algorithm that makes two modifications to the learning rate and one change to the exploration rate of traditio...
Maria Cutumisu, Duane Szafron, Michael H. Bowling,...
We bound the future loss when predicting any (computably) stochastic sequence online. Solomonoff finitely bounded the total deviation of his universal predictor M from the true ...
Niels Landwehr¹, Taneli Mielikäinen², Lauri Eronen², Hannu Toivonen¹,², and Heikki Mannila². ¹ Machine Learning Lab, Dept. of Comp. Science, University of Freiburg, G...
Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden states, and to provide a link between Monte Carlo and temporal-difference meth...
Doina Precup, Richard S. Sutton, Satinder P. Singh
Abstract. We present first experiments using Support Vector Regression as a function approximator for an on-line, SARSA-like reinforcement learner. To overcome the batch nature of S...