We introduce the ALeRT (Action-dependent Learning Rates with Trends) algorithm that makes two modifications to the learning rate and one change to the exploration rate of traditio...
Maria Cutumisu, Duane Szafron, Michael H. Bowling,...
Abstract. We consider batch reinforcement learning problems in continuous space, expected total discounted-reward Markovian Decision Problems. As opposed to previous theoretical wo...
In order to claim fully general intelligence in an autonomous agent, the ability to learn is one of the most central capabilities. Classical machine learning techniques have had ma...
The purpose of this paper is to show that a well known machine learning technique based on Decision Trees can be effectively used to select the best approach (in terms of efficien...
In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...
Michael H. Bowling, Alborz Geramifard, David Winga...