Reinforcement learning problems are commonly tackled with temporal difference methods, which use dynamic programming and statistical sampling to estimate the long-term value of ta...
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we wa...
This is an introductory book about machine learning. Notice that this is a draft book. It may contain typos, mistakes, etc.
The book covers the following topics: Boolean Functio...
Abstract Consider a situation where a group of agents wishes to share the costs of their joint actions, and needs to determine how to distribute the costs amongst themselves in a f...