Actor-critic algorithms for reinforcement learning are achieving renewed popularity due to their good convergence properties in situations where other approaches often fail (e.g.,...
We consider the task of reinforcement learning with linear value function approximation. Temporal difference algorithms, and in particular the Least-Squares Temporal Difference (L...
This paper deals with value (and Q-) function approximation in deterministic Markovian decision processes (MDPs). A general statistical framework based on the Kalman filtering pa...
Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated conve...
Reinforcement learning problems are commonly tackled with temporal difference methods, which use dynamic programming and statistical sampling to estimate the long-term value of ta...