TD(0) Converges Provably Faster than the Residual Gradient Algorithm

In Reinforcement Learning (RL) there has been some experimental evidence that the residual gradient algorithm converges more slowly than the TD(0) algorithm. In this paper, we use the concept of asymptotic convergence rate to prove that, under certain conditions, the synchronous off-policy TD(0) algorithm converges faster than the synchronous off-policy residual gradient algorithm if the value function is represented in tabular form. This is the first theoretical result comparing the convergence behaviour of two RL algorithms. We also show that as soon as linear function approximation is involved, no general statement concerning the superiority of one of the algorithms can be made.
Ralf Schoknecht, Artur Merke
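
To make the contrast concrete, here is a minimal sketch (not from the paper) of the two synchronous tabular update rules on a hypothetical deterministic chain MDP. TD(0) treats the bootstrapped target r + γV(s') as fixed, while the residual gradient algorithm (Baird, 1995) descends the squared Bellman residual and therefore also adjusts the successor value V(s'). The toy MDP, constants, and function names are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Illustrative sketch, not from the paper: synchronous tabular TD(0) vs.
# residual gradient on a hypothetical deterministic chain MDP with N states,
# transitions s -> s+1, reward 0 everywhere, and an absorbing final state
# pinned to value 0. The true value function is identically zero, so the
# remaining |V| measures how far each method still is from convergence.
N, gamma, alpha = 5, 0.9, 0.1

def td0_sweep(V):
    # TD(0): every state moves toward its bootstrapped target
    # r + gamma * V(s'), computed from the old table (synchronous update).
    new_V = V.copy()
    for s in range(N - 1):
        delta = 0.0 + gamma * V[s + 1] - V[s]
        new_V[s] += alpha * delta
    return new_V

def residual_gradient_sweep(V):
    # Residual gradient: gradient descent on 0.5 * sum_s delta_s^2, so each
    # Bellman residual also pushes back on the successor's value via the
    # gamma * V(s') term inside delta_s.
    grad = np.zeros_like(V)
    for s in range(N - 1):
        delta = 0.0 + gamma * V[s + 1] - V[s]
        grad[s] -= delta              # d(0.5*delta^2)/dV(s)  = -delta
        grad[s + 1] += gamma * delta  # d(0.5*delta^2)/dV(s') =  gamma*delta
    new_V = V - alpha * grad
    new_V[-1] = 0.0                   # keep the absorbing state pinned at 0
    return new_V

rng = np.random.default_rng(0)
V0 = rng.normal(size=N)
V0[-1] = 0.0
V_td, V_rg = V0.copy(), V0.copy()
for _ in range(200):                  # 200 synchronous sweeps over all states
    V_td = td0_sweep(V_td)
    V_rg = residual_gradient_sweep(V_rg)
print("TD(0)   max |error|:", np.abs(V_td).max())
print("ResGrad max |error|:", np.abs(V_rg).max())
```

Because the residual gradient step spreads each residual over both V(s) and V(s'), its per-sweep progress on any single state is damped relative to TD(0)'s, which is consistent with the qualitative behaviour the abstract describes for the tabular case.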
Type: Conference
Year: 2003
Where: ICML