Abstract. We present first experiments using Support Vector Regression as function approximator for an on-line, sarsa-like reinforcement learner. To overcome the batch nature of S...
We propose a general method to watermark and probabilistically identify the structured outputs of machine learning algorithms. Our method is robust to local editing operations and...
Ashish Venugopal, Jakob Uszkoreit, David Talbot, F...
We present metric?? , a provably near-optimal algorithm for reinforcement learning in Markov decision processes in which there is a natural metric on the state space that allows t...
Approximate Linear Programming (ALP) is a reinforcement learning technique with nice theoretical properties, but it often performs poorly in practice. We identify some reasons for...
In Reinforcement Learning (RL) there has been some experimental evidence that the residual gradient algorithm converges slower than the TD(0) algorithm. In this paper, we use the ...