Although several researchers have integrated methods for reinforcement learning (RL) with case-based reasoning (CBR) to model continuous action spaces, existing integrations typic...
We consider the task of reinforcement learning with linear value function approximation. Temporal difference algorithms, and in particular the Least-Squares Temporal Difference (L...
We call data weakly labeled if it has no exact label but rather a numerical indication of correctness of the label "guessed" by the learning algorithm - a situation comm...
Observational learning algorithm is an ensemble algorithm where each network is initially trained with a bootstrapped data set and virtual data are generated from the ensemble for ...
We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to v...
J. Andrew Bagnell, Sham Kakade, Andrew Y. Ng, Jeff...