We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to v...
J. Andrew Bagnell, Sham Kakade, Andrew Y. Ng, Jeff...
In this work we explore how the complexity of a problem domain affects the performance of evolutionary search using a performance-based restart policy. Previous research indicates...
Real-time dynamic programming (RTDP) is a heuristic search algorithm for solving MDPs. We present a modified algorithm called Focused RTDP with several improvements. While RTDP ma...
We consider the task of learning to accurately follow a trajectory in a vehicle such as a car or helicopter. A number of dynamic programming algorithms such as Differential Dynami...
J. Zico Kolter, Adam Coates, Andrew Y. Ng, Yi Gu, ...
Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated conve...