Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
Human pose estimation is the task of determining the states (location, orientation and scale) of each body part. It is important for many vision understanding applications, e.g. v...
Occlusion is a difficult problem for appearance-based target tracking, especially when we need to track multiple targets simultaneously and maintain the target identities during t...
In a variety of disciplines such as social sciences, psychology, medicine and economics, the recorded data are considered to be noisy measurements of latent variables connected by...
We show that states of a dynamical system can be usefully represented by multi-step, action-conditional predictions of future observations. State representations that are grounded...
Michael L. Littman, Richard S. Sutton, Satinder P....