We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications...
A local cell quality metric is introduced and used to construct a variational functional for a grid smoothing algorithm. A maximum principle is proved and the properties of the loc...
The acquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots should ever leave precisely pre-struc...
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari...
We present new algorithms for inverse optimal control (or inverse reinforcement learning, IRL) within the framework of linearly-solvable MDPs (LMDPs). Unlike most prior IRL algorit...