We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E3 (Kearns and Singh, 2002) learn near-optimal policies by using "exploration policies...
Inverse Reinforcement Learning (IRL) is the problem of learning the reward function underlying a Markov Decision Process given the dynamics of the system and the behaviour of an e...
— Biped robots based on the concept of (passive) dynamic walking are far simpler than the traditional fullycontrolled walking robots, while achieving a more natural gait and cons...
Shouyi Wang, Jelmer Braaksma, Robert Babuska, Daan...
Abstract— Groups of reinforcement learning agents interacting in a common environment often fail to learn optimal behaviors. Poor performance is particularly common in environmen...
In this article, we propose a method to adapt stepsize parameters used in reinforcement learning for dynamic environments. In general reinforcement learning situations, a stepsize...