We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E3 (Kearns and Singh, 2002) learn near-optimal policies by using "exploration policies...
Abstract. Many reinforcement learning domains are highly relational. While traditional temporal-difference methods can be applied to these domains, they are limited in their capaci...
Trevor Walker, Lisa Torrey, Jude W. Shavlik, Richa...
Abstract. This paper proposes a novel approach to discover options in the form of conditionally terminating sequences, and shows how they can be integrated into reinforcement learn...
We present metric?? , a provably near-optimal algorithm for reinforcement learning in Markov decision processes in which there is a natural metric on the state space that allows t...
This paper describes a method for hierarchical reinforcement learning in which high-level policies automatically discover subgoals, and low-level policies learn to specialize for ...