Hierarchical reinforcement learning with subpolicies specializing for learned subgoals

13 years 6 months ago


This paper describes a method for hierarchical reinforcement learning in which high-level policies automatically discover subgoals, and low-level policies learn to specialize for different subgoals. Subgoals are represented as abstract observations which cluster raw input data. High-level value functions cover the state space at a coarse level; low-level value functions cover only parts of the state space at a fine-grained level. An experiment shows that this method outperforms several flat reinforcement learning methods. A second experiment shows how problems of partial observability caused by observation abstraction can be overcome using high-level policies with memory.

Key words: reinforcement learning, hierarchical reinforcement learning, feedforward neural networks, recurrent neural networks, MDPs, POMDPs, short-term memory
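To make the idea of subgoals as abstract observations concrete, here is a minimal sketch, not the authors' implementation. It assumes online nearest-centroid clustering for the abstraction and tabular Q-learning for the high-level policy, whereas the key words indicate that the paper uses neural networks. All names (abstract_observation, update_centroid, HighLevelPolicy, low_level_reward) and all parameter values are hypothetical and chosen for illustration only.

```python
import math
import random


def abstract_observation(raw, centroids):
    """Return the index of the centroid nearest to the raw observation."""
    return min(range(len(centroids)), key=lambda i: math.dist(raw, centroids[i]))


def update_centroid(raw, centroids, lr=0.1):
    """Online clustering step (assumed): move the winning centroid toward raw."""
    k = abstract_observation(raw, centroids)
    centroids[k] = [c + lr * (x - c) for c, x in zip(centroids[k], raw)]
    return k


class HighLevelPolicy:
    """Tabular Q-learning over abstract observations (an assumed stand-in).

    States are cluster indices; actions are subgoals, i.e. target clusters.
    """

    def __init__(self, n_clusters, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = [[0.0] * n_clusters for _ in range(n_clusters)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def select_subgoal(self, abs_obs):
        """Epsilon-greedy choice of the next subgoal."""
        row = self.q[abs_obs]
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(row))
        return max(range(len(row)), key=row.__getitem__)

    def update(self, abs_obs, subgoal, reward, next_abs_obs):
        """One-step Q-learning update at the coarse (abstract) level."""
        target = reward + self.gamma * max(self.q[next_abs_obs])
        self.q[abs_obs][subgoal] += self.alpha * (target - self.q[abs_obs][subgoal])


def low_level_reward(abs_obs, subgoal):
    """Assumed internal reward: the subpolicy is paid for reaching its subgoal."""
    return 1.0 if abs_obs == subgoal else 0.0
```

In this sketch the high-level table only ever sees cluster indices, which is the coarse coverage of the state space that the abstract describes. A low-level learner trained on low_level_reward would see the raw observations, and only those encountered while pursuing its own subgoal.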

Bram Bakker, Jürgen Schmidhuber
