Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to ...
We provide an analytical comparison between discounted and average reward temporal-difference (TD) learning with linearly parameterized approximations. We first consider the asympt...
Hierarchical reinforcement learning (RL) is a general framework which studies how to exploit the structure of actions and tasks to accelerate policy learning in large domains. Pri...
A general and expressive model of sequential decision making under uncertainty is provided by the Markov decision processes (MDPs) framework. Complex applications with very large ...
Two notions of optimality have been explored in previous work on hierarchical reinforcement learning (HRL): hierarchical optimality, or the optimal policy in the space defined by ...