Two notions of optimality have been explored in previous work on hierarchical reinforcement learning (HRL): hierarchical optimality, or the optimal policy in the space defined by ...
Aiming to clarify the convergence or divergence conditions for Learning Classifier System (LCS), this paper explores: (1) an extreme condition where the reinforcement process of ...
In this paper we consider the problem of policy evaluation in reinforcement learning, i.e., learning the value function of a fixed policy, using the least-squares temporal-differe...
Alessandro Lazaric, Mohammad Ghavamzadeh, Ré...
Prioritized sweeping is a model-based reinforcement learning method that attempts to focus an agent’s limited computational resources to achieve a good estimate of the value of ...
RRL is a relational reinforcement learning system based on Q-learning in relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no ...