Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framew...
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-a...
Rajbala Makar, Sridhar Mahadevan, Mohammad Ghavamz...
As computational learning agents move into domains that incur real costs (e.g., autonomous driving or financial investment), it will be necessary to learn good policies without n...
Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estim...
RRL is a relational reinforcement learning system based on Q-learning in relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no ...