Planning in single-agent models such as MDPs and POMDPs can be carried out by resorting to Q-value functions: a (near-)optimal Q-value function is computed in a recursive manner by ...
In this paper, we study a sequential decision-making problem. The objective is to maximize the average reward accumulated over time subject to temporal cost constraints. The novel...
Many signals of interest are corrupted by faults of an unknown type. We propose an approach that uses Gaussian processes and a general “fault bucket” to capture a priori uncha...
Michael A. Osborne, Roman Garnett, Kevin Swersky, ...
This paper reports on a novel decentralised technique for planning agent schedules in dynamic task allocation problems. Specifically, we use a Markov game formulation of these pr...
Archie C. Chapman, Rosa Anna Micillo, Ramachandra ...
It is known that the complexity of reinforcement learning algorithms, such as Q-learning, may be exponential in the number of environment states. It was shown, however, th...