Decentralized partially observable Markov decision processes (DEC-POMDPs) form a general framework for planning for groups of cooperating agents that inhabit a stochastic and part...
Matthijs T. J. Spaan, Geoffrey J. Gordon, Nikos A....
Temporal difference methods are theoretically grounded and empirically effective approaches to reinforcement learning problems. In most real-world reinforcement learning ...
This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begi...
In the field of evolutionary multi-criterion optimization, the hypervolume indicator is the only single set quality measure that is known to be strictly monotonic with ...
Attempts at classifying computational problems as polynomial-time solvable, NP-complete, or belonging to a higher level in the polynomial hierarchy face the difficulty of undecid...