Abstract— In this paper, we consider a class of continuoustime, continuous-space stochastic optimal control problems. Building upon recent advances in Markov chain approximation ...
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcementlearning algorithms. These algorithms solve a number ...
Decentralized partially observable MDPs (DEC-POMDPs) provide a rich framework for modeling decision making by a team of agents. Despite rapid progress in this area, the limited sc...
Recent scaling up of decentralized partially observable Markov decision process (DEC-POMDP) solvers towards realistic applications is mainly due to approximate methods. Of this fa...
We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available afte...