We propose a new approach to the problem of searching a space of policies for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP), given a mo...
The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision proces...
Ranjit Nair, Milind Tambe, Makoto Yokoo, David V. ...
High dimensionality of belief space in DEC-POMDPs is one of the major causes that makes the optimal joint policy computation intractable. The belief state for a given agent is a p...
Planning in single-agent models like MDPs and POMDPs can be carried out by resorting to Q-value functions: a (near-) optimal Q-value function is computed in a recursive manner by ...
Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gains can be attributed to pruni...