Sciweavers

JAIR
2008

Optimal and Approximate Q-value Functions for Decentralized POMDPs

Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient...
Added 12 Dec 2010
Updated 12 Dec 2010
Type Journal
Year 2008
Where JAIR
Authors Frans A. Oliehoek, Matthijs T. J. Spaan, Nikos A. Vlassis