The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision proces...
Ranjit Nair, Milind Tambe, Makoto Yokoo, David V. ...
We extend the theory of labeled Markov processes with internal nondeterminism, a fundamental concept for the further development of a process theory with abstraction on nondetermi...
To gain insights into the neural basis of such adaptive decision-making processes, we investigated the nature of the learning process in humans playing a competitive game with binary ...
Policy evaluation is a critical step in the approximate solution of large Markov decision processes (MDPs), typically requiring O(|S|^3) to directly solve the Bellman system of |S...
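To make the cubic cost concrete, here is a minimal sketch of direct policy evaluation. It assumes the standard Bellman system for a fixed policy, v = r + gamma * P v, rewritten as (I - gamma * P) v = r and solved by Gaussian elimination. The function name `evaluate_policy`, the two-state chain, and the discount factor are hypothetical illustrations and are not taken from the abstract. The three nested loops over the states are where the O(|S|^3) term comes from.

```python
def evaluate_policy(P, r, gamma):
    """Solve (I - gamma * P) v = r by Gaussian elimination with partial pivoting.

    P is the |S| x |S| transition matrix under a fixed policy, r the reward
    vector, gamma the discount factor (assumed to be < 1).
    """
    n = len(r)
    # Augmented matrix [I - gamma * P | r].
    A = [
        [(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)] + [float(r[i])]
        for i in range(n)
    ]
    # Forward elimination: three nested loops over the states, hence O(|S|^3).
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n + 1):
                A[row][k] -= factor * A[col][k]
    # Back substitution on the upper-triangular system.
    v = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(A[i][j] * v[j] for j in range(i + 1, n))
        v[i] = (A[i][n] - tail) / A[i][i]
    return v


# Hypothetical two-state chain: state 0 pays reward 1, state 1 is absorbing.
P = [[0.9, 0.1], [0.0, 1.0]]
r = [1.0, 0.0]
values = evaluate_policy(P, r, 0.5)
```

For this toy chain the value of state 0 satisfies v0 = 1 + 0.5 * 0.9 * v0, so v0 = 1 / 0.55, and the absorbing state has value 0.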
In apprenticeship learning, the goal is to learn a policy in a Markov decision process that is at least as good as a policy demonstrated by an expert. The difficulty arises in tha...