Distributed Partially Observable Markov Decision Problems (Distributed POMDPs) are evolving as a popular approach for modeling multiagent systems, and many different algorithms ha...
We introduce the controlled predictive linearGaussian model (cPLG), a model that uses predictive state to model discrete-time dynamical systems with real-valued observations and v...
This work presents a lookahead-based exploration strategy for a model-based learning agent that enables exploration of the opponent's behavior during interaction in a multi-a...
We address the problem of autonomously learning controllers for visioncapable mobile robots. We extend McCallum's (1995) Nearest-Sequence Memory algorithm to allow for genera...
Viktor Zhumatiy, Faustino J. Gomez, Marcus Hutter,...
An efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas m...