Approximate dynamic programming has been used successfully in a large variety of domains, but it relies on a small set of provided approximation features to calculate solutions re...
Marek Petrik, Gavin Taylor, Ronald Parr, Shlomo Zi...
This paper examines the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is not known or is only poorly specified. W...
Stochastic games generalize Markov decision processes (MDPs) to a multiagent setting by allowing the state transitions to depend jointly on all player actions, and having rewards de...
Michael J. Kearns, Yishay Mansour, Satinder P. Sin...
We present metric-E3, a provably near-optimal algorithm for reinforcement learning in Markov decision processes in which there is a natural metric on the state space that allows t...
In environmental and natural resource planning domains, actions are taken at a large number of locations over multiple time periods. These problems have enormous state and action s...