Targeting Specific Distributions of Trajectories in MDPs

We define TTD-MDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through the space. After motivating this formulation, we show how to convert a traditional MDP into a TTD-MDP. We derive an algorithm for finding non-deterministic policies by constructing a trajectory tree that allows us to compute locally consistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic algorithm for constructing policies when an exact answer is impossible or impractical. We present empirical results for our algorithm in two domains: a synthetic grid world and stories in an interactive drama or game.
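The trajectory-tree idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is one plausible reading, assuming the target is given as a probability for each complete trajectory and that each tree node is a trajectory prefix. The function names (prefix_mass, local_targets) and the toy states s0 to s5 are hypothetical. The sketch only computes the local target distribution at each node; a policy at that node would then have to mix the available actions so that their combined transition probabilities match it, which is where an exact solution may or may not exist.

```python
from collections import defaultdict


def prefix_mass(target):
    """Total target probability of all complete trajectories sharing each prefix."""
    mass = defaultdict(float)
    for trajectory, p in target.items():
        for i in range(1, len(trajectory) + 1):
            mass[trajectory[:i]] += p
    return dict(mass)


def local_targets(target):
    """For each tree node (a prefix), the desired distribution over next states."""
    mass = prefix_mass(target)
    children = defaultdict(dict)
    for prefix, m in mass.items():
        if len(prefix) > 1:
            parent = prefix[:-1]
            # Conditional probability of extending the parent by this state.
            children[parent][prefix[-1]] = m / mass[parent]
    return dict(children)


# Hypothetical target distribution over three complete trajectories.
target = {
    ("s0", "s1", "s3"): 0.5,
    ("s0", "s1", "s4"): 0.25,
    ("s0", "s2", "s5"): 0.25,
}

local = local_targets(target)
for node, dist in sorted(local.items()):
    print(node, dist)
```

Multiplying the local targets along any root-to-leaf path recovers the probability assigned to that trajectory, which is the sense in which per-node consistency is taken here to imply the overall distribution.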

David L. Roberts, Mark J. Nelson, Charles Lee Isbell Jr., Michael Mateas, Michael L. Littman

AAAI 2006 | Algorithm | Intelligent Agents | Non-deterministic Policies | Optimal Trajectory |

» Efficient Solutions to Factored MDPs with Imprecise Transition Probabilities

» A globally optimal algorithm for TTDMDPs

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2006
Where	AAAI
Authors	David L. Roberts, Mark J. Nelson, Charles Lee Isbell Jr., Michael Mateas, Michael L. Littman
