Authorial Idioms for Target Distributions in TTD-MDPs

In designing Markov Decision Processes (MDPs), one must define the world, its dynamics, a set of actions, and a reward function. MDPs are often applied in situations where there is no clear choice of reward function, and in these cases significant care must be taken to construct a reward function that induces the desired behavior. In this paper, we consider an analogous design problem: crafting a target distribution in Targeted Trajectory Distribution MDPs (TTD-MDPs). TTD-MDPs produce probabilistic policies that minimize divergence from a target distribution over trajectories of an underlying MDP. They are an extension of MDPs that provides variety of experience during repeated execution. Here, we present a brief overview of TTD-MDPs along with approaches for constructing target distributions. Then we present a novel authorial idiom for creating target distributions using prototype trajectories. We evaluate these approaches on a drama manager for an interactive game.
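To make the idea of a probabilistic policy that tracks a target distribution concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the simplest possible setting: every action deterministically extends the current trajectory prefix, so the policy can match the target exactly by choosing each next step in proportion to the target mass of the complete trajectories that pass through it. With stochastic transitions this shortcut no longer applies, and the policy has to be found by minimizing a divergence measure instead. The names `subtree_mass`, `policy`, and the toy `target` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def subtree_mass(target):
    """Total target probability of the complete trajectories extending each prefix."""
    mass = defaultdict(float)
    for trajectory, p in target.items():
        # Credit p to every prefix of the trajectory, including the empty one.
        for i in range(len(trajectory) + 1):
            mass[trajectory[:i]] += p
    return mass


def policy(target):
    """Assumed deterministic case: P(step | prefix) = mass(prefix + step) / mass(prefix)."""
    mass = subtree_mass(target)
    pi = defaultdict(dict)
    for prefix, m in mass.items():
        if prefix and m > 0:
            parent = prefix[:-1]
            pi[parent][prefix[-1]] = m / mass[parent]
    return dict(pi)


# Toy target distribution over three complete two-step trajectories (illustrative only).
target = {("a", "x"): 0.5, ("a", "y"): 0.25, ("b", "x"): 0.25}
pi = policy(target)
print(pi[()])        # distribution over the first step
print(pi[("a",)])    # distribution over the second step after "a"
```

Sampling steps from `pi` one at a time reproduces `target` over complete trajectories in this toy setting, which is the source of the variety across repeated executions that the abstract describes.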

David L. Roberts, Sooraj Bhat, Kenneth St. Clair, Charles Lee Isbell Jr.

AAAI 2007 | Cases Significant Care | Intelligent Agents | Reward Function | Target Distributions

Post Info

Added	02 Oct 2010
Updated	02 Oct 2010
Type	Conference
Year	2007
Where	AAAI
Authors	David L. Roberts, Sooraj Bhat, Kenneth St. Clair, Charles Lee Isbell Jr.
