Multi-Objective MDPs with Conditional Lexicographic Reward Preferences

Sequential decision problems that involve multiple objectives are prevalent. Consider, for example, a driver of a semi-autonomous car who may want to optimize competing objectives such as travel time and the effort associated with manual driving. We introduce a rich model called Lexicographic MDP (LMDP) and a corresponding planning algorithm called LVI that generalize previous work by allowing for conditional lexicographic preferences with slack. We analyze the convergence characteristics of LVI and establish its game-theoretic properties. The performance of LVI in practice is tested within a realistic benchmark problem in the domain of semi-autonomous driving. Finally, we demonstrate how GPU-based optimization can improve the scalability of LVI and other value iteration algorithms for MDPs.
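The idea of a lexicographic preference with slack can be illustrated with a small sketch. This is not the paper's LVI algorithm, whose details the abstract does not give; it is a single-decision toy under stated assumptions. The function name `lexicographic_choice`, the route names, and all numbers are hypothetical. Each objective is considered in priority order, and an action survives an objective's filter if its value is within that objective's slack of the best remaining value.

```python
from typing import Dict, List, Sequence


def lexicographic_choice(
    q_values: Sequence[Dict[str, float]],
    slack: Sequence[float],
) -> str:
    """Pick an action by filtering on each objective in priority order.

    q_values[i][a] is a hypothetical value of action `a` under objective i
    (higher is better). slack[i] is how far below the best surviving value
    an action may fall and still pass the filter for objective i.
    """
    if len(q_values) != len(slack):
        raise ValueError("need exactly one slack value per objective")

    # Start with every action; dict insertion order is preserved.
    candidates: List[str] = list(q_values[0].keys())

    for q, delta in zip(q_values, slack):
        best = max(q[a] for a in candidates)
        # Keep only actions within `delta` of the best for this objective.
        candidates = [a for a in candidates if q[a] >= best - delta]

    # Ties after the last objective are broken by insertion order.
    return candidates[0]


# Hypothetical driving example: values are negated costs, so higher is better.
Q_TIME = {"highway": -20.0, "back_road": -22.0, "city": -30.0}   # minutes
Q_EFFORT = {"highway": -5.0, "back_road": -1.0, "city": 0.0}     # manual effort

if __name__ == "__main__":
    # Allow up to 3 minutes of extra travel time to reduce manual effort.
    print(lexicographic_choice([Q_TIME, Q_EFFORT], slack=[3.0, 0.0]))
```

With zero slack the first objective alone decides the outcome. Increasing the slack on travel time lets the lower-priority effort objective influence the choice, which is the trade-off the driver example above describes.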

Kyle Hollins Wray, Shlomo Zilberstein, Abdel-Illah Mouaddib

AAAI 2015 | Intelligent Agents

Post Info

Added	27 Mar 2016
Updated	27 Mar 2016
Type	Journal
Year	2015
Where	AAAI
Authors	Kyle Hollins Wray, Shlomo Zilberstein, Abdel-Illah Mouaddib
