Policy teaching through reward function learning

15 years 11 months ago


Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent’s decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent’s reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling s...
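To make the known-reward setting concrete, the sketch below checks whether a candidate incentive induces a desired policy in a toy MDP. It is not the linear program from the paper: it only verifies a given incentive by value iteration. The three-state deterministic MDP, the reward and incentive vectors, and the names `greedy_policy` and `induces` are all illustrative assumptions.

```python
# Toy check (assumed setup, not the paper's LP): does adding an incentive
# to the agent's reward make a desired target policy optimal?

def greedy_policy(next_state, reward, gamma=0.9, sweeps=1000):
    """Run value iteration on a deterministic MDP and return the greedy policy.

    next_state[s][a] is the state reached by taking action a in state s.
    reward[s] is the reward the agent receives for being in state s.
    """
    n = len(reward)
    values = [0.0] * n
    for _ in range(sweeps):
        values = [
            reward[s] + gamma * max(values[t] for t in next_state[s])
            for s in range(n)
        ]
    # The reward depends only on the current state, so the best action
    # is the one leading to the highest-valued successor state.
    return [
        max(range(len(next_state[s])), key=lambda a: values[next_state[s][a]])
        for s in range(n)
    ]


def induces(next_state, reward, incentive, target, gamma=0.9):
    """Return True if the agent's optimal policy under reward + incentive
    matches the target policy."""
    modified = [r + d for r, d in zip(reward, incentive)]
    return greedy_policy(next_state, modified, gamma) == target


# Hypothetical 3-state cycle: action 0 stays put, action 1 moves to the next state.
NEXT = [[0, 1], [1, 2], [2, 0]]
REWARD = [1.0, 0.0, 0.0]   # the agent's own reward: it prefers to sit in state 0
TARGET = [1, 1, 0]         # desired policy: travel to state 2 and stay there

print(greedy_policy(NEXT, REWARD))                      # agent's policy with no incentive
print(induces(NEXT, REWARD, [0.0, 0.0, 2.0], TARGET))   # large incentive on state 2
print(induces(NEXT, REWARD, [0.0, 0.0, 0.5], TARGET))   # smaller incentive on state 2
```

In this toy example, an incentive of 2.0 on state 2 is enough to make the target policy optimal, while 0.5 is not. Finding the cheapest incentive that works, subject to a budget, is the kind of problem the abstract says is formulated as a linear program.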

Haoqi Zhang, David C. Parkes, Yiling Chen
