Policy teaching through reward function learning

Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent’s decisions by providing limited incentives. In this paper, we consider the specific objective of inducing a pre-specified desired policy. We examine both the case in which the agent’s reward function is known to the interested party and the case in which it is unknown, presenting a linear program for the former and formulating an active, indirect elicitation method for the latter. We provide conditions for logarithmic convergence, and present a polynomial-time algorithm that ensures logarithmic convergence with arbitrarily high probability. We also offer practical elicitation heuristics that can be formulated as linear programs, and demonstrate their effectiveness on a policy teaching problem in a simulated ad network setting. We extend our methods to handle partial observations and partial target policies, and provide a game-theoretic interpretation of our methods for handling s...
Haoqi Zhang, David C. Parkes, Yiling Chen
Added 28 May 2010
Updated 28 May 2010
Type Conference
Year 2009