This paper develops a generalized apprenticeship learning protocol for reinforcement-learning agents with access to a teacher who provides policy traces (transition and reward observations). We characterize sufficient conditions on the underlying models for efficient apprenticeship learning and link these criteria to two established learnability classes (KWIK and Mistake Bound). We then construct efficient apprenticeship-learning algorithms in a number of domains, including two types of relational MDPs. We instantiate our approach in a software agent and a robot agent that learn effectively from a human teacher.
Thomas J. Walsh, Kaushik Subramanian, Michael L. Littman