Probabilistic policy reuse in a reinforcement learning agent

11 years 12 months ago
Probabilistic policy reuse in a reinforcement learning agent
We contribute Policy Reuse as a technique to improve a reinforcement learning agent with guidance from past learned similar policies. Our method relies on using the past policies as a probabilistic bias where the learning agent faces three choices: the exploitation of the ongoing learned policy, the exploration of random unexplored actions, and the exploitation of past policies. We introduce the algorithm and its major components: an exploration strategy to include the new reuse bias, and a similarity function to estimate the similarity of past policies with respect to a new one. We provide empirical results demonstrating that Policy Reuse improves the learning performance over different strategies that learn without reuse. Interestingly and almost as a side effect, Policy Reuse also identifies classes of similar policies revealing a basis of core policies of the domain. We demonstrate that such a basis can be built incrementally, contributing the learning of the structure of a domain...
Fernando Fernández, Manuela M. Veloso
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2006
Where ATAL
Authors Fernando Fernández, Manuela M. Veloso
Comments (0)