Model-Based Online Learning of POMDPs

Abstract. Learning to act in an unknown partially observable domain is a difficult variant of the reinforcement learning paradigm. Research in the area has focused on model-free methods, which learn a policy without learning a model of the world. When sensor noise increases, model-free methods provide less accurate policies. The model-based approach (learning a POMDP model of the world and computing an optimal policy for the learned model) may generate superior results in the presence of sensor noise, but learning and solving a model of the environment is a difficult problem. We have previously shown how such a model can be obtained from the learned policy of model-free methods, but this approach implies an undesirable distinction between a learning phase and an acting phase. In this paper we present a novel method for learning a POMDP model online, based on McCallum's Utile Suffix Memory (USM), in conjunction with an approximate policy obtained using an in...
Guy Shani, Ronen I. Brafman, Solomon Eyal Shimony
Added 27 Jun 2010
Updated 27 Jun 2010
Type Conference
Year 2005
Where ECML
Authors Guy Shani, Ronen I. Brafman, Solomon Eyal Shimony