Sciweavers

IJCAI
2007

Using Linear Programming for Bayesian Exploration in Markov Decision Processes

13 years 6 months ago
Using Linear Programming for Bayesian Exploration in Markov Decision Processes
A key problem in reinforcement learning is finding a good balance between the need to explore the environment and the need to gain rewards by exploiting existing knowledge. Much research has been devoted to this topic, and many of the proposed methods are aimed simply at ensuring that enough samples are gathered to estimate well the value function. In contrast, [Bellman and Kalaba, 1959] proposed constructing a representation in which the states of the original system are paired with knowledge about the current model. Hence, knowledge about the possible Markov models of the environment is represented and maintained explicitly. Unfortunately, this approach is intractable except for bandit problems (where it gives rise to Gittins indices, an optimal exploration method). In this paper, we explore ideas for making this method computationally tractable. We maintain a model of the environment as a Markov Decision Process. We sample finite-length trajectories from the infinite tree using ...
Pablo Samuel Castro, Doina Precup
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where IJCAI
Authors Pablo Samuel Castro, Doina Precup
Comments (0)