
EWRL 2008

Efficient Reinforcement Learning in Parameterized Models: Discrete Parameter Case

We consider reinforcement learning in the parameterized setup, where the model is known to belong to a parameterized family of Markov Decision Processes (MDPs). We further impose the assumption that the set of possible parameters is finite, and consider the discounted return. We propose an on-line algorithm for learning in such parameterized models, dubbed the Parameter Elimination (PEL) algorithm, and analyze its performance in terms of the total mistake bound criterion (also known as the sample complexity of exploration). The algorithm relies on Wald's Sequential Probability Ratio Test to eliminate unlikely parameters, and uses an optimistic policy for effective exploration. We establish that, with high probability, the total mistake bound for the algorithm is linear (up to a logarithmic term) in the size of the parameter space, independently of the cardinality of the state and action spaces.
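
The abstract's core mechanism, likelihood-ratio elimination over a finite candidate set, can be sketched in a few lines of Python. This is a minimal illustration under assumptions made here, not the authors' PEL algorithm: the class name, the dict-of-dicts model representation, and the single Wald-style threshold log((1 - delta) / delta) are all choices for this example, and the paper's actual test, optimistic exploration policy, and mistake-bound analysis are more involved.

import math
import random

class ParameterEliminator:
    """Illustrative SPRT-style elimination over a finite set of
    candidate transition models (a sketch, not the paper's PEL)."""

    def __init__(self, candidate_models, delta=0.01):
        # candidate_models: parameter id -> p[s][a] = {s_next: prob}
        self.models = dict(candidate_models)
        self.log_lik = {theta: 0.0 for theta in self.models}
        # Wald-style threshold on the log-likelihood ratio.
        self.threshold = math.log((1.0 - delta) / delta)

    def update(self, s, a, s_next):
        """Fold one observed transition into each candidate's
        log-likelihood, then drop candidates that fall too far
        behind the current maximum-likelihood candidate."""
        for theta, p in self.models.items():
            prob = p[s][a].get(s_next, 0.0)
            self.log_lik[theta] += math.log(prob) if prob > 0 else float("-inf")
        best = max(self.log_lik.values())
        for theta in list(self.models):
            if best - self.log_lik[theta] > self.threshold:
                del self.models[theta]
                del self.log_lik[theta]

    def surviving(self):
        return set(self.models)

# Toy demo: two candidate 2-state, 1-action chains; "a" is the truth.
models = {
    "a": {0: {0: {0: 0.9, 1: 0.1}}, 1: {0: {0: 0.1, 1: 0.9}}},
    "b": {0: {0: {0: 0.5, 1: 0.5}}, 1: {0: {0: 0.5, 1: 0.5}}},
}
random.seed(0)
elim = ParameterEliminator(models)
s = 0
for _ in range(100):
    probs = models["a"][s][0]
    s_next = random.choices(list(probs), weights=list(probs.values()))[0]
    elim.update(s, 0, s_next)
    s = s_next
print(elim.surviving())  # "b" is eliminated quickly; "a" survives

In this toy run the expected per-step log-likelihood gap between the true and false candidate is positive, so the false candidate crosses the elimination threshold after a handful of transitions, consistent with a mistake bound that scales with the number of candidate parameters rather than the size of the state space.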
Type: Conference
Year: 2008
Where: EWRL
Authors: Kirill Dyagilev, Shie Mannor, Nahum Shimkin