Efficient exploration through active learning for value function approximation in reinforcement learning

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the cost of sampling immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot.

Keywords: reinforcement learning, Markov decision process, least-squares policy iteration, active learning, batting robot
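The abstract leaves the LSPI-to-regression connection implicit, so here is a minimal sketch of the idea, not the authors' API algorithm: LSTD-Q, the policy-evaluation step of LSPI, is ordinary linear least squares, which is why active-learning criteria for linear regression can be applied to rank candidate sampling policies. The function names (`lstdq`, `variance_score`), the ridge regularizer `reg`, and the A-optimality-style score below are illustrative assumptions.

```python
import numpy as np

def lstdq(phi_sa, phi_next, rewards, gamma=0.95, reg=1e-6):
    """Policy-evaluation step of LSPI via LSTD-Q, assuming a linear model
    Q(s, a) ~ phi(s, a) . w fitted from sampled transitions.

    phi_sa   : (n, d) features of the sampled state-action pairs (s, a)
    phi_next : (n, d) features of the successor pairs (s', pi(s'))
    rewards  : (n,)   observed immediate rewards
    """
    A = phi_sa.T @ (phi_sa - gamma * phi_next)   # (d, d) LSTD matrix
    b = phi_sa.T @ rewards                       # (d,)
    # A small ridge term (our assumption) keeps A invertible when samples
    # are scarce -- relevant when reward sampling is expensive.
    return np.linalg.solve(A + reg * np.eye(A.shape[1]), b)

def variance_score(phi_sa, noise_var=1.0):
    """A-optimality-style active-learning score (hypothetical, not the
    paper's criterion): the trace of the regularized covariance of the
    least-squares weights. Lower values mean a candidate sampling policy's
    data would pin down the weight vector more tightly."""
    G = phi_sa.T @ phi_sa + 1e-6 * np.eye(phi_sa.shape[1])
    return noise_var * np.trace(np.linalg.inv(G))

# Toy usage with random features, purely to show the call pattern.
rng = np.random.default_rng(0)
phi, phi_nxt = rng.normal(size=(200, 8)), rng.normal(size=(200, 8))
w = lstdq(phi, phi_nxt, rng.normal(size=200))
print(variance_score(phi))
```

In this reading, exploration design reduces to simulating or estimating the features each candidate sampling policy would collect and preferring the policy with the lowest predicted estimator variance, before any costly rewards are actually sampled.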
Added 20 May 2011
Updated 20 May 2011
Type Journal
Year 2010
Where NN
Publisher Springer
Authors Takayuki Akiyama, Hirotaka Hachiya, Masashi Sugiyama