The asymptotic equipartition property in reinforcement learning and its relation to return maximization

We discuss an important property of empirical sequences in reinforcement learning, the asymptotic equipartition property. It states that, when the number of time steps is sufficiently large, the typical set of empirical sequences has probability nearly one, all elements of the typical set are nearly equiprobable, and the number of elements in the typical set is an exponential function of the sum of conditional entropies. This sum is referred to as the stochastic complexity. Using the property, we show that return maximization depends on two factors: the stochastic complexity and a quantity that depends on the parameters of the environment. Here, return maximization means that the best sequences in terms of expected return have probability one. We also examine the sensitivity of the stochastic complexity, which serves as a qualitative guide for tuning the parameters of the action-selection strategy, and give a sufficient condition for return maximization in probability. © 2005 ...
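
The three claims of the asymptotic equipartition property can be checked numerically in a much simpler setting than the one the paper treats. The sketch below assumes an i.i.d. Bernoulli source rather than empirical sequences generated by an agent and an environment, so the single-symbol entropy stands in for the sum of conditional entropies. The function names `entropy_bits` and `typical_set_stats` and the parameters `n`, `p`, and `eps` are illustrative choices, not taken from the paper.

```python
import math


def entropy_bits(p):
    """Entropy in bits of a Bernoulli(p) source, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def typical_set_stats(n, p, eps):
    """Return (probability, size) of the eps-typical set of length-n sequences.

    A sequence x is eps-typical when |-(1/n) * log2 P(x) - H| <= eps,
    where H is the entropy of the Bernoulli(p) source.
    """
    h = entropy_bits(p)
    prob, size = 0.0, 0
    for k in range(n + 1):
        # All sequences with exactly k ones share the same probability.
        log2_p_seq = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log2_p_seq / n - h) <= eps:
            count = math.comb(n, k)  # number of sequences with k ones
            size += count
            # Add count * P(x), computed in log space to avoid underflow.
            prob += math.exp(math.log(count) + log2_p_seq * math.log(2))
    return prob, size


if __name__ == "__main__":
    p, eps = 0.3, 0.05
    h = entropy_bits(p)
    for n in (100, 500, 2000):
        prob, size = typical_set_stats(n, p, eps)
        # The growth rate log2(size) / n should lie within about eps of H.
        print(n, round(prob, 4), round(math.log2(size) / n, 4), round(h, 4))
```

As `n` grows, the typical-set probability should approach one while the set itself stays exponentially smaller than the full space of `2**n` sequences, with exponent close to `n * H`. Every typical sequence has probability between `2**(-n * (H + eps))` and `2**(-n * (H - eps))`, which is the sense in which the elements are nearly equiprobable.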

Kazunori Iwata, Kazushi Ikeda, Hideaki Sakai

Empirical Sequences | Neural Networks | NN 2006 | Return Maximization | Stochastic Complexity

Post Info

Added	14 Dec 2010
Updated	14 Dec 2010
Type	Journal
Year	2006
Where	NN
Authors	Kazunori Iwata, Kazushi Ikeda, Hideaki Sakai
