Sciweavers

GECCO
2005
Springer

An autonomous explore/exploit strategy

In reinforcement learning problems it is generally held that neither exploitation nor exploration can be pursued exclusively without failing at the task. The optimal balance between exploring and exploiting changes as training progresses, because the amount of learnt knowledge increases. This shift in balance is not known a priori, so an autonomous online adjustment is sought. Human beings manage this balance through logic and emotions based on feedback from the environment. The XCS learning classifier system uses a fixed explore/exploit balance, but it does keep multiple statistics about its performance and its interaction with the environment. By using these statistics in a manner analogous to logic/emotion, autonomous adjustment of the explore/exploit balance was achieved. The result was less exploration in simple environments and more exploration as the complexity of the problem domain increased. It also prevented unsuccessful 'loop' exploit trials and suggests a method of dynamic c...
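The idea of adjusting the explore/exploit balance from performance statistics can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say which XCS statistics are used or how they are combined. The class name `AdaptiveExplore`, the single running prediction-error statistic, and the parameters `beta`, `p_min`, and `p_max` are all assumptions made for illustration.

```python
import random


class AdaptiveExplore:
    """Hypothetical controller: explore probability tracks a running error average.

    Assumption: one normalized prediction-error statistic stands in for the
    multiple statistics that XCS keeps; the paper's actual rule is not shown here.
    """

    def __init__(self, beta=0.2, p_min=0.05, p_max=1.0):
        self.beta = beta        # update rate of the running average (assumed value)
        self.p_min = p_min      # floor on the explore probability (assumed value)
        self.p_max = p_max      # ceiling on the explore probability (assumed value)
        self.avg_error = 1.0    # start pessimistic so early trials mostly explore

    def update(self, normalized_error):
        """Fold one trial's prediction error, clipped to [0, 1], into the average."""
        e = min(max(normalized_error, 0.0), 1.0)
        self.avg_error += self.beta * (e - self.avg_error)

    def explore_probability(self):
        """Map the running error linearly onto [p_min, p_max]."""
        return self.p_min + (self.p_max - self.p_min) * self.avg_error

    def choose_explore(self, rng=random):
        """Return True if the next trial should be an explore trial."""
        return rng.random() < self.explore_probability()
```

Under these assumptions, an environment in which predictions quickly become accurate drives the explore probability toward `p_min`, while persistent error keeps it high, which mirrors the qualitative behaviour the abstract reports.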
Alex McMahon, Dan Scott, William N. L. Browne
Added 27 Jun 2010
Updated 27 Jun 2010
Type Conference
Year 2005
Where GECCO
Authors Alex McMahon, Dan Scott, William N. L. Browne