ICML
2001

Direct Policy Search using Paired Statistical Tests

Direct policy search is a practical way to solve reinforcement learning problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one by using fixed start states and fixed random number sequences when comparing policies (Ng & Jordan, 1999). We evaluate Pegasus and other paired comparison methods on the mountain car problem and on a difficult pursuer-evader problem. We conclude that: (i) paired tests can improve the performance of both deterministic and stochastic optimization procedures; (ii) our proposed alternatives to Pegasus can generalize better, either by using a different test statistic or by changing the scenarios during learning; and (iii) adapting the number of trials used for each policy comparison yields fast and robust learning.
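The idea of comparing two policies on the same fixed scenarios, and deciding how many scenarios to use on the fly, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the authors' code or their exact procedure. The toy objective `episode_return`, the helpers `paired_t` and `compare_policies`, and the values `threshold=3.0`, `min_trials=5`, and `max_trials=200` are all hypothetical. Each integer seed plays the role of a fixed scenario (a fixed start state plus a fixed random sequence), in the spirit of Pegasus. The two policies are scored on identical seeds, a paired t-statistic is computed over the per-scenario return differences, and scenarios are added one at a time until that statistic is decisive.

```python
import math
import random
import statistics


def episode_return(theta: float, seed: int) -> float:
    """Hypothetical toy episode: all randomness comes from `seed`, so the same
    seed gives the same scenario (start state) for every policy."""
    rng = random.Random(seed)
    start = rng.gauss(0.0, 1.0)  # scenario-specific start state
    # The large term 5*start is shared by all policies (it cancels when pairing);
    # the best theta depends mildly on the start state.
    return 5.0 * start - (theta - 1.0 - 0.5 * start) ** 2


def paired_t(diffs: list[float]) -> float:
    """Paired t-statistic: mean difference divided by its standard error."""
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # requires at least two differences
    if sd == 0.0:
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))


def compare_policies(theta_a: float, theta_b: float, threshold: float = 3.0,
                     min_trials: int = 5, max_trials: int = 200) -> tuple[float, int]:
    """Compare two policies on shared fixed scenarios (seeds 0, 1, 2, ...),
    adding scenarios until |t| >= threshold or max_trials is reached.
    Returns (preferred theta, number of scenarios used)."""
    assert min_trials >= 2, "a paired t-statistic needs at least two trials"
    diffs: list[float] = []
    t = 0.0
    for seed in range(max_trials):
        # Same seed for both policies: this is what makes the test paired.
        diffs.append(episode_return(theta_a, seed) - episode_return(theta_b, seed))
        if len(diffs) >= min_trials:
            t = paired_t(diffs)
            if abs(t) >= threshold:
                break  # the comparison is already decisive; stop early
    return (theta_a if t > 0 else theta_b), len(diffs)


winner, used = compare_policies(1.0, 0.0)
print(winner, used)
```

In this toy problem each episode's return varies by roughly ±5 from one scenario to the next, but for `theta_a = 1.0` and `theta_b = 0.0` the per-scenario difference works out to exactly `1 + start`. Pairing removes the large shared term, so only a handful of scenarios are needed before the test becomes decisive. Two policies that are genuinely indistinguishable never reach the threshold, and the comparison uses the full `max_trials` budget.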
Malcolm J. A. Strens, Andrew W. Moore
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2001
Where ICML
Authors Malcolm J. A. Strens, Andrew W. Moore