On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation

Model selection strategies for machine learning algorithms typically involve the numerical optimisation of an appropriate model selection criterion, often based on an estimator of generalisation performance, such as k-fold cross-validation. The error of such an estimator can be broken down into bias and variance components. While unbiasedness is often cited as a beneficial quality of a model selection criterion, we demonstrate that a low variance is at least as important, as a non-negligible variance introduces the potential for over-fitting in model selection as well as in training the model. While this observation is in hindsight perhaps rather obvious, the degradation in performance due to over-fitting the model selection criterion can be surprisingly large, an observation that appears to have received little attention in the machine learning literature to date. In this paper, we show that the effects of this form of over-fitting are often of comparable magnitude to differences in performance between learning algorithms, and thus cannot be ignored in empirical performance evaluation.
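
As a quick illustration of why the variance of the selection criterion matters, the sketch below (in Python, not taken from the paper) tunes an RBF SVM over a large hyperparameter grid by 5-fold cross-validation on a small selection set, then compares the winning cross-validation score with accuracy on a large independent test set. The dataset, grid, and sample sizes are illustrative assumptions; because the reported best score is the maximum of many noisy estimates, it is typically optimistic relative to the held-out figure.

    # Illustrative sketch of selection bias (assumed setup, not the paper's protocol).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    # Synthetic problem; a small selection set exaggerates the variance of
    # the cross-validation estimator.
    X, y = make_classification(n_samples=5200, n_features=20,
                               n_informative=5, random_state=0)
    X_sel, X_test, y_sel, y_test = train_test_split(X, y, train_size=200,
                                                    random_state=0)

    # Optimise C and gamma over a 12 x 12 grid using 5-fold cross-validation.
    grid = GridSearchCV(SVC(kernel="rbf"),
                        param_grid={"C": np.logspace(-2, 3, 12),
                                    "gamma": np.logspace(-4, 1, 12)},
                        cv=5)
    grid.fit(X_sel, y_sel)

    # The best CV score is the maximum of 144 noisy estimates, so part of it
    # is fitted noise; the independent test score is typically lower.
    print(f"best 5-fold CV accuracy : {grid.best_score_:.3f}")
    print(f"independent test accuracy: {grid.score(X_test, y_test):.3f}")

The same mechanism is why the paper recommends procedures such as nested cross-validation when an unbiased estimate of the performance of the whole selection procedure is required.
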
Type: Journal
Year: 2010
Where: JMLR
Authors: Gavin C. Cawley, Nicola L. C. Talbot