BIOINFORMATICS
2005

Prediction error estimation: a comparison of resampling methods

In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers that predict the outcome of future observations. This process has three inherent steps: feature selection, model selection, and prediction assessment. Focusing on prediction assessment, we compare several methods for estimating the 'true' prediction error of a prediction model in the presence of feature selection. For small studies in which features are selected from thousands of candidates, the resubstitution and simple split-sample estimates are seriously biased. In these small samples, leave-one-out cross-validation (LOOCV), 10-fold cross-validation (CV), and the .632+ bootstrap have the smallest bias for diagonal discriminant analysis, nearest neighbor, and classification trees. LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis. Additionally, LOOCV, 5- and 10-fold CV, and the .632+ bootstrap have the lowest mean sq...
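The bias of the resubstitution estimate under feature selection can be illustrated with a small simulation. The sketch below is not the authors' code or experimental design: the nearest-centroid classifier, the mean-difference feature ranking, the sample sizes, and all function names are assumptions chosen for illustration. It generates pure-noise data, so the true error of any classifier is 0.5, and compares a resubstitution estimate with a LOOCV estimate in which feature selection is repeated inside every fold.

```python
import random


def select_features(X, y, k):
    """Return indices of the k features with the largest absolute
    difference between class means (a simple, assumed ranking rule)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def score(j):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        return abs(mean_pos - mean_neg)

    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def fit_centroids(X, y, feats):
    """Compute one centroid per class over the selected features."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return centroids


def predict(centroids, feats, row):
    """Assign the class whose centroid is nearest in squared distance."""
    def dist(label):
        return sum((row[j] - c) ** 2 for j, c in zip(feats, centroids[label]))
    return min((0, 1), key=dist)


def resubstitution_error(X, y, k):
    """Select features, train, and test on the same samples."""
    feats = select_features(X, y, k)
    cents = fit_centroids(X, y, feats)
    wrong = sum(predict(cents, feats, row) != label for row, label in zip(X, y))
    return wrong / len(y)


def loocv_error(X, y, k):
    """Leave-one-out CV with feature selection redone in every fold,
    so the held-out sample never influences which features are chosen."""
    wrong = 0
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        feats = select_features(X_train, y_train, k)
        cents = fit_centroids(X_train, y_train, feats)
        wrong += predict(cents, feats, X[i]) != y[i]
    return wrong / len(y)


def simulate(n=20, p=500, k=10, seed=0):
    """Null data: features are independent of the labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]  # balanced classes
    return resubstitution_error(X, y, k), loocv_error(X, y, k)


if __name__ == "__main__":
    resub, loocv = simulate()
    print(f"resubstitution estimate: {resub:.2f}")
    print(f"LOOCV estimate:          {loocv:.2f}")
```

Because the features carry no signal, a resubstitution estimate well below 0.5 reflects optimism from reusing the same samples for selection, fitting, and testing. Repeating the selection inside each fold is the design choice that keeps the LOOCV estimate from inheriting that optimism.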
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2005
Where BIOINFORMATICS
Authors Annette M. Molinaro, Richard Simon, Ruth M. Pfeiffer