Sciweavers

ML
2002
ACM

Estimating Generalization Error on Two-Class Datasets Using Out-of-Bag Estimates

13 years 4 months ago
Estimating Generalization Error on Two-Class Datasets Using Out-of-Bag Estimates
For two-class datasets, we provide a method for estimating the generalization error of a bag using out-of-bag estimates. In bagging, each predictor (single hypothesis) is learned from a bootstrap sample of the training examples; the output of a bag (a set of predictors) on an example is determined by voting. The out-of-bag estimate is based on recording the votes of each predictor on those training examples omitted from its bootstrap sample. Because no additional predictors are generated, the out-of-bag estimate requires considerably less time than 10-fold cross-validation. We address the question of how to use the outof-bag estimate to estimate generalization error on two-class datasets. Our experiments on several datasets show that the out-of-bag estimate and 10-fold cross-validation have similar performance, but are both biased. We can eliminate most of the bias in the out-of-bag estimate and increase accuracy by incorporating a correction based on the distribution of the out-of-ba...
Tom Bylander
Added 22 Dec 2010
Updated 22 Dec 2010
Type Journal
Year 2002
Where ML
Authors Tom Bylander
Comments (0)