PR
2010

Out-of-bag estimation of the optimal sample size in bagging

The performance of m-out-of-n bagging with and without replacement is analyzed as a function of the sampling ratio (m/n). Standard bagging uses resampling with replacement to generate bootstrap samples of the same size as the original training set (m_wr = n). Without-replacement methods (subagging) typically use half samples (m_wor = n/2). These choices of sample size are arbitrary and need not be optimal in terms of the classification performance of the ensemble. We propose to use out-of-bag estimates of the generalization accuracy to select a near-optimal value for the sampling ratio. Ensembles of classifiers trained on independent samples whose size minimizes the out-of-bag error of the ensemble generally improve on the performance of standard bagging, and they can be built efficiently.

Key words: Bagging, Subagging, Bootstrap sampling, Subsampling, Optimal sampling ratio, Ensembles of classifiers, Decision trees
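The selection procedure summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: decision stumps stand in for the full decision trees used in the paper, the candidate sampling ratios form a hypothetical finite grid, and the out-of-bag error is computed Breiman-style, by majority vote over the ensemble members for which each training point was left out. The names fit_stump, predict_stump, oob_error and select_ratio are illustrative only.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a decision stump (one-level tree) for 0/1 labels by minimizing training errors.

    Assumption: stumps replace the full decision trees of the paper to keep the sketch short.
    """
    m, total_ones = len(y), sum(y)
    best = (m + 1, 0, float("-inf"), 1)  # (errors, feature, threshold, label predicted above threshold)
    for j in range(len(X[0])):
        pairs = sorted((X[i][j], y[i]) for i in range(m))
        ones_left = 0  # number of label-1 points among pairs[:k]
        for k in range(m + 1):
            # Only split between distinct values; k == 0 and k == m give constant rules.
            if k == 0 or k == m or pairs[k - 1][0] < pairs[k][0]:
                zeros_left = k - ones_left
                ones_right = total_ones - ones_left
                zeros_right = (m - k) - ones_right
                if k == 0:
                    thr = float("-inf")
                elif k == m:
                    thr = float("inf")
                else:
                    thr = (pairs[k - 1][0] + pairs[k][0]) / 2
                err_above_1 = ones_left + zeros_right  # predict 1 above thr, 0 at or below
                err_above_0 = zeros_left + ones_right  # predict 0 above thr, 1 at or below
                if err_above_1 < best[0]:
                    best = (err_above_1, j, thr, 1)
                if err_above_0 < best[0]:
                    best = (err_above_0, j, thr, 0)
            if k < m:
                ones_left += pairs[k][1]
    return best[1:]  # (feature, threshold, label above threshold)


def predict_stump(stump, x):
    j, thr, above = stump
    return above if x[j] > thr else 1 - above


def oob_error(X, y, ratio, n_estimators=50, replace=True, seed=0):
    """Out-of-bag error of an m-out-of-n bagging ensemble with m = round(ratio * n)."""
    rng = random.Random(seed)
    n = len(y)
    m = max(1, round(ratio * n))
    if not replace and m >= n:
        return None  # sampling all n points without replacement leaves nothing out-of-bag
    votes = [Counter() for _ in range(n)]
    for _ in range(n_estimators):
        if replace:
            idx = [rng.randrange(n) for _ in range(m)]  # bootstrap-style draw
        else:
            idx = rng.sample(range(n), m)  # subsample without replacement
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # only classifiers that did not see point i vote on it
                votes[i][predict_stump(stump, X[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:
        return None
    # Majority vote; ties go to whichever label Counter lists first.
    wrong = sum(1 for i in scored if votes[i].most_common(1)[0][0] != y[i])
    return wrong / len(scored)


def select_ratio(X, y, ratios, n_estimators=50, replace=True, seed=0):
    """Return the candidate ratio with the lowest out-of-bag error, plus all estimates."""
    errors = {}
    for r in ratios:
        e = oob_error(X, y, r, n_estimators, replace, seed)
        if e is not None:
            errors[r] = e
    best = min(errors, key=errors.get)  # raises ValueError if no ratio gave an estimate
    return best, errors


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.random(), rng.random()] for _ in range(200)]
    y = [1 if x[0] + 0.2 * rng.gauss(0, 1) > 0.5 else 0 for x in X]
    best, errors = select_ratio(X, y, [0.1, 0.2, 0.5, 0.8], n_estimators=30, replace=False)
    print("selected m/n:", best, "OOB errors:", errors)
```

Because the out-of-bag estimate reuses the training data, no separate validation set is needed to compare ratios. Sampling without replacement at m/n = 1 leaves no out-of-bag points, so such ratios are skipped. With replacement, a given point is left out of a sample of size m with probability (1 - 1/n)^m ≈ e^(-m/n), which is about 0.37 for standard bagging (m = n).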
Gonzalo Martínez-Muñoz, Alberto Suárez
Added 29 Jan 2011
Updated 29 Jan 2011
Type Journal
Authors Gonzalo Martínez-Muñoz, Alberto Suárez