Avoiding Boosting Overfitting by Removing Confusing Samples

Boosting methods are known to exhibit noticeable overfitting on some data sets while being immune to overfitting on others. In this paper we show that standard boosting algorithms are not appropriate when classes overlap. This inadequacy is likely to be the major source of boosting overfitting on real-world data. To verify our conclusion we use the fact that any task with overlapping classes can be reduced to a deterministic task with the same Bayesian separating surface. This can be done by removing “confusing samples” – samples that are misclassified by a “perfect” Bayesian classifier. We propose an algorithm for removing confusing samples and experimentally study the behavior of AdaBoost trained on the resulting data sets. Experiments confirm that removing confusing samples helps boosting to reduce the generalization error and to avoid overfitting on both synthetic and real-world data. The process of removing confusing samples also provides an accurate e...
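
The definition of a confusing sample can be illustrated with a minimal sketch on synthetic data. It assumes a toy setting that is not taken from the paper: two equal-prior, unit-variance 1-D Gaussian classes centred at -1 and +1, for which the Bayes-optimal boundary is known to be x = 0. The function names (`bayes_predict`, `make_overlapping_data`, `remove_confusing`) are hypothetical. On real data the Bayes classifier is unknown, and the removal algorithm the authors propose is not described in this abstract, so this sketch does not reproduce it.

```python
import random


def bayes_predict(x):
    """Bayes-optimal rule for the assumed toy problem: two equal-prior,
    unit-variance 1-D Gaussians centred at -1 and +1 (boundary at x = 0)."""
    return 1 if x >= 0 else -1


def make_overlapping_data(n, seed=0):
    """Draw n labelled points (x, y) from the two overlapping classes."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])       # equal class priors
        x = rng.gauss(float(y), 1.0)  # class-conditional Gaussian
        data.append((x, y))
    return data


def remove_confusing(data, predict):
    """Drop 'confusing samples': points whose label disagrees with predict."""
    return [(x, y) for x, y in data if predict(x) == y]


data = make_overlapping_data(1000)
clean = remove_confusing(data, bayes_predict)
removed = len(data) - len(clean)
print(f"removed {removed} of {len(data)} samples")
```

After filtering, every remaining label is a deterministic function of x, and the separating point is still x = 0. This is the reduction to a deterministic task with the same Bayesian separating surface that the abstract refers to.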
Alexander Vezhnevets, Olga Barinova
Added 07 Jun 2010
Updated 30 Aug 2010
Type Conference
Year 2007
Where ECML