
HIS
2008

Evolutionary Training Set Selection to Optimize C4.5 in Imbalanced Problems

Classification in imbalanced domains is a recent challenge in machine learning. A classification problem is called imbalanced when the data contain many examples of one class and few of the other, and the under-represented class is the one of greater interest. One of the most widely used techniques for tackling this problem is to preprocess the data before the learning process. This preprocessing can be done by under-sampling, which removes examples (mainly from the majority class), or by over-sampling, which replicates existing minority examples or generates new ones. This contribution proposes an under-sampling procedure based on evolutionary algorithms that performs training set selection in order to optimize the models built by the C4.5 decision tree. The proposal has been compared with other under-sampling and over-sampling techniques. Its results are very competitive in terms of accuracy, and the resulting models are more interpretable.
Salvador García, Francisco Herrera
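The abstract does not give the algorithm's details, so what follows is only a rough sketch of how evolutionary under-sampling for training set selection can be set up. Each chromosome is a bit mask over the majority-class examples, all minority examples are always kept, and a simple genetic algorithm searches for the mask that maximizes a fitness score. Two substitutions are assumptions and not the authors' method. First, the fitness is the geometric mean (G-mean) of per-class recall. Second, the classifier inside the fitness is a 1-nearest-neighbour rule rather than C4.5, which keeps the example within the Python standard library. The names `evolve_selection`, `g_mean` and `nn_predict`, and all parameter values, are hypothetical.

```python
import math
import random
from typing import List, Tuple

# An example is (feature vector, class label); label 1 marks the minority class.
Example = Tuple[Tuple[float, ...], int]


def nn_predict(query: Example, reference: List[Example]) -> int:
    """Label of the nearest reference example, skipping the query object itself."""
    best_label, best_dist = 0, math.inf
    for ex in reference:
        if ex is query:  # leave-one-out: an example may not vote for itself
            continue
        d = math.dist(ex[0], query[0])
        if d < best_dist:
            best_dist, best_label = d, ex[1]
    return best_label


def g_mean(data: List[Example], subset: List[Example]) -> float:
    """Assumed fitness: G-mean of per-class recall of 1-NN using `subset` as reference."""
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for ex in data:
        totals[ex[1]] += 1
        if nn_predict(ex, subset) == ex[1]:
            hits[ex[1]] += 1
    return math.sqrt((hits[0] / totals[0]) * (hits[1] / totals[1]))


def evolve_selection(data: List[Example], generations: int = 30, pop_size: int = 20,
                     mutation_rate: float = 0.05, seed: int = 0):
    """Hypothetical GA: evolve a keep/drop mask over the majority examples."""
    rng = random.Random(seed)
    minority = [ex for ex in data if ex[1] == 1]
    majority = [ex for ex in data if ex[1] == 0]

    def decode(mask: List[bool]) -> List[Example]:
        # Minority examples are always kept; the mask only under-samples the majority.
        return minority + [ex for ex, keep in zip(majority, mask) if keep]

    def fitness(mask: List[bool]) -> float:
        return g_mean(data, decode(mask))

    population = [[rng.random() < 0.5 for _ in majority] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(majority)) if len(majority) > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [(not g) if rng.random() < mutation_rate else g for g in child]  # bit-flip mutation
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return decode(best), fitness(best)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic imbalanced data: 36 majority points around (0, 0), 6 minority points around (2, 2).
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(36)]
    data += [((rng.gauss(2, 0.5), rng.gauss(2, 0.5)), 1) for _ in range(6)]
    selected, score = evolve_selection(data)
    n_major = sum(1 for ex in selected if ex[1] == 0)
    print(f"kept {n_major} of 36 majority examples, G-mean {score:.3f}")
```

Swapping `nn_predict` for a C4.5-style tree learner trained on the decoded subset would bring the sketch closer to the setting the abstract describes. Truncation selection with elitism is used here only because it keeps the loop short. Because the better half always survives, the best fitness never decreases from one generation to the next.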
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where HIS
Authors Salvador García, Francisco Herrera