MICAI
2007
Springer

Taking Advantage of the Web for Text Classification with Imbalanced Classes

A problem with supervised approaches to text classification is that they commonly require high-quality training data to construct an accurate classifier. Unfortunately, in many real-world applications the training sets are extremely small and have imbalanced class distributions. To address these problems, this paper proposes a novel approach to text classification that combines under-sampling with a semi-supervised learning method. In particular, the proposed semi-supervised method is specially suited to working with very few training examples and relies on untagged data extracted automatically from the Web. Experimental results on a subset of the Reuters-21578 text collection indicate that the proposed approach can be a practical solution to the class-imbalance problem, since it achieves very good results with very small training sets.
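The abstract does not give the details of the method, so the following is only a minimal sketch of the general idea, under stated assumptions: random under-sampling balances the classes, and a plain self-training loop (a swapped-in, generic semi-supervised technique, not necessarily the authors' procedure) adds confidently classified untagged texts to the training set. The naive Bayes classifier, the confidence threshold, and all function names (`undersample`, `train_nb`, `predict_proba`, `self_train`) are hypothetical choices for illustration, and the `unlabeled` list stands in for snippets that would be retrieved from the Web.

```python
import math
import random
from collections import Counter, defaultdict

def undersample(docs, labels, seed=0):
    """Randomly keep as many examples per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)
    n = min(len(items) for items in by_class.values())
    out_docs, out_labels = [], []
    for label in sorted(by_class):
        for doc in rng.sample(by_class[label], n):
            out_docs.append(doc)
            out_labels.append(label)
    return out_docs, out_labels

def train_nb(docs, labels):
    """Multinomial naive Bayes counts over lowercase whitespace tokens."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict_proba(model, doc):
    """Class probabilities with add-one smoothing; unknown words are ignored."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(class_counts[label] / total)
        for w in doc.lower().split():
            if w in vocab:
                score += math.log((word_counts[label][w] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}

def self_train(docs, labels, unlabeled, threshold=0.8, rounds=3):
    """Assumed self-training loop: adopt untagged texts predicted with high confidence."""
    docs, labels, pool = list(docs), list(labels), list(unlabeled)
    for _ in range(rounds):
        model = train_nb(docs, labels)
        remaining = []
        for doc in pool:
            probs = predict_proba(model, doc)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                docs.append(doc)
                labels.append(best)
            else:
                remaining.append(doc)
        if len(remaining) == len(pool):  # nothing was added, so stop early
            break
        pool = remaining
    return train_nb(docs, labels), docs, labels
```

In this sketch the classes are balanced first, so the classifier that selects untagged texts is not biased toward the majority class from the start. Texts that share no vocabulary with the training set receive a uniform score and stay out of the training set.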
Added 08 Jun 2010
Updated 08 Jun 2010
Type Conference
Year 2007
Where MICAI
Authors Rafael Guzmán-Cabrera, Manuel Montes-y-Gómez, Paolo Rosso, Luis Villaseñor Pineda