Taking Advantage of the Web for Text Classification with Imbalanced Classes

A drawback of supervised approaches to text classification is that they commonly require high-quality training data to construct an accurate classifier. Unfortunately, in many real-world applications the training sets are extremely small and have imbalanced class distributions. To address these problems, this paper proposes a novel text classification approach that combines under-sampling with a semi-supervised learning method. In particular, the proposed semi-supervised method is especially suited to working with very few training examples and incorporates the automatic extraction of unlabeled data from the Web. Experimental results on a subset of the Reuters-21578 text collection indicate that the proposed approach can be a practical solution to the class-imbalance problem, since it achieves very good results with very small training sets.
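The abstract does not spell out the algorithm, so the following is only a rough illustration of the two ingredients it names. It is a minimal sketch, assuming random under-sampling of the larger classes followed by a generic self-training loop that repeatedly labels the unlabeled documents the current classifier is most confident about. The from-scratch multinomial Naive Bayes classifier, the function names `undersample` and `self_train`, the parameters `rounds` and `per_round`, and the idea that "Web snippets" arrive as a plain list of strings are all hypothetical choices for this sketch, not the authors' method; how documents are actually retrieved from the Web is not modeled.

```python
import math
import random
from collections import Counter, defaultdict


def undersample(docs, labels, seed=0):
    """Randomly drop examples from larger classes until every class
    has as many examples as the smallest one (random under-sampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)
    n_min = min(len(group) for group in by_class.values())
    out_docs, out_labels = [], []
    for label, group in sorted(by_class.items()):
        for doc in rng.sample(group, n_min):
            out_docs.append(doc)
            out_labels.append(label)
    return out_docs, out_labels


class NaiveBayes:
    """Multinomial Naive Bayes over lowercased whitespace tokens with
    Laplace (add-one) smoothing. Assumed classifier, chosen for brevity."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc.lower().split())
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.n = len(labels)
        return self

    def predict_proba(self, doc):
        tokens = doc.lower().split()
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = math.log(self.prior[c] / self.n)
            for t in tokens:
                s += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            scores[c] = s
        # Convert log scores to normalized probabilities (stable softmax).
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, doc):
        proba = self.predict_proba(doc)
        return max(proba, key=proba.get)


def self_train(docs, labels, unlabeled, rounds=3, per_round=2):
    """Generic self-training (a stand-in, not the paper's exact method):
    each round, pseudo-label the `per_round` unlabeled documents with the
    highest predicted probability and move them into the training set."""
    docs, labels, pool = list(docs), list(labels), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = NaiveBayes().fit(docs, labels)
        scored = []
        for doc in pool:
            proba = model.predict_proba(doc)
            best = max(proba, key=proba.get)
            scored.append((proba[best], doc, best))
        scored.sort(key=lambda item: item[0], reverse=True)
        for _, doc, label in scored[:per_round]:
            docs.append(doc)
            labels.append(label)
            pool.remove(doc)
    return NaiveBayes().fit(docs, labels)
```

In use, one would first call `undersample` on the small, imbalanced labeled set and then pass the balanced set together with the downloaded unlabeled texts to `self_train`. Selecting only the most confident predictions each round is a common way to limit the noise that pseudo-labels introduce, at the cost of growing the training set slowly.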

Rafael Guzmán-Cabrera, Manuel Montes-y-Gómez, Paolo Rosso, Luis Villaseñor Pineda

Artificial Intelligence | MICAI 2007 | Semi-supervised Learning Method | Text Classification | Training Sets

» Employing EM and Pool-Based Active Learning for Text Classification

» Hierarchical classification of Web content

» Impact on Performance of Hypertext Classification of Selective Rich HTML Capture

» Co-Training on Textual Documents with a Single Natural Feature Set

» Web Page Classification: A Soft Computing Approach

» Neighbourhood Exploitation in Hypertext Categorization

» A class-feature-centroid classifier for text categorization

» Two-View Transductive Support Vector Machines

Post Info

Added	08 Jun 2010
Updated	08 Jun 2010
Type	Conference
Year	2007
Where	MICAI
Authors	Rafael Guzmán-Cabrera, Manuel Montes-y-Gómez, Paolo Rosso, Luis Villaseñor Pineda

Sciweavers