Scalable Term Selection for Text Categorization

In text categorization, term selection is an important step for both categorization accuracy and computational efficiency. Different dimensionalities are required under different practical restrictions on time or space. Traditionally, the same scoring or ranking criterion, such as χ² or information gain (IG), is adopted for all target dimensionalities; such a criterion considers both the discriminability and the coverage of a term. In this paper, the poor accuracy at a low dimensionality is attributed to the small average vector length of the documents. Scalable term selection is proposed to optimize the term set at a given dimensionality according to an expected average vector length. Discriminability and coverage are measured separately; by adjusting the ratio of their weights in a combined criterion, the expected average vector length can be reached, which yields a good compromise between the specificity and the exhaustivity of the term subset. Experiments show that the...
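The idea of trading coverage against discriminability to reach a target average vector length can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not give their measures, so the sketch assumes coverage is the document-frequency fraction, discriminability is a smoothed absolute log-ratio of class-conditional document frequencies, and the combined criterion is a linear mix with a single weight `w`. All function names (`term_stats`, `select_terms`, `avg_vector_length`, `scalable_select`) are hypothetical.

```python
import math
from collections import Counter


def term_stats(docs, labels):
    """Per-term (coverage, discriminability) for binary labels.

    Assumed measures, not taken from the paper:
    coverage         = fraction of documents containing the term
    discriminability = |log(P(t|pos) / P(t|neg))|, smoothed, scaled to [0, 1]
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    df, df_pos = Counter(), Counter()
    for doc, y in zip(docs, labels):
        for t in set(doc):
            df[t] += 1
            if y:
                df_pos[t] += 1
    raw = {}
    for t, d in df.items():
        p_pos = (df_pos[t] + 0.5) / (n_pos + 1.0)
        p_neg = (d - df_pos[t] + 0.5) / (n_neg + 1.0)
        raw[t] = (d / n, abs(math.log(p_pos / p_neg)))
    max_disc = max(disc for _, disc in raw.values()) or 1.0
    return {t: (cov, disc / max_disc) for t, (cov, disc) in raw.items()}


def select_terms(stats, k, w):
    """Top-k terms under the assumed criterion w*coverage + (1-w)*discriminability."""
    def score(t):
        cov, disc = stats[t]
        return w * cov + (1.0 - w) * disc
    # Ties are broken alphabetically so the result is deterministic.
    return set(sorted(stats, key=lambda t: (-score(t), t))[:k])


def avg_vector_length(docs, terms):
    """Average number of selected terms that occur in a document."""
    return sum(len(set(doc) & terms) for doc in docs) / len(docs)


def scalable_select(docs, labels, k, target_avl, steps=20):
    """Bisect on w in [0, 1] so the k selected terms approach target_avl.

    Assumes average vector length grows as w shifts weight toward coverage.
    Returns (w, terms, avl) for the closest setting tried.
    """
    stats = term_stats(docs, labels)
    lo, hi = 0.0, 1.0
    best = None
    for _ in range(steps):
        w = (lo + hi) / 2.0
        terms = select_terms(stats, k, w)
        avl = avg_vector_length(docs, terms)
        if best is None or abs(avl - target_avl) < abs(best[2] - target_avl):
            best = (w, terms, avl)
        if avl < target_avl:
            lo = w  # need more coverage
        else:
            hi = w  # need more discriminability
    return best
```

With `w = 0` the selection is driven purely by discriminability and tends to pick rare, class-specific terms, so documents keep few of them; with `w = 1` it picks the most frequent terms regardless of class. The bisection searches between these extremes for the weight whose selection comes closest to the requested average vector length at the fixed dimensionality `k`.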
Jingyang Li, Maosong Sun
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007