ICML
2003
IEEE

An Evaluation on Feature Selection for Text Clustering

Feature selection methods have been successfully applied to text categorization but seldom applied to text clustering, because class label information is unavailable. In this paper, we first give empirical evidence that feature selection methods can improve the efficiency and performance of text clustering algorithms. We then propose a new feature selection method called “Term Contribution (TC)” and perform a comparative study of a variety of feature selection methods for text clustering, including Document Frequency (DF), Term Strength (TS), Entropy-based (En), Information Gain (IG) and the χ² statistic (CHI). Finally, we propose an “Iterative Feature Selection (IF)” method that addresses the lack of labels by using an effective supervised feature selection method to iteratively select features and perform clustering. Detailed experimental results on Web Directory data are provided in the paper.
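As a rough illustration of two of the unsupervised scores named above, the Python sketch below computes Document Frequency (DF) and a term contribution score, then keeps the top-k terms. It assumes TC(t) is the sum, over ordered pairs of distinct documents, of the product of the term's TF-IDF weights in the two documents, and it assumes raw term count times log(N/DF) as the weighting. Neither detail is stated in this abstract, and all function names are hypothetical.

```python
import math
from collections import Counter


def document_frequency(docs):
    """DF(t): number of documents (token lists) that contain term t."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return df


def tfidf_vectors(docs):
    """Assumed weighting: raw term count * log(N / DF(t))."""
    n = len(docs)
    df = document_frequency(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def term_contribution(docs):
    """Assumed TC(t) = sum over i != j of w(t, D_i) * w(t, D_j).

    Uses the identity sum_{i != j} w_i * w_j = (sum_i w_i)^2 - sum_i w_i^2,
    so the cost is linear in the corpus rather than quadratic in documents.
    """
    sums = Counter()
    squares = Counter()
    for vec in tfidf_vectors(docs):
        for t, w in vec.items():
            sums[t] += w
            squares[t] += w * w
    return {t: sums[t] * sums[t] - squares[t] for t in sums}


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    docs = [["apple", "banana", "apple"], ["banana", "cherry"],
            ["cherry", "apple"], ["date"]]
    print(select_top_k(document_frequency(docs), 3))
    print(select_top_k(term_contribution(docs), 3))
```

A term that appears in only one document gets a TC of zero under this formulation, since there is no second document to pair it with, whereas DF simply counts occurrences across documents. Summing over ordered rather than unordered pairs only doubles every score, so the resulting ranking is the same either way.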
Tao Liu, Shengping Liu, Zheng Chen, Wei-Ying Ma
Added 05 Jul 2010
Updated 05 Jul 2010
Type Conference
Year 2003
Where ICML
Authors Tao Liu, Shengping Liu, Zheng Chen, Wei-Ying Ma