TKDE
2008

Text Clustering with Feature Selection by Using Statistical Data

Abstract: Feature selection is an important method for improving the efficiency and accuracy of text categorization algorithms by removing redundant and irrelevant terms from the corpus. In this paper, we propose a new supervised feature selection method, named CHIR, which is based on the χ² statistic and new statistical data that can measure the positive term-category dependency. We also propose a new text clustering algorithm, TCFS, which stands for Text Clustering with Feature Selection. TCFS can incorporate CHIR to identify relevant features (i.e., terms) iteratively, so that the clustering becomes a learning process. We compared TCFS and the k-means clustering algorithm in combination with different feature selection methods on various real data sets. Our experimental results show that TCFS with CHIR has better clustering accuracy in terms of the F-measure and the purity.

Yanjun Li, Congnan Luo, Soon M. Chung
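
As a rough illustration of the idea in the abstract, the sketch below computes the standard χ² statistic for one term and one category from a 2x2 document-count table, together with an observed-to-expected ratio that indicates the direction of the dependency. The function names, the count layout, and the use of this ratio as a stand-in for "positive term-category dependency" are assumptions made for illustration; the listing does not give the CHIR formula, so this is not the authors' exact method.

```python
def chi_square_term_category(n11: int, n10: int, n01: int, n00: int) -> float:
    """Standard chi-square statistic for a 2x2 term/category table.

    n11: documents in the category that contain the term
    n10: documents outside the category that contain the term
    n01: documents in the category that lack the term
    n00: documents outside the category that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def observed_over_expected(n11: int, n10: int, n01: int, n00: int) -> float:
    """Ratio of observed to expected co-occurrence of term and category.

    Hypothetical helper: a value above 1 suggests a positive dependency,
    a value below 1 a negative one.
    """
    n = n11 + n10 + n01 + n00
    if n == 0:
        return 0.0
    expected = (n11 + n10) * (n11 + n01) / n
    return n11 / expected if expected > 0 else 0.0


if __name__ == "__main__":
    # Two mirrored tables: the same chi-square value, opposite directions.
    for table in [(30, 10, 20, 40), (10, 30, 40, 20)]:
        print(table,
              round(chi_square_term_category(*table), 3),
              observed_over_expected(*table))
```

The two mirrored tables produce the same χ² value but ratios on opposite sides of 1, which shows why χ² alone cannot separate terms that indicate a category from terms that indicate its absence.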

Post Info

Added	15 Dec 2010
Updated	15 Dec 2010
Type	Journal
Year	2008
Where	TKDE
Authors	Yanjun Li, Congnan Luo, Soon M. Chung