Sciweavers

ESWA
2007

A novel feature selection algorithm for text categorization

With the growth of the web, large numbers of documents are now available on the Internet, and the volume of digital libraries, news sources, and internal company data keeps increasing. Automatic text categorization is therefore increasingly important for handling massive data. However, the major problem in text categorization is the high dimensionality of the feature space. Many methods for text feature selection already exist. To improve the performance of text categorization, we present another feature selection method. Our study is based on Gini index theory, and we design a novel Gini index algorithm to reduce the high dimensionality of the feature space. A new Gini index measure function is constructed and adapted to text categorization. Experimental results show that our improved Gini index performs better than other feature selection methods. © 2006 Elsevier Ltd. All rights reserved.
Wenqian Shang, Houkuan Huang, Haibin Zhu, Yongmin Lin, Youli Qu, Zhihai Wang
Added 14 Dec 2010
Updated 14 Dec 2010
Type Journal
Year 2007
Where ESWA
Authors Wenqian Shang, Houkuan Huang, Haibin Zhu, Yongmin Lin, Youli Qu, Zhihai Wang
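
The abstract describes using a Gini index measure to rank terms and shrink the feature space, but it does not give the paper's new measure function. As a rough illustration only, the sketch below scores each term with a standard Gini-style purity, the sum over classes of P(class | term) squared, and keeps the top-k terms. This is an assumed, generic formulation, not the authors' improved measure; the function names `gini_term_scores` and `select_top_k` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def gini_term_scores(docs, labels):
    """Score each term by sum over classes of P(class | term)^2.

    Generic Gini-style purity (an assumption for illustration),
    NOT the improved measure proposed in the paper.
    """
    # Document frequency of each term, split by class label.
    term_class_df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            term_class_df[term][label] += 1

    scores = {}
    for term, per_class in term_class_df.items():
        total = sum(per_class.values())
        # 1.0 means the term occurs in a single class only;
        # 1 / (number of classes) means it is spread evenly.
        scores[term] = sum((n / total) ** 2 for n in per_class.values())
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Hypothetical toy corpus: two classes, pre-tokenized documents.
docs = [
    ["goal", "match", "the"],
    ["match", "team", "the"],
    ["stock", "market", "the"],
    ["market", "price", "the"],
]
labels = ["sport", "sport", "finance", "finance"]

scores = gini_term_scores(docs, labels)
selected = select_top_k(scores, 2)
```

In this toy corpus "the" appears equally in both classes and scores 0.5, while class-specific terms such as "match" score 1.0. A plain purity score like this also gives 1.0 to terms that occur in only one document, which is one reason a measure tailored to text categorization, as the abstract proposes, may be preferable.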