JMLR
2010

Feature Selection for Text Classification Based on Gini Coefficient of Inequality

A number of feature selection mechanisms have been explored in text categorization, among which mutual information, information gain and chi-square are considered the most effective. In this paper, we study another method, known as within class popularity, which performs feature selection based on the concept of the Gini coefficient of inequality (a commonly used measure of income inequality). The proposed measure captures the relative distribution of a feature across different classes. From extensive experiments with four text classifiers over three datasets with different levels of heterogeneity, we observe that the proposed measure outperforms mutual information, information gain and the chi-square statistic, with average improvements of approximately 28.5%, 19% and 9.2%, respectively.
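The abstract does not state the exact formula for within class popularity, so the following is only an illustrative sketch of the general idea, not the authors' definition. It assumes a term's score is the Gini coefficient computed over its per-class document rates P(term | class) = df(term, class) / N_class, so that a term concentrated in few classes scores high and a term spread evenly scores near zero. The names `gini_coefficient`, `inequality_score` and `select_features`, and the toy counts, are hypothetical.

```python
from typing import Dict, List


def gini_coefficient(values: List[float]) -> float:
    """Gini coefficient of inequality for non-negative values.

    Returns 0.0 when all values are equal and (n - 1) / n when all the
    mass sits in a single entry.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # no occurrences at all: treat as perfectly equal
    ordered = sorted(values)  # ascending order, as the rank formula requires
    rank_weighted = sum((i + 1) * x for i, x in enumerate(ordered))
    gini = 2.0 * rank_weighted / (n * total) - (n + 1.0) / n
    return max(0.0, gini)  # clamp tiny negative floating-point noise


def inequality_score(doc_freq: Dict[str, int], class_sizes: Dict[str, int]) -> float:
    """Hypothetical score (assumption): Gini coefficient of P(term | class).

    doc_freq maps a class label to the number of documents in that class
    containing the term; classes missing from doc_freq count as zero.
    """
    rates = [doc_freq.get(c, 0) / size for c, size in class_sizes.items() if size > 0]
    return gini_coefficient(rates)


def select_features(df_by_term: Dict[str, Dict[str, int]],
                    class_sizes: Dict[str, int], k: int) -> List[str]:
    """Keep the k terms whose class distribution is most unequal."""
    ranked = sorted(df_by_term,
                    key=lambda t: inequality_score(df_by_term[t], class_sizes),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    sizes = {"sports": 100, "politics": 100, "tech": 100}
    counts = {
        "goal": {"sports": 40},                             # concentrated in one class
        "the": {"sports": 90, "politics": 90, "tech": 90},  # spread evenly
    }
    for term, df in counts.items():
        print(term, round(inequality_score(df, sizes), 3))
    print(select_features(counts, sizes, k=1))
```

Dividing by class size before taking the Gini coefficient is a design choice in this sketch: it keeps a large class from making a term look unequally distributed merely because that class has more documents. Because the Gini coefficient is scale-invariant, the rates do not need to be normalized to sum to one.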
Added 19 May 2011
Updated 19 May 2011
Type Journal
Year 2010
Where JMLR
Authors Sanasam Ranbir Singh, Hema A. Murthy, Timothy A. Gonsalves