Ranking Categorical Features Using Generalization Properties

12 years 1 months ago
Ranking Categorical Features Using Generalization Properties
Feature ranking is a fundamental machine learning task with various applications, including feature selection and decision tree learning. We describe and analyze a new feature ranking method that supports categorical features with a large number of possible values. We show that existing ranking criteria rank a feature according to the training error of a predictor based on the feature. This approach can fail when ranking categorical features with many values. We propose the Ginger ranking criterion, that estimates the generalization error of the predictor associated with the Gini index. We show that for almost all training sets, the Ginger criterion produces an accurate estimation of the true generalization error, regardless of the number of values in a categorical feature. We also address the question of finding the optimal predictor that is based on a single categorical feature. It is shown that the predictor associated with the misclassification error criterion has the minimal expe...
Sivan Sabato, Shai Shalev-Shwartz
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2008
Where JMLR
Authors Sivan Sabato, Shai Shalev-Shwartz
Comments (0)