Clustering the Feature Space

Abstract. Dino Ienco and Rosa Meo, Dipartimento di Informatica, Università di Torino, Italy. In this paper we propose and test the use of hierarchical clustering for feature selection in databases. The clustering method is Ward's, with a distance measure based on Goodman-Kruskal τ. We motivate the choice of this measure and compare it with other ones. Our hierarchical clustering is applied to over 40 data sets from the UCI archive. The proposed approach is interesting from many viewpoints. First, it produces a dendrogram of feature subsets, which serves as a valuable tool to study relevance relationships among features. Second, the dendrogram is used in a feature selection algorithm to select the best features by a wrapper method. Experiments were run with three different families of classifiers: Naive Bayes, decision trees and k nearest neighbours. With our method, all three classifiers generally outperform their counterparts trained without feature selection. We compare our fea...
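To make the idea in the abstract concrete, here is a minimal sketch of clustering categorical features with a Goodman-Kruskal τ based distance and Ward-style merging. It is not the authors' implementation. The symmetrization (averaging τ in both directions), the convention of returning 0 for a constant target, the Lance-Williams form of Ward's update, and the names `gk_tau`, `tau_distance` and `ward_merges` are all assumptions made for illustration; the toy data is invented.

```python
from collections import Counter

def gk_tau(x, y):
    """Goodman-Kruskal tau: proportional reduction in error when predicting y from x."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    e_y = 1.0 - sum((c / n) ** 2 for c in py.values())
    if e_y == 0.0:
        return 0.0  # assumed convention: a constant y leaves nothing to predict
    e_y_given_x = 1.0 - sum((c / n) ** 2 / (px[a] / n) for (a, _), c in pxy.items())
    return (e_y - e_y_given_x) / e_y

def tau_distance(x, y):
    # Assumption: tau is asymmetric, so average both directions before inverting.
    return 1.0 - 0.5 * (gk_tau(x, y) + gk_tau(y, x))

def ward_merges(dist, names):
    """Agglomerative clustering using the Lance-Williams update for Ward's criterion.

    dist is a full symmetric matrix of pairwise distances between features.
    Returns a list of (cluster_a, cluster_b, merge_distance) tuples in merge order.
    """
    clusters = {i: (names[i],) for i in range(len(names))}
    size = {i: 1 for i in clusters}
    # Ward's update is defined on squared distances.
    d = {(i, j): dist[i][j] ** 2 for i in clusters for j in clusters if i < j}
    merges, next_id = [], len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        for k in clusters:
            if k in (i, j):
                continue
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            nk = size[k]
            d[(k, next_id)] = ((size[i] + nk) * dik + (size[j] + nk) * djk
                               - nk * dij) / (size[i] + size[j] + nk)
        del d[(i, j)]
        merges.append((clusters[i], clusters[j], dij ** 0.5))
        size[next_id] = size.pop(i) + size.pop(j)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges

# Toy data (invented): feature "b" is a relabelling of "a"; "c" is only loosely related.
features = {
    "a": [0, 0, 1, 1, 0, 1],
    "b": ["x", "x", "y", "y", "x", "y"],
    "c": [0, 1, 0, 1, 1, 0],
}
names = list(features)
dist = [[tau_distance(features[p], features[q]) for q in names] for p in names]
merges = ward_merges(dist, names)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

The merge list plays the role of the dendrogram: features that predict each other well (here "a" and "b") are joined first, at a low height. A wrapper-style selection step could then pick one representative per cluster at a chosen cut level and score each candidate subset with a classifier.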
Dino Ienco, Rosa Meo
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2008
Where SEBD