Using PageRank in Feature Selection

Abstract. Feature selection is an important task in data mining because it reduces data dimensionality and eliminates noisy variables. Traditionally, feature selection has been applied in supervised scenarios rather than in unsupervised ones. Nowadays, the amount of unsupervised data available on the web is huge, which motivates an increasing interest in feature selection for unsupervised data. In this paper we present some results in the domain of document categorization. We use the well-known PageRank algorithm to perform a random walk through the feature space of the documents. This allows us to rank the features and then select those that best represent the data set. Compared with previous work based on information gain, our method allows classifiers to obtain good accuracy, especially when few features are retained.
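The abstract does not say how the feature graph is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a non-negative document-by-feature matrix (for example term counts), links features by the cosine similarity of their columns, and scores them with a standard damped PageRank power iteration. The function name `pagerank_feature_ranking`, the similarity choice, and the damping value 0.85 are all illustrative assumptions.

```python
import numpy as np


def pagerank_feature_ranking(X, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank the columns (features) of a documents x features matrix.

    Assumption: X is non-negative and features are linked by the cosine
    similarity of their columns. The paper may build the graph differently.
    Returns (order, scores), with `order` listing feature indices from the
    highest to the lowest PageRank score.
    """
    X = np.asarray(X, dtype=float)

    # Cosine similarity between feature columns (hypothetical graph choice).
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0          # avoid division by zero for empty features
    Xn = X / norms
    W = Xn.T @ Xn
    np.fill_diagonal(W, 0.0)         # no self-loops

    n = W.shape[0]
    col_sums = W.sum(axis=0)
    safe = np.where(col_sums > 0, col_sums, 1.0)
    # Column-stochastic transition matrix; a feature with no neighbours
    # jumps to every feature with equal probability.
    P = np.where(col_sums > 0, W / safe, 1.0 / n)

    # Power iteration for the damped random walk.
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = damping * (P @ r) + (1.0 - damping) / n
        converged = np.abs(r_next - r).sum() < tol
        r = r_next
        if converged:
            break

    order = np.argsort(-r)
    return order, r


# Toy data: features 0-2 co-occur across documents, feature 3 is isolated.
X_toy = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
])
order, scores = pagerank_feature_ranking(X_toy)
top_k = order[:2]                    # keep the two best-ranked features
```

In this toy matrix, feature 0 co-occurs with both other connected features and is ranked first, while the isolated feature 3 is ranked last. Retaining the top-ranked columns (`X_toy[:, top_k]`) gives the reduced representation that a downstream classifier would be trained on.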
Dino Ienco, Rosa Meo, Marco Botta
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2008
Where SEBD
Authors Dino Ienco, Rosa Meo, Marco Botta