Sciweavers

KDD
2002
ACM

Integrating feature and instance selection for text classification

14 years 5 months ago
Integrating feature and instance selection for text classification
Instance selection and feature selection are two orthogonal methods for reducing the amount and complexity of data. Feature selection aims at the reduction of redundant features in a dataset whereas instance selection aims at the reduction of the number of instances. So far, these two methods have mostly been considered in isolation. In this paper, we present a new algorithm, which we call FIS (Feature and Instance Selection) that targets both problems simultaneously in the context of text classification Our experiments on the Reuters and 20-Newsgroups datasets show that FIS considerably reduces both the number of features and the number of instances. The accuracy of a range of classifiers including Na?ve Bayes, TAN and LB considerably improves when using the FIS preprocessed datasets, matching and exceeding that of Support Vector Machines, which is currently considered to be one of the best text classification methods. In all cases the results are much better compared to Mutual Infor...
Dimitris Fragoudis, Dimitris Meretakis, Spiros Lik
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2002
Where KDD
Authors Dimitris Fragoudis, Dimitris Meretakis, Spiros Likothanassis
Comments (0)