Sciweavers

CIARP
2006
Springer

Oscillating Feature Subset Search Algorithm for Text Categorization

Abstract. A major characteristic of text document categorization problems is the extremely high dimensionality of text data. In this paper we explore the usability of the Oscillating Search algorithm for feature/word selection in text categorization. We propose to use the multiclass Bhattacharyya distance for the multinomial model as the global feature subset selection criterion for reducing the dimensionality of the bag-of-words vector representation of documents. This criterion takes inter-feature relationships into consideration. We experimentally compare three subset selection procedures: the commonly used best individual feature selection based on information gain, best individual feature selection based on individual Bhattacharyya distance, and Oscillating Search maximizing the Bhattacharyya distance on groups of features. The obtained feature subsets are then tested on the standard Reuters data with two classifiers: multinomial Bayes and the linear SVM. The presented experimental results illustrate that us...
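To make the idea of a subset-level criterion concrete, here is a minimal Python sketch. It is not the paper's multiclass criterion: it assumes only two classes, uses the textbook Bhattacharyya distance between two discrete distributions, and lumps all words outside the candidate subset into a single "other" bucket. The function names, the smoothing parameter `alpha`, and the toy word counts are all hypothetical.

```python
import math


def bhattacharyya_distance(p, q):
    """Textbook Bhattacharyya distance between two discrete distributions."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(bc)


def subset_distance(counts_a, counts_b, subset, alpha=1.0):
    """Two-class distance evaluated on a group of words (illustrative only).

    counts_a, counts_b: dict mapping word -> count within each class.
    subset: iterable of candidate feature words.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    words = sorted(subset)

    def estimate(counts):
        # One cell per selected word plus one "other" cell for the rest.
        all_tokens = sum(counts.values())
        selected = [counts.get(w, 0) for w in words]
        rest = all_tokens - sum(selected)
        denom = all_tokens + alpha * (len(words) + 1)
        return [(c + alpha) / denom for c in selected] + [(rest + alpha) / denom]

    return bhattacharyya_distance(estimate(counts_a), estimate(counts_b))


# Hypothetical toy counts for two classes.
grain = {"wheat": 30, "price": 10, "the": 60}
trade = {"wheat": 2, "price": 12, "the": 58}

d_wheat = subset_distance(grain, trade, {"wheat"})
d_price = subset_distance(grain, trade, {"price"})
```

In this toy setup the subset containing "wheat" separates the two classes better than the one containing "price", so `d_wheat` comes out larger than `d_price`. A subset search procedure such as Oscillating Search would call a criterion of this kind repeatedly while adding and removing features. The search itself is not sketched here.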
Jana Novovicová, Petr Somol, Pavel Pudil
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2006
Where CIARP