Sciweavers

AUSDM
2008
Springer

Categorical Proportional Difference: A Feature Selection Method for Text Categorization

Supervised text categorization is a machine learning task where a predefined category label is automatically assigned to a previously unlabelled document based upon characteristics of the words contained in the document. Since the number of unique words in a learning task (i.e., the number of features) can be very large, the efficiency and accuracy of the learning task can be increased by using feature selection methods to extract from a document a subset of the features that are considered most relevant. In this paper, we introduce a new feature selection method called categorical proportional difference (CPD), a measure of the degree to which a word contributes to differentiating a particular category from other categories. The CPD for a word in a particular category in a text corpus is a ratio that considers the number of documents of a category in which the word occurs and the number of documents from other categories in which the word also occurs. We conducted a series of experim...
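The abstract describes CPD only in words, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the ratio is (A - B) / (A + B), where A is the number of documents of the category that contain the word and B is the number of documents from other categories that contain it. The function name `cpd_scores`, the input layout, and the toy corpus are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def cpd_scores(docs):
    """Score each (word, category) pair that occurs in the corpus.

    docs: iterable of (category, iterable_of_words) pairs.
    Assumed formula (not stated in the abstract): (A - B) / (A + B).
    """
    in_cat = defaultdict(int)  # (word, category) -> documents of that category containing the word
    total = defaultdict(int)   # word -> documents of any category containing the word

    for category, words in docs:
        for word in set(words):  # count a word at most once per document
            in_cat[(word, category)] += 1
            total[word] += 1

    scores = {}
    for (word, category), a in in_cat.items():
        b = total[word] - a  # documents from other categories containing the word
        scores[(word, category)] = (a - b) / (a + b)
    return scores


# Hypothetical toy corpus: two categories, two documents each.
corpus = [
    ("sports", ["goal", "match", "team"]),
    ("sports", ["team", "win"]),
    ("politics", ["vote", "team"]),
    ("politics", ["vote", "election"]),
]
scores = cpd_scores(corpus)
```

Under this assumed formula the score ranges from -1 to 1. A word that occurs only in documents of one category scores 1 for that category, while a word spread evenly across categories scores near 0, so a feature selector could keep the words whose score exceeds a chosen threshold.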
Mondelle Simeon, Robert J. Hilderman
Added 12 Oct 2010
Updated 12 Oct 2010
Type Conference
Year 2008
Where AUSDM