ICML
1999
IEEE

Feature Engineering for Text Classification

Most research in text classification to date has used a “bag of words” representation in which each feature corresponds to a single word. This paper examines some alternative ways to represent text based on syntactic and semantic relationships between words (phrases, synonyms and hypernyms). We describe the new representations and try to justify our hypothesis that they could improve the performance of a rule-based learner. The representations are evaluated using the RIPPER learning algorithm on the Reuters-21578 and DigiTrad test corpora. On their own the new representations are not found to produce significant performance improvements. We also try combining classifiers based on different representations using a majority voting technique, and this improves performance on both test collections. In our opinion, more sophisticated Natural Language Processing techniques need to be developed before better text representations can be produced for classification.
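As a rough illustration of two ideas named in the abstract, the sketch below builds a "bag of words" feature map (one feature per distinct word) and combines per-representation predictions by majority vote. It is not the authors' implementation and does not use RIPPER; the function names, the example sentence, and the example votes are hypothetical, chosen only to show the mechanics.

```python
from collections import Counter


def bag_of_words(text):
    """Map a document to word counts: one feature per distinct word.

    Assumption: naive lowercase/whitespace tokenization, for illustration only.
    """
    return Counter(text.lower().split())


def majority_vote(predictions):
    """Return the label chosen by the most classifiers.

    With an odd number of binary voters there are no ties; otherwise the
    label seen first among the tied ones is returned.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of three classifiers, each trained on a different
# representation of the same document (words, phrases, hypernyms).
votes = {"words": True, "phrases": False, "hypernyms": True}

features = bag_of_words("Wheat prices rise as wheat exports fall")
decision = majority_vote(list(votes.values()))
```

Here `features["wheat"]` is 2 because the word occurs twice, and `decision` is `True` because two of the three hypothetical classifiers agree.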
Sam Scott, Stan Matwin
Added 02 Aug 2010
Updated 02 Aug 2010
Type Conference
Year 1999
Where ICML