ICML
1999
IEEE

Feature Engineering for Text Classification

Most research in text classification to date has used a “bag of words” representation in which each feature corresponds to a single word. This paper examines some alternative ways to represent text based on syntactic and semantic relationships between words (phrases, synonyms and hypernyms). We describe the new representations and try to justify our hypothesis that they could improve the performance of a rule-based learner. The representations are evaluated using the RIPPER learning algorithm on the Reuters-21578 and DigiTrad test corpora. On their own the new representations are not found to produce significant performance improvements. We also try combining classifiers based on different representations using a majority voting technique, and this improves performance on both test collections. In our opinion, more sophisticated Natural Language Processing techniques need to be developed before better text representations can be produced for classification.
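To make the ideas in the abstract concrete, here is a minimal Python sketch of two of them: building more than one representation of the same text, and combining per-representation classifiers by majority vote. It is an illustration under stated assumptions, not the authors' method. Word bigrams stand in for the paper's phrase features, the three rule functions are hypothetical hand-written rules rather than rule sets learned by RIPPER, and the "wheat" category is invented for the example.

```python
from typing import Callable, List, Set


def bag_of_words(text: str) -> Set[str]:
    """Binary bag of words: the set of lowercase whitespace-separated tokens."""
    return set(text.lower().split())


def word_bigrams(text: str) -> Set[str]:
    """Adjacent word pairs, used here as a crude stand-in for phrase features."""
    tokens = text.lower().split()
    return {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}


def majority_vote(votes: List[bool]) -> bool:
    """Return True only if strictly more than half of the votes are True."""
    return sum(votes) > len(votes) / 2


# Hypothetical rules for an invented "wheat" category. In the paper's setting
# these would be rule sets learned separately on each representation.
def rule_on_words(text: str) -> bool:
    return "wheat" in bag_of_words(text)


def rule_on_phrases(text: str) -> bool:
    return "wheat crop" in word_bigrams(text)


def rule_on_related_words(text: str) -> bool:
    return bool({"wheat", "grain"} & bag_of_words(text))


CLASSIFIERS: List[Callable[[str], bool]] = [
    rule_on_words,
    rule_on_phrases,
    rule_on_related_words,
]


def classify(text: str) -> bool:
    """Combine the per-representation decisions by majority vote."""
    return majority_vote([clf(text) for clf in CLASSIFIERS])
```

With an odd number of classifiers a binary vote cannot tie, which is one reason three voters is a convenient choice for a sketch like this.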

Sam Scott, Stan Matwin

Better Text Representations | DigiTrad Test Corpora | ICML 1999 | Machine Learning | RIPPER Learning Algorithm

Related Content

» Classification of Protein-Protein Interaction Full-Text Documents Using Text and Citation Ne...

» An Effective and Robust Method for Short Text Classification

» Short text classification in twitter to improve information filtering

» A Comparison of Stylometric and Lexical Features for Web Genre Classification and Emotion ...

» Novel Edge Features for Text Frame Classification in Video

» Evaluating a Text Mining Based Educational Search Portal

» Text classification using multiword features

» Encoding Ordinal Features into Binary Features for Text Classification

» Using feature construction to avoid large feature spaces in text classification

Post Info

Added	02 Aug 2010
Updated	02 Aug 2010
Type	Conference
Year	1999
Where	ICML
Authors	Sam Scott, Stan Matwin
