Text Classification Using Word-Based PPM Models

Text classification is one of the most topical problems in natural language processing. This paper describes the application of a word-based PPM (Prediction by Partial Matching) model to automatic content-based text classification. Our main idea is that words, and especially word combinations, are more relevant features for many text classification tasks. In most cases the keywords of a document are not single words but combinations of two or three words. The experiments showed that word-based PPM models are applicable to content-based text classification. Although in some cases the entropy difference that determined the choice was rather small (several hundredths), most of the documents (up to 97%) were classified correctly.
Victoria Bobicev
Added 11 Dec 2010
Updated 11 Dec 2010
Type Journal
Year 2006
Where CSJM
Authors Victoria Bobicev
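
The abstract describes choosing a class by comparing entropies under per-class word-based models. The following is a minimal sketch of that idea, not the author's implementation: it assumes an order-1 (previous-word) context, PPM-style escape estimates (method C) without exclusion, and a uniform fallback over a hypothetical vocabulary size. The names `WordPPM`, `classify`, and `vocab_size`, as well as the toy training texts, are invented for illustration.

```python
import math
from collections import Counter, defaultdict


class WordPPM:
    """Simplified order-1 word model with PPM-style escapes (method C, no exclusion)."""

    def __init__(self, vocab_size=100_000):
        # Assumed vocabulary size for the uniform last-resort estimate (hypothetical).
        self.vocab_size = vocab_size
        self.unigrams = Counter()              # order-0 counts
        self.bigrams = defaultdict(Counter)    # order-1 counts: previous word -> next word

    def train(self, text):
        words = text.lower().split()
        self.unigrams.update(words)
        for prev, word in zip(words, words[1:]):
            self.bigrams[prev][word] += 1

    @staticmethod
    def _estimate(counts, word):
        """Return (probability of word in this context, escape probability)."""
        total, distinct = sum(counts.values()), len(counts)
        if total == 0:
            return 0.0, 1.0                    # unseen context: always escape
        denom = total + distinct
        return counts[word] / denom, distinct / denom

    def prob(self, prev, word):
        # Try the order-1 context first, then escape to order 0, then to uniform.
        p1, esc1 = self._estimate(self.bigrams.get(prev, Counter()), word)
        if p1 > 0:
            return p1
        p0, esc0 = self._estimate(self.unigrams, word)
        if p0 > 0:
            return esc1 * p0
        return esc1 * esc0 / self.vocab_size

    def cross_entropy(self, text):
        """Average number of bits per word needed to encode text with this model."""
        words = text.lower().split()
        if not words:
            return float("inf")
        bits, prev = 0.0, None
        for word in words:
            bits -= math.log2(self.prob(prev, word))
            prev = word
        return bits / len(words)


def classify(document, models):
    """Pick the class whose model gives the lowest cross-entropy for the document."""
    scores = {label: model.cross_entropy(document) for label, model in models.items()}
    return min(scores, key=scores.get), scores


# Toy usage with invented training texts.
models = {"sports": WordPPM(), "finance": WordPPM()}
models["sports"].train("the team won the match and the players scored a late goal")
models["finance"].train("the bank raised interest rates and the stock market fell")
label, scores = classify("the team won the match", models)
print(label, scores)
```

Because exclusion is omitted, the estimates do not sum exactly to one, which is acceptable for ranking classes but would not be for actual compression. Word pairs seen in training are coded cheaply through the order-1 context, which is one way to reflect the abstract's point that word combinations carry much of the class signal.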