Sciweavers


CICLING
2010
Springer



An Empirical Study on the Feature's Type Effect on the Automatic Classification of Arabic Documents

Download www.raheels.net

Arabic is a highly inflectional and morphologically very rich language. This presents serious challenges for the automatic classification of documents, one of which is deciding which type of feature to use in order to obtain optimal classification results. Some researchers use roots or lemmas, which they argue can handle inflectional phenomena that do not occur in the same way in other languages. Others prefer character-level n-grams, since n-grams are simpler to implement, language-independent, and produce satisfactory results. So which of these two approaches is better, if either? This paper addresses the question through a comparative study of four feature types (words in their original form, lemmas, roots, and character-level n-grams) and shows how each affects the performance of the classifier. We used and compared the performance of Support Vector Machines and Na…
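To illustrate why character-level n-grams are described as simple and language-independent, here is a minimal sketch of trigram extraction. It is not the authors' pipeline: whitespace tokenization, the `_` boundary-padding character, and n = 3 are assumptions made for the example, and the function names are hypothetical. It shows that a bare form such as كتاب and its prefixed form الكتاب share several n-grams without any morphological analyzer, whereas lemma and root features would require one.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in each whitespace-separated token.

    Assumption: each token is padded with '_' on both sides so that n-grams at
    word boundaries stay distinct from word-internal ones. No Arabic-specific
    processing (stemming, lemmatization, root extraction) is applied.
    """
    counts: Counter = Counter()
    for token in text.split():
        padded = f"_{token}_"
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def shared_ngrams(a: str, b: str, n: int = 3) -> set:
    """Return the set of n-grams that two strings have in common (hypothetical helper)."""
    return set(char_ngrams(a, n)) & set(char_ngrams(b, n))


if __name__ == "__main__":
    # "كتاب" (book) vs. "الكتاب" (the book): the definite-article prefix
    # changes the surface word, but most trigrams are still shared.
    print(sorted(shared_ngrams("كتاب", "الكتاب")))
```

Such n-gram counts can be fed directly into a vector-space classifier like an SVM; the trade-off, which the study examines empirically, is that n-grams also create overlaps between unrelated words, while roots and lemmas group forms more precisely at the cost of a morphological analysis step.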

Saeed Raheel, Joseph Dichy

Character-level N-grams | CICLING 2010 | Natural Language Processing | Support Vector Machines

Related Content

» Learning to Classify Texts Using Positive and Unlabeled Data

» Discriminative Frequent Pattern Analysis for Effective Classification

» A comparative study on classifying the functions of web page blocks

» Feature Reinforcement Approach to Polylingual Text Categorization

» Learning Visual Shape Lexicon for Document Image Content Recognition

» A Characterization of Wordnet Features in Boolean Models For Text Classification

» Robust feature induction for support vector machines

» An Empirical Approach to Modeling Uncertainty in Intrusion Analysis

» Percent perfect performance PPP

Post Info

Added	13 May 2011
Updated	13 May 2011
Type	Journal
Year	2010
Where	CICLING
Authors	Saeed Raheel, Joseph Dichy
