
CICLING
2010
Springer

An Empirical Study on the Feature's Type Effect on the Automatic Classification of Arabic Documents

Arabic is a highly inflectional and morphologically very rich language. This poses serious challenges for the automatic classification of documents, one of which is deciding which type of feature to use to obtain the best classification results. Some researchers use roots or lemmas, arguing that they handle inflectional phenomena that do not occur in the same way in other languages. Others prefer character-level n-grams, since n-grams are simpler to implement, language-independent, and produce satisfactory results. So which of these two approaches is better, if either? This paper addresses the question with a comparative study of four feature types (words in their original form, lemmas, roots, and character-level n-grams) and shows how each affects the performance of the classifier. We used and compared the performance of Support Vector Machines and Na
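To make the contrast between surface-word features and character-level n-grams concrete, here is a minimal, illustrative sketch. It is not the authors' implementation: the choice of trigrams, the `_` word-boundary padding, and the function names `char_ngrams` and `word_features` are assumptions made for this example. Lemma and root features would additionally require an Arabic morphological analyzer, which is not shown here.

```python
from collections import Counter


def word_features(text: str) -> Counter:
    """Bag of surface words: each whitespace-separated token is one feature."""
    return Counter(text.split())


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Bag of character n-grams (hypothetical setup: n=3 by default).

    Each token is padded with '_' on both sides so that n-grams at the
    start and end of a word are distinguishable from word-internal ones.
    """
    counts = Counter()
    for token in text.split():
        padded = f"_{token}_"
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


# "كتاب" (book) vs. "الكتاب" (the book): the definite article makes the
# surface words distinct, but several character trigrams are shared.
bare, definite = "كتاب", "الكتاب"
shared_words = set(word_features(bare)) & set(word_features(definite))
shared_trigrams = set(char_ngrams(bare)) & set(char_ngrams(definite))
print(shared_words)     # set()
print(shared_trigrams)  # {'كتا', 'تاب', 'اب_'} (set order may vary)
```

The two word-level feature sets have nothing in common, while the trigram sets overlap on three features. This illustrates why n-grams can partly absorb affixation without any language-specific analysis, at the cost of many more, noisier features.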
Saeed Raheel, Joseph Dichy
Type Conference
Year 2010
Where CICLING
Authors Saeed Raheel, Joseph Dichy