Sciweavers

ACL
2008

Arabic Morphological Tagging, Diacritization, and Lemmatization Using Lexeme Models and Feature Ranking

We investigate the tasks of general morphological tagging, diacritization, and lemmatization for Arabic. We show that, for all tasks we consider, both modeling the lexeme explicitly and retuning the weights of individual classifiers for the specific task improve performance.

1 Previous Work

Arabic has about 14 dimensions of inflection (most of them orthogonal), and in our training corpus of about 288,000 words we find 3279 complete morphological tags, with up to 100,000 possible tags. Because of the large number of tags, morphological tagging cannot be construed as a simple classification task. Hajič (2000) is the first to use a dictionary as a source of possible morphological analyses (and hence tags) for an inflected word form; he redefines the tagging task as a choice among the tags proposed by the dictionary, using a log-linear model trained on specific ambiguity classes for individual morphological features. Hajič et al. (2005) implement the approach o...
Ryan Roth, Owen Rambow, Nizar Habash, Mona T. Diab, Cynthia Rudin
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where ACL
Authors Ryan Roth, Owen Rambow, Nizar Habash, Mona T. Diab, Cynthia Rudin