
ACL
2009

Part of Speech Tagger for Assamese Text

Assamese is a morphologically rich, agglutinative Indic language with relatively free word order. Although it is spoken by nearly 30 million people, very little computational linguistics work has been done on it. In this paper, we present our work on part-of-speech (POS) tagging for Assamese using the well-known Hidden Markov Model (HMM). Since no suitable, well-defined tagset was available, we develop a tagset of 172 tags in consultation with experts in linguistics. To tag successfully, we examine the relevant linguistic issues in Assamese. For unknown words, we perform simple morphological analysis to determine probable tags. Using a manually tagged corpus of about 10,000 words for training, we obtain a tagging accuracy of nearly 87% on test inputs.
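The abstract names the two ingredients (HMM tagging plus a morphological fallback for unknown words) without giving details. The sketch below shows one common way to combine them and is not the authors' implementation: it assumes a bigram (first-order) HMM, add-one smoothing of transitions, Viterbi decoding, and a hypothetical longest-suffix lookup table (`suffix_tags`) standing in for "simple morphological analysis". The toy corpus and the tags `DET`, `NN`, `VB` are illustrative only and are not drawn from the paper's 172-tag set.

```python
import math
from collections import defaultdict

START = "<s>"  # pseudo-tag marking the sentence start


def train_hmm(tagged_sentences):
    """Count bigram tag transitions, word emissions and tag frequencies."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    tag_count = defaultdict(int)
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tag_count[tag] += 1
            prev = tag
    return trans, emit, tag_count


def candidate_tags(word, suffix_tags):
    """Hypothetical morphological fallback: longest known proper suffix -> allowed tags."""
    for k in range(len(word) - 1, 0, -1):
        if word[-k:] in suffix_tags:
            return suffix_tags[word[-k:]]
    return None


def viterbi(words, trans, emit, tag_count, suffix_tags):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    if not words:
        return []
    tags = list(tag_count)
    vocab = {w for t in emit for w in emit[t]}

    def log_trans(prev, tag):
        # Add-one smoothing (an assumption) so unseen tag bigrams keep some mass.
        row = trans.get(prev, {})
        return math.log((row.get(tag, 0) + 1) / (sum(row.values()) + len(tags)))

    def log_emit(tag, word):
        if word in vocab:
            count = emit.get(tag, {}).get(word, 0)
            return math.log(count / tag_count[tag]) if count else -math.inf
        # Unknown word: spread probability uniformly over suffix-suggested tags,
        # or over all tags when no suffix matches.
        allowed = candidate_tags(word, suffix_tags) or tags
        return -math.log(len(allowed)) if tag in allowed else -math.inf

    # best[i][t] = (log-prob of the best path ending in tag t at position i, backpointer)
    best = [{t: (log_trans(START, t) + log_emit(t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_emit(t, words[i])
            column[t] = max((best[i - 1][p][0] + log_trans(p, t) + e, p) for p in tags)
        best.append(column)

    # Follow backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


if __name__ == "__main__":
    # Toy English corpus and tags, purely illustrative (not the paper's data or tagset).
    corpus = [
        [("a", "DET"), ("dog", "NN"), ("runs", "VB")],
        [("a", "DET"), ("cat", "NN"), ("sleeps", "VB")],
    ]
    model = train_hmm(corpus)
    suffixes = {"s": ["VB"]}  # hypothetical suffix -> tag table
    print(viterbi(["a", "cat", "jumps"], *model, suffixes))  # ['DET', 'NN', 'VB']
```

Here the unseen word `jumps` is restricted to `VB` by the suffix table, which mirrors the abstract's idea of narrowing an unknown word to its probable tags before decoding. For an agglutinative language like Assamese, a real analyser would likely need a much richer suffix inventory than this single-entry table.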
Type Journal
Year 2009
Where ACL
Authors Navanath Saharia, Dhrubajyoti Das, Utpal Sharma, Jugal K. Kalita