Part of Speech Tagger for Assamese Text

Download www.aclweb.org

Assamese is a morphologically rich, agglutinative Indic language with relatively free word order. Although it is spoken by nearly 30 million people, very little computational linguistics work has been done on it. In this paper, we present our work on part-of-speech (POS) tagging for Assamese using a Hidden Markov Model (HMM). Since no suitable, well-defined tagset was available, we develop a tagset of 172 tags in consultation with linguistics experts. To tag successfully, we examine the relevant linguistic issues in Assamese. For unknown words, we perform simple morphological analysis to determine their probable tags. Using a manually tagged corpus of about 10,000 words for training, we obtain a tagging accuracy of nearly 87% on test inputs.
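The abstract names an HMM tagger with a morphological fallback for unknown words, but gives no implementation details. The sketch below shows one common way such a tagger can be built. It is a bigram HMM with add-one smoothing and Viterbi decoding. It is not the authors' code. The unknown-word step is an assumption standing in for their morphological analysis: it backs off to a tag distribution keyed on the word's last `suffix_len` characters. The class name `HmmTagger`, the tags `DET`/`N`/`V` and the English toy corpus are all hypothetical placeholders, not the paper's 172-tag Assamese tagset or data.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start state


class HmmTagger:
    """Bigram HMM POS tagger with Viterbi decoding (illustrative sketch only)."""

    def __init__(self, tagged_sentences, suffix_len=2):
        self.suffix_len = suffix_len
        self.trans = defaultdict(Counter)        # previous tag -> Counter(next tag)
        self.emit = defaultdict(Counter)         # tag -> Counter(word)
        self.suffix_tags = defaultdict(Counter)  # word suffix -> Counter(tag)
        self.vocab = set()
        for sent in tagged_sentences:
            prev = START
            for word, tag in sent:
                self.trans[prev][tag] += 1
                self.emit[tag][word] += 1
                self.suffix_tags[word[-suffix_len:]][tag] += 1
                self.vocab.add(word)
                prev = tag
        self.tags = sorted(self.emit)

    def _log_trans(self, prev, tag):
        # Add-one smoothed log P(tag | prev) over the tag set.
        counts = self.trans[prev]
        return math.log((counts[tag] + 1) / (sum(counts.values()) + len(self.tags)))

    def _log_emit(self, tag, word):
        if word in self.vocab:
            # Add-one smoothed log P(word | tag) for known words.
            counts = self.emit[tag]
            return math.log((counts[word] + 1) / (sum(counts.values()) + len(self.vocab)))
        # Unknown word (assumed heuristic): score tags by P(tag | suffix) instead.
        counts = self.suffix_tags.get(word[-self.suffix_len:], Counter())
        return math.log((counts[tag] + 1) / (sum(counts.values()) + len(self.tags)))

    def tag(self, words):
        if not words:
            return []
        # best[i][t] = (log score of best path ending in tag t at i, previous tag)
        best = [{t: (self._log_trans(START, t) + self._log_emit(t, words[0]), None)
                 for t in self.tags}]
        for i in range(1, len(words)):
            column = {}
            for t in self.tags:
                prev_tag, score = max(
                    ((p, best[i - 1][p][0] + self._log_trans(p, t)) for p in self.tags),
                    key=lambda pair: pair[1],
                )
                column[t] = (score + self._log_emit(t, words[i]), prev_tag)
            best.append(column)
        # Trace back from the highest-scoring tag at the last position.
        path = [max(self.tags, key=lambda t: best[-1][t][0])]
        for i in range(len(words) - 1, 0, -1):
            path.append(best[i][path[-1]][1])
        return path[::-1]


def accuracy(tagger, gold_sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in gold_sentences:
        predicted = tagger.tag([w for w, _ in sent])
        for (_, gold), pred in zip(sent, predicted):
            correct += gold == pred
            total += 1
    return correct / total if total else 0.0


# Hypothetical toy training data (English placeholders, not Assamese).
train = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
]
tagger = HmmTagger(train)
print(tagger.tag(["the", "cat", "runs"]))   # ['DET', 'N', 'V']
print(tagger.tag(["a", "bird", "walks"]))   # unseen words: ['DET', 'N', 'V']
```

On a corpus as small as the paper's (about 10,000 words), many test tokens are unseen. The unknown-word back-off therefore matters as much as the HMM itself. A richer version would replace the single suffix lookup with a real morphological analyser for Assamese.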

Navanath Saharia, Dhrubajyoti Das, Utpal Sharma, J

ACL 2009 | Computational Linguistics | Order Indic Language | Relevant Linguistic Issues | Well-known Hidden Markov |

» Chunker for Tamil

» A Case Restoration Approach to Named Entity Tagging in Degraded Documents

» Use of a genetic algorithm in Brill's transformation-based part-of-speech tagger

» Probabilistic and Rule-Based Tagger of an Inflective Language - a Comparison

» A Multi-Agent System for POS-Tagging Vocalized Arabic Texts

» Does Baum-Welch Re-estimation Help Taggers?

» Discovering Word Meanings Based on Frequent Termsets

» ANERsys 2.0: Conquering the NER Task for the Arabic Language by Combining the Maximum Entrop...

Post Info

Added	16 Feb 2011
Updated	16 Feb 2011
Type	Journal
Year	2009
Where	ACL
Authors	Navanath Saharia, Dhrubajyoti Das, Utpal Sharma, Jugal K. Kalita

Sciweavers