Automatic Part-of-Speech Tagging for Bengali: An Approach for Morphologically Rich Languages in a Poor Resource Scenario



This paper describes our work on building a Part-of-Speech (POS) tagger for Bengali. We have used Hidden Markov Model (HMM) and Maximum Entropy (ME) based stochastic taggers. Bengali is a morphologically rich language, and our taggers make use of morphological and contextual information about the words. Since only a small labeled training set is available (45,000 words), a simple stochastic approach does not yield very good results. In this work, we study the effect of using a morphological analyzer to improve the performance of the tagger. We find that the use of morphology helps improve the accuracy of the tagger, especially when only a small amount of tagged corpus is available.
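The abstract does not say how the morphological analyzer is combined with the stochastic taggers. One plausible reading, offered here only as a minimal sketch, is that the analyzer narrows the set of candidate tags for each word before a bigram HMM is decoded with Viterbi. Everything below is an assumption made for illustration: the function names `train_hmm`, `log_prob`, and `viterbi`, the `morph_tags` dictionary standing in for a morphological analyzer, the add-one smoothing, and the invented English toy corpus (not Bengali data and not the authors' tagset).

```python
import math
from collections import defaultdict

def train_hmm(tagged_sentences):
    """Count tag-bigram transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab

def log_prob(counts, context, event, n_outcomes):
    """Add-one smoothed log P(event | context)."""
    row = counts.get(context, {})
    return math.log((row.get(event, 0) + 1) / (sum(row.values()) + n_outcomes))

def viterbi(words, model, morph_tags=None):
    """Return the most probable tag sequence for a non-empty word list.

    morph_tags (hypothetical) maps a word to the set of tags that a
    morphological analyzer allows for it; unlisted words keep all tags.
    """
    trans, emit, tags, vocab = model
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unseen words

    def candidates(word):
        allowed = morph_tags.get(word) if morph_tags else None
        return sorted(allowed) if allowed else tags

    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{}]
    for tag in candidates(words[0]):
        score = (log_prob(trans, "<s>", tag, n_tags)
                 + log_prob(emit, tag, words[0], n_words))
        best[0][tag] = (score, None)

    for i in range(1, len(words)):
        column = {}
        for tag in candidates(words[i]):
            prev_tag, score = max(
                ((p, s + log_prob(trans, p, tag, n_tags))
                 for p, (s, _) in best[i - 1].items()),
                key=lambda item: item[1],
            )
            column[tag] = (score + log_prob(emit, tag, words[i], n_words), prev_tag)
        best.append(column)

    # Follow back-pointers from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Invented toy data, for illustration only.
toy_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
model = train_hmm(toy_corpus)
print(viterbi(["the", "dog", "runs"], model))
print(viterbi(["the", "bird", "runs"], model, morph_tags={"bird": {"NOUN"}}))
```

With little training data, many words are unseen and their smoothed emission probabilities are nearly uniform across tags, so the decoder falls back on transitions alone. Restricting the candidate tags for such words is one way an analyzer could help in that setting, which is consistent with the abstract's finding but is not confirmed as the authors' mechanism.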

Sandipan Dandapat, Sudeshna Sarkar, Anupam Basu


ACL 2007 | Computational Linguistics | Hidden Markov Model | Simple Stochastic Approach | Stochastic Taggers


Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	ACL
Authors	Sandipan Dandapat, Sudeshna Sarkar, Anupam Basu
