FLAIRS
2007

Lexicon Development and POS Tagging Using a Tagged Bengali News Corpus

Lexicon development and part-of-speech (POS) tagging are important for almost all Natural Language Processing (NLP) application areas. Rapidly developing these resources and tools with machine learning techniques for less computerized languages requires an appropriately tagged corpus. A tagged Bengali news corpus has been developed from the web archive of a widely read Bengali newspaper. This corpus is then used for lexicon development and POS tagging.

Tagged Bengali News Corpus Development

Newspapers are a huge source of readily available documents. A tagged corpus has been developed from the web archive of a well-known and widely read Bengali newspaper. The development of the tagged Bengali news corpus includes three stages: language resource acquisition using a web crawler; language resource creation, which includes HTML file cleaning and code conversion; and language resource annotation, which involves defining a tag set and then tagging the news corpus. Code conversi...
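
The HTML file cleaning step named in the abstract could look roughly like the following. This is a minimal sketch using only Python's standard library, not the authors' implementation: the class name `TextExtractor` and the function `clean_html` are hypothetical, and the choice to drop only `<script>` and `<style>` content is an assumption. Crawling, code conversion, and tagging are not shown.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping script/style content."""

    SKIP = {"script", "style"}  # assumption: only these elements are discarded

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def clean_html(page: str) -> str:
    """Return the visible text of an HTML page as a single string."""
    parser = TextExtractor()
    parser.feed(page)
    parser.close()
    return parser.text()
```

For example, `clean_html("<p>Hello &amp; world</p>")` yields the plain string `Hello & world`, which a later stage could then convert and tag.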
Asif Ekbal, Sivaji Bandyopadhyay
Added 02 Oct 2010
Updated 02 Oct 2010
Type Conference
Year 2007
Where FLAIRS