Sciweavers

ARTCOM
2009
IEEE

Chunker for Tamil

This paper presents a Part Of Speech (POS) tagger and a chunker for Tamil built with machine learning techniques. POS tagging and chunking are fundamental processing steps for any language processing task. POS tagging is the automatic annotation of each word in a corpus with its syntactic category. Chunking is the task of identifying and segmenting text into syntactically correlated word groups. Both tasks are performed with machine learning techniques, in which linguistic knowledge is extracted automatically from an annotated corpus. We developed our own tagset for annotating the corpus, which is used for training and testing the POS tagger generator and the chunker. The tagset consists of 32 POS tags and 9 chunk tags. A corpus of 225,000 words was used for training and testing the accuracy of the POS tagger and the chunker. We found that the SVM-based machine learning tool affords the ...
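The abstract does not list the paper's 32 POS tags or 9 chunk tags, and it does not describe its feature set. The sketch below is therefore only an illustration under stated assumptions. It shows the kind of context-window features a machine-learned tagger (such as an SVM-based one) might consume for each word. It also shows how chunk labels in a B-/I-/O scheme segment a sentence into word groups. The B-/I-/O encoding, the feature names (`w0`, `suf3`, `w-1`, ...), the tags (`B-NP`, `I-NP`), and the romanized example words are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def window_features(tokens: List[str], i: int, size: int = 2) -> Dict[str, str]:
    """Hypothetical context-window features for token i.

    The current word, its 3-character prefix and suffix, and up to `size`
    neighbours on each side. Sentence boundaries are padded with <s> and </s>.
    """
    word = tokens[i]
    feats = {"w0": word, "pre3": word[:3], "suf3": word[-3:]}
    for off in range(1, size + 1):
        feats[f"w-{off}"] = tokens[i - off] if i - off >= 0 else "<s>"
        feats[f"w+{off}"] = tokens[i + off] if i + off < len(tokens) else "</s>"
    return feats


def iob_to_chunks(tokens: List[str], labels: List[str]) -> List[Tuple[str, List[str]]]:
    """Group tokens into (chunk_type, words) using B-/I-/O labels (assumed scheme).

    B-X starts a new chunk of type X, I-X continues it, and O is outside any chunk.
    An I-X that does not continue an open X chunk is treated as starting one.
    """
    chunks: List[Tuple[str, List[str]]] = []
    current_type = None
    current_words: List[str] = []
    for tok, lab in zip(tokens, labels):
        continues = lab.startswith("I-") and lab[2:] == current_type
        if continues:
            current_words.append(tok)
            continue
        # Close any open chunk before starting a new one or skipping an O token.
        if current_words:
            chunks.append((current_type, current_words))
        current_type, current_words = None, []
        if lab != "O":
            current_type, current_words = lab[2:], [tok]
    if current_words:
        chunks.append((current_type, current_words))
    return chunks


if __name__ == "__main__":
    # Hypothetical romanized Tamil sentence "avan nalla paiyan" ("he [is a] good boy").
    sent = ["avan", "nalla", "paiyan"]
    print(window_features(sent, 0))
    print(iob_to_chunks(sent, ["B-NP", "B-NP", "I-NP"]))
```

The prefix and suffix features are one possible design choice, motivated by the fact that Tamil is agglutinative, so word endings often carry grammatical information. In a real system, feature dictionaries like these would be vectorized and passed to a classifier that predicts one POS or chunk label per word.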
Added 18 May 2010
Updated 18 May 2010
Type Conference
Year 2009
Where ARTCOM
Authors V. Dhanalakshmi, P. Padmavathy, M. Anand Kumar, K. P. Soman, S. Rajendran