Chunker for Tamil

Download www.infitt.org

This paper presents a Part Of Speech tagger and a Chunker for Tamil built using machine learning techniques. Part Of Speech tagging and chunking are fundamental processing steps for any language processing task. Part of speech (POS) tagging is the process of automatically annotating each word in a corpus with its syntactic category. Chunking is the task of identifying and segmenting the text into syntactically correlated word groups. Both are done with machine learning techniques, in which the linguistic knowledge is automatically extracted from the annotated corpus. We have developed our own tagset for annotating the corpus, which is used for training and testing the POS tagger generator and the chunker. The present tagset consists of thirty-two tags for POS and nine tags for chunking. A corpus of two hundred and twenty-five thousand words was used for training and testing the accuracy of the POS tagger and Chunker. We found that SVM based machine learning tool affords the ...
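To make the chunking step concrete, here is a minimal sketch of how per-word chunk labels can be grouped into word groups. It only illustrates the output representation and is not the paper's SVM-based method. The B-/I-/O labelling scheme, the tag names (DET, NN, NP, VP and so on), the English placeholder tokens, and the function name group_chunks are all assumptions made for illustration; the paper's own thirty-two POS tags and nine chunk tags are not listed in this abstract.

```python
from typing import List, Optional, Tuple

# Each token is (word, pos_tag, chunk_label). The labels follow an assumed
# B-/I-/O convention: "B-X" opens a chunk of type X, "I-X" continues it,
# and "O" marks a word outside any chunk.
Token = Tuple[str, str, str]


def group_chunks(tagged: List[Token]) -> List[Tuple[str, List[str]]]:
    """Group labelled tokens into (chunk_type, words) spans."""
    chunks: List[Tuple[str, List[str]]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []

    for word, _pos, label in tagged:
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != current_type
        )
        if starts_new:
            # Close the open chunk (if any) and start a new one.
            if current_words:
                chunks.append((current_type, current_words))
            current_type, current_words = label[2:], [word]
        elif label.startswith("I-"):
            # Continue the chunk that is currently open.
            current_words.append(word)
        else:
            # "O": the word belongs to no chunk, so close any open chunk.
            if current_words:
                chunks.append((current_type, current_words))
            current_type, current_words = None, []

    if current_words:
        chunks.append((current_type, current_words))
    return chunks


# Hypothetical tagged sentence (placeholder tokens and tags).
sentence = [
    ("the", "DET", "B-NP"),
    ("small", "ADJ", "I-NP"),
    ("boy", "NN", "I-NP"),
    ("reads", "VB", "B-VP"),
    ("a", "DET", "B-NP"),
    ("book", "NN", "I-NP"),
    (".", "PUNC", "O"),
]

for chunk_type, words in group_chunks(sentence):
    print(chunk_type, " ".join(words))
```

In a learned chunker, a classifier such as an SVM would predict the chunk label for each word from features like the word and its POS tag; a grouping step of this kind would then turn those per-word predictions into chunks.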

V. Dhanalakshmi, P. Padmavathy, M. Anand Kumar, K. P. Soman, S. Rajendran

ARTCOM 2009 | Communications | Machine Learning Techniques | Part Of Speech | POS Tagger |

Post Info

Added	18 May 2010
Updated	18 May 2010
Type	Conference
Year	2009
Where	ARTCOM
Authors	V. Dhanalakshmi, P. Padmavathy, M. Anand Kumar, K. P. Soman, S. Rajendran
