LREC
2008

Unsupervised Lexical Acquisition for Part of Speech Tagging

POS tagging is known to be less accurate for unknown words (words that the POS tagger has not seen in the training corpora). A first step toward improving tagging accuracy is therefore to extend the coverage of the tagger's learned lexicon. A simple procedure can extend this lexicon without additional, hard-to-obtain, hand-validated training corpora. The basic idea is to add new words, along with their (correct) POS tags, to the lexicon and to estimate the lexical distribution of these words from similar ambiguity classes already present in the lexicon. We present a method for automatically acquiring high-quality POS tagging lexicons based on morphological analysis and generation. Currently, this procedure works on Romanian, for which we have the required paradigmatic generation procedure, but the architecture remains general in the sense that given the appropriate substitutes for the morphological gen...
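The ambiguity-class idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a lexicon that maps each word to per-tag counts, defines a word's ambiguity class as the set of tags it can take, and gives a new word the pooled relative frequencies of known words in the same class. The function names, the pooling scheme, the uniform fallback, and the toy English tags are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def pool_by_ambiguity_class(lexicon):
    """Sum tag counts over all known words that share an ambiguity class.

    The ambiguity class of a word is taken to be the set of tags it has
    been seen with (an assumption made for this sketch).
    """
    pooled = defaultdict(lambda: defaultdict(int))
    for tag_counts in lexicon.values():
        amb_class = frozenset(tag_counts)
        for tag, count in tag_counts.items():
            pooled[amb_class][tag] += count
    return pooled


def estimate_distribution(possible_tags, pooled):
    """Estimate P(tag | new word) from known words in the same class.

    Falls back to a uniform distribution when the class is unseen or has
    no counts (a hypothetical fallback, not taken from the paper).
    """
    amb_class = frozenset(possible_tags)
    counts = pooled.get(amb_class)
    total = sum(counts.values()) if counts else 0
    if total == 0:
        return {tag: 1.0 / len(amb_class) for tag in amb_class}
    return {tag: counts[tag] / total for tag in amb_class}


# Toy lexicon with illustrative tags; the paper itself targets Romanian.
lexicon = {
    "walk": {"NN": 2, "VB": 6},
    "run": {"NN": 1, "VB": 3},
    "the": {"DT": 10},
}
pooled = pool_by_ambiguity_class(lexicon)

# A new word whose possible tags are assumed to come from morphological generation.
seen_class = estimate_distribution({"NN", "VB"}, pooled)
unseen_class = estimate_distribution({"JJ", "NN"}, pooled)
```

Here `seen_class` reuses the pooled counts of "walk" and "run", while `unseen_class` has no matching class in the lexicon and so receives the uniform fallback.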
Dan Tufis, Elena Irimia, Radu Ion, Alexandru Ceausu
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LREC
Authors Dan Tufis, Elena Irimia, Radu Ion, Alexandru Ceausu