EMNLP
2007

Mandarin Part-of-Speech Tagging and Discriminative Reranking

This paper presents methods to improve HMM-based part-of-speech (POS) tagging of Mandarin. We model the emission probability of an unknown word using all the characters in the word, and enrich the standard left-to-right trigram estimation of word emission probabilities with a right-to-left prediction of the word that makes use of the current and next tags. In addition, we use the RankBoost-based reranking algorithm to rerank the N-best outputs of the HMM-based tagger using various n-gram, morphological, and dependency features. Two methods are proposed to improve the generalization performance of the reranking algorithm. Our reranking model achieves an accuracy of 94.68% using n-gram and morphological features on the Penn Chinese Treebank 5.2, and further improves the accuracy to 95.11% with the addition of dependency features.
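The reranking step described above can be illustrated with a minimal sketch. It assumes each N-best candidate is a tag sequence paired with its HMM log-probability, and that a set of feature weights has already been learned (for instance by RankBoost; training is not shown). The names `ngram_features`, `rerank`, and `base_weight`, the tag-bigram feature set, and the example weights are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# A candidate is (tag sequence, log-probability assigned by the base tagger).
Candidate = Tuple[List[str], float]


def ngram_features(tags: List[str], n: int = 2) -> Dict[str, float]:
    """Count tag n-grams in one candidate (an illustrative feature set only)."""
    padded = ["<s>"] * (n - 1) + tags + ["</s>"]
    feats: Dict[str, float] = {}
    for i in range(len(padded) - n + 1):
        key = "ng:" + "_".join(padded[i:i + n])
        feats[key] = feats.get(key, 0.0) + 1.0
    return feats


def rerank(candidates: List[Candidate],
           weights: Dict[str, float],
           base_weight: float = 1.0) -> List[Candidate]:
    """Sort candidates by base log-probability plus weighted feature counts.

    The weights are assumed to come from a separately trained reranker;
    this function only applies them as a linear scoring function.
    """
    def score(candidate: Candidate) -> float:
        tags, log_prob = candidate
        feats = ngram_features(tags)
        bonus = sum(weights.get(name, 0.0) * value for name, value in feats.items())
        return base_weight * log_prob + bonus

    return sorted(candidates, key=score, reverse=True)


# Two hypothetical candidates for a two-word sentence.
nbest = [(["NN", "VV"], -1.0), (["NN", "NN"], -1.2)]
# A made-up weight that favors the NN NN tag bigram.
best_tags, _ = rerank(nbest, {"ng:NN_NN": 0.5})[0]
```

With the made-up weight above, the second candidate overtakes the first even though the base tagger scored it lower, which is the effect reranking is meant to have. Morphological and dependency features could be added as further entries in the same feature dictionary.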
Zhongqiang Huang, Mary P. Harper, Wen Wang
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where EMNLP