Mandarin Part-of-Speech Tagging and Discriminative Reranking

Download www.speech.sri.com

In this paper, we present methods to improve HMM-based part-of-speech (POS) tagging of Mandarin. We model the emission probability of an unknown word using all the characters in the word, and enrich the standard left-to-right trigram estimation of word emission probabilities with a right-to-left prediction of the word that makes use of the current and next tags. In addition, we use a RankBoost-based reranking algorithm to rerank the N-best outputs of the HMM-based tagger using various n-gram, morphological, and dependency features. Two methods are proposed to improve the generalization performance of the reranking algorithm. Our reranking model achieves an accuracy of 94.68% using n-gram and morphological features on the Penn Chinese Treebank 5.2, and further improves the accuracy to 95.11% with the addition of dependency features.
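To make the reranking step concrete, the following is a minimal sketch of how an N-best list of tag sequences might be rescored with a linear combination of the tagger's log-probability and binary feature weights. It is an illustration under stated assumptions, not the authors' implementation: the names `Candidate`, `ngram_features`, `score`, and `rerank` are hypothetical, only tag-bigram features are used, and the weights are set by hand rather than learned with RankBoost.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Candidate:
    """One entry of a hypothetical N-best list from a base tagger."""
    tags: List[str]    # proposed tag sequence
    log_prob: float    # log-probability assigned by the base tagger


def ngram_features(tags: List[str], n: int = 2) -> Set[str]:
    """Return the set of tag n-grams present in the sequence.

    Using a set makes each feature a binary indicator (present or absent).
    """
    padded = ["<s>"] + tags + ["</s>"]
    return {
        "tag%d:%s" % (n, "_".join(padded[i:i + n]))
        for i in range(len(padded) - n + 1)
    }


def score(cand: Candidate, weights: Dict[str, float], base_weight: float) -> float:
    """Linear score: weighted base log-probability plus active feature weights."""
    feature_sum = sum(weights.get(f, 0.0) for f in ngram_features(cand.tags))
    return base_weight * cand.log_prob + feature_sum


def rerank(cands: List[Candidate], weights: Dict[str, float],
           base_weight: float = 1.0) -> Candidate:
    """Pick the highest-scoring candidate from the N-best list."""
    return max(cands, key=lambda c: score(c, weights, base_weight))


# Toy example with hand-set (assumed) weights, not learned ones.
nbest = [
    Candidate(tags=["NN", "NN", "NN"], log_prob=-3.8),  # base tagger's top choice
    Candidate(tags=["NN", "VV", "NN"], log_prob=-4.0),
]
toy_weights = {"tag2:NN_VV": 0.5, "tag2:NN_NN": -0.2}
best = rerank(nbest, toy_weights)
print(best.tags)
```

In this toy setting the feature weights are enough to promote the second candidate over the base tagger's first choice. A learned reranker would estimate such weights from training N-best lists, and could add morphological or dependency features in the same way.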

Zhongqiang Huang, Mary P. Harper, Wen Wang

EMNLP 2007 | Natural Language Processing | RankBoost-based Reranking Algorithm | Reranking Algorithm | Word Emission Probabilities

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	EMNLP
Authors	Zhongqiang Huang, Mary P. Harper, Wen Wang
