Sciweavers

ACL
2006

Subword-Based Tagging for Confidence-Dependent Chinese Word Segmentation

13 years 6 months ago
Subword-Based Tagging for Confidence-Dependent Chinese Word Segmentation
We proposed a subword-based tagging for Chinese word segmentation to improve the existing character-based tagging. The subword-based tagging was implemented using the maximum entropy (MaxEnt) and the conditional random fields (CRF) methods. We found that the proposed subword-based tagging outperformed the character-based tagging in all comparative experiments. In addition, we proposed a confidence measure approach to combine the results of a dictionary-based and a subword-tagging-based segmentation. This approach can produce an ideal tradeoff between the in-vocaulary rate and out-of-vocabulary rate. Our techniques were evaluated using the test data from Sighan Bakeoff 2005. We achieved higher F-scores than the best results in three of the four corpora: PKU(0.951), CITYU(0.950) and MSR(0.971).
Ruiqiang Zhang, Gen-ichiro Kikui, Eiichiro Sumita
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2006
Where ACL
Authors Ruiqiang Zhang, Gen-ichiro Kikui, Eiichiro Sumita
Comments (0)