Self-Supervised Chinese Word Segmentation

Abstract. We propose a new unsupervised training method for acquiring probability models that accurately segment Chinese character sequences into words. By constructing a core lexicon to guide unsupervised word learning, self-supervised segmentation overcomes the local maxima problems that hamper standard EM training. Our procedure uses successive EM phases to learn a good probability model over character strings, and then prunes this model with a mutual information selection criterion to obtain a more accurate word lexicon. The segmentations produced by these models are more accurate than those produced by training with EM alone.
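As a rough illustration of the two ingredients the abstract names (a probability model over character strings used to segment text, and a mutual information score used to prune the lexicon), the following is a minimal sketch and not the authors' implementation. It assumes a simple unigram word model whose probabilities are already given; the EM training phases are omitted. The names `segment`, `pointwise_mi`, `word_prob`, `max_len`, and `unk_prob` are hypothetical and chosen only for this example.

```python
import math


def segment(text, word_prob, max_len=4, unk_prob=1e-8):
    """Return the most probable segmentation of `text` under a unigram
    word model, found by dynamic programming (Viterbi).

    Assumption: `word_prob` maps candidate words to probabilities.
    Single characters missing from the lexicon get `unk_prob`.
    """
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-prob of text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            p = word_prob.get(piece)
            if p is None:
                if len(piece) > 1:
                    continue        # unknown multi-character strings are not words
                p = unk_prob        # fall back to a single unknown character
            score = best[j] + math.log(p)
            if score > best[i]:
                best[i] = score
                back[i] = j
    # Follow the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


def pointwise_mi(word, split, word_prob):
    """Pointwise mutual information between the two halves of `word`.

    A low score suggests the halves co-occur about as often as chance
    predicts, so the longer entry is a candidate for pruning. This is one
    common form of the score, assumed here for illustration.
    """
    left, right = word[:split], word[split:]
    return math.log(word_prob[word] / (word_prob[left] * word_prob[right]))
```

With a toy lexicon such as `{"ab": 0.4, "a": 0.1, "b": 0.1, "c": 0.4}`, `segment("abc", ...)` prefers the two-word reading over three single characters, and `pointwise_mi("ab", 1, ...)` is positive, which under this sketch would argue for keeping `"ab"` in the lexicon.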
Fuchun Peng, Dale Schuurmans
Added 30 Jul 2010
Updated 30 Jul 2010
Type Conference
Year 2001
Where IDA