Sciweavers

ACL
2006

Unsupervised Segmentation of Chinese Text by Use of Branching Entropy

We propose an unsupervised segmentation method based on an assumption about language data: that a point where the entropy of successive characters increases marks the location of a word boundary. A large-scale experiment was conducted using 200 MB of unsegmented training data and 1 MB of test data, and a precision of 90% was attained, with recall of around 80%. Moreover, we found that the precision remained stable at around 90% regardless of the training data size.
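The assumption in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the procedure, so the function names (`successor_counts`, `branching_entropy`, `segment`), the maximum context length `max_n`, the choice to treat unseen contexts as having zero entropy, and the rule "cut wherever entropy rises when a context grows by one character" are all simplifying assumptions made here for illustration.

```python
import math
from collections import Counter, defaultdict


def successor_counts(corpus, max_n):
    """Count which character follows each context of length 1..max_n."""
    counts = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(corpus) - n):
            counts[corpus[i:i + n]][corpus[i + n]] += 1
    return counts


def branching_entropy(counts, context):
    """Entropy in bits of the next-character distribution after `context`.

    Assumption: a context never seen in training gets entropy 0.0.
    """
    followers = counts.get(context)
    if not followers:
        return 0.0
    total = sum(followers.values())
    return -sum((c / total) * math.log2(c / total) for c in followers.values())


def segment(text, counts, max_n):
    """Split `text` where entropy rises as a context is extended.

    For every start position, the context is grown one character at a
    time (up to max_n characters). If the entropy after the longer
    context exceeds the entropy after the shorter one, the end of the
    longer context is recorded as a boundary.
    """
    boundaries = set()
    for start in range(len(text)):
        prev = None
        for end in range(start + 1, min(start + max_n, len(text)) + 1):
            h = branching_entropy(counts, text[start:end])
            if prev is not None and h > prev:
                boundaries.add(end)
            prev = h
    # Turn the boundary positions into a list of segments.
    cuts = sorted(b for b in boundaries if 0 < b < len(text))
    words, last = [], 0
    for b in cuts:
        words.append(text[last:b])
        last = b
    words.append(text[last:])
    return words
```

To try it, build the counts from an unsegmented training string with `successor_counts(training_text, max_n)` and pass them, together with the same `max_n`, to `segment(test_text, counts, max_n)`. On a realistic corpus a larger `max_n` and some thresholding of the entropy increase would likely be needed; both are left out of this sketch.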
Zhihui Jin, Kumiko Tanaka-Ishii
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2006
Where ACL
Authors Zhihui Jin, Kumiko Tanaka-Ishii