COLING
2010

Tibetan Number Identification Based on Classification of Number Components in Tibetan Word Segmentation

Tibetan word segmentation is essential for Tibetan information processing. Because no segmented Tibetan corpus is available for training, current approaches mainly rely on basic dictionary-based machine matching to segment Tibetan words. However, the dictionary-based method is not well suited to identifying Tibetan numbers. This paper studies the characteristics of Tibetan numbers and proposes a method for identifying them based on the classification of number components. During segmentation, the method first tags every number component according to the class it belongs to, and then updates the tag sequence according to predefined rules. Finally, adjacent number components are combined to form a Tibetan number if they meet certain requirements. In tests on a 7938K Tibetan corpus, the identification accuracy is 99.21%.
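The three steps described in the abstract (tag components, update tags by rule, combine adjacent components) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the class names (`DIGIT`, `UNIT`, `LINK`, `OTHER`), the toy lexicon in Wylie transliteration, the single tag-update rule for the connector `dang`, and the merge condition are all hypothetical, since the abstract does not list the actual classes, rules, or requirements.

```python
# Hypothetical component classes; the paper's real class inventory is not
# given in the abstract.
DIGIT, UNIT, LINK, OTHER = "DIGIT", "UNIT", "LINK", "OTHER"

# Toy lexicon in Wylie transliteration (illustrative only):
# gcig = one, gnyis = two, gsum = three, bcu = ten, brgya = hundred,
# stong = thousand, dang = "and" (a connector that also occurs outside numbers).
COMPONENT_CLASS = {
    "gcig": DIGIT, "gnyis": DIGIT, "gsum": DIGIT,
    "bcu": UNIT, "brgya": UNIT, "stong": UNIT,
    "dang": LINK,
}


def tag_components(syllables):
    """Step 1: tag every syllable with the class it belongs to."""
    return [(s, COMPONENT_CLASS.get(s, OTHER)) for s in syllables]


def update_tags(tagged):
    """Step 2 (assumed rule): a LINK stays a number component only when it
    sits directly between two DIGIT/UNIT components; otherwise it becomes OTHER."""
    result = list(tagged)
    for i, (syllable, tag) in enumerate(result):
        if tag != LINK:
            continue
        prev_ok = i > 0 and result[i - 1][1] in (DIGIT, UNIT)
        next_ok = i + 1 < len(result) and result[i + 1][1] in (DIGIT, UNIT)
        if not (prev_ok and next_ok):
            result[i] = (syllable, OTHER)
    return result


def combine(tagged):
    """Step 3 (assumed requirement): merge each maximal run of adjacent
    number components into a single token tagged NUM."""
    tokens, run = [], []
    for syllable, tag in tagged:
        if tag == OTHER:
            if run:
                tokens.append((" ".join(run), "NUM"))
                run = []
            tokens.append((syllable, OTHER))
        else:
            run.append(syllable)
    if run:
        tokens.append((" ".join(run), "NUM"))
    return tokens


def identify_numbers(syllables):
    """Run the three steps in order on a list of syllables."""
    return combine(update_tags(tag_components(syllables)))
```

For example, `identify_numbers(["deb", "brgya", "dang", "gcig", "dang", "mi"])` keeps the first `dang` inside the number because it lies between two number components, and retags the second one as an ordinary word. A real system would need a far richer rule set, because many Tibetan number components are ambiguous with ordinary syllables.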
Added 13 May 2011
Updated 13 May 2011
Type Journal
Year 2010
Where COLING
Authors Huidan Liu, Weina Zhao, Minghua Nuo, Li Jiang, Jian Wu, Yeping He