Sciweavers

EMNLP
2007

Extending a Thesaurus in the Pan-Chinese Context

13 years 5 months ago
Extending a Thesaurus in the Pan-Chinese Context
In this paper, we address a unique problem in Chinese language processing and report on our study on extending a Chinese thesaurus with region-specific words, mostly from the financial domain, from various Chinese speech communities. With the larger goal of automatically constructing a Pan-Chinese lexical resource, this work aims at taking an existing semantic classificatory structure as leverage and incorporating new words into it. In particular, it is important to see if the classification could accommodate new words from heterogeneous data sources, and whether simple similarity measures and clustering methods could cope with such variation. We use the cosine function for similarity and test it on automatically classifying 120 target words from four regions, using different datasets for the extraction of feature vectors. The automatic classification results were evaluated against human judgement, and the performance was encouraging, with accuracy reaching over 85% in some cases. Thu...
Oi Yee Kwong, Benjamin Ka-Yin T'sou
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where EMNLP
Authors Oi Yee Kwong, Benjamin Ka-Yin T'sou
Comments (0)