Sciweavers

APWEB
2008
Springer

A Study on Multi-word Extraction from Chinese Documents

A multi-word is a sequence of two or more consecutive words that carries the contextual semantics of its individual words. Multi-words attract much attention in statistical linguistics and have extensive applications in text mining. In this paper, we carried out a series of studies on multi-word extraction from Chinese documents. First, we proposed a new statistical method, augmented mutual information (AMI), for measuring the dependency between words. Experimental results demonstrate that the AMI method achieves an average recall of 80% with a precision of about 20%-30%. Second, we attempted to use the variance of the occurrence frequencies of the individual words in a multi-word candidate to deal with the rare-occurrence problem, but the experimental results could not validate the effectiveness of variance. Third, we developed a syntactic method based on the lexical regularities of Chinese multi-words to extract multi-words from Chinese documents. Experimental results demonstrate that this syntactic method can produce a...
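The abstract does not define AMI, so the sketch below is not the authors' method. It shows the general idea of scoring word dependency with plain pointwise mutual information (PMI), which mutual-information measures of this kind build on. The function name `pmi_scores`, the `min_count` threshold, and the assumption that the text has already been segmented into words are all illustrative choices, not details from the paper.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Assumption: `tokens` is a list of already-segmented words.
    This is standard PMI, not the paper's AMI.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Skip rare pairs: PMI is unreliable for low counts.
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    # Hypothetical pre-segmented sample text.
    sample = ["文本", "挖掘", "很", "有趣", "文本", "挖掘", "很", "难"]
    for pair, score in sorted(pmi_scores(sample).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

The `min_count` filter reflects the rare-occurrence problem mentioned in the abstract: pairs seen only once or twice can receive inflated PMI scores, so a simple frequency cutoff is used here in place of the variance-based approach the authors tested.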
Wen Zhang, Taketoshi Yoshida, Xijin Tang
Added 08 Nov 2010
Updated 08 Nov 2010
Type Conference
Year 2008
Where APWEB
Authors Wen Zhang, Taketoshi Yoshida, Xijin Tang