Design and Prototype of a Large-Scale and Fully Sense-Tagged Corpus

Download cwn.ling.sinica.edu.tw

A sense-tagged corpus plays a crucial role in natural language processing, especially in research on word sense disambiguation and natural language understanding. A large-scale Chinese sense-tagged corpus is essential, yet the lack of such a corpus remains a critical gap at the current stage. This paper aims to design a large-scale, full-text Chinese sense-tagged corpus containing over 110,000 words. The Academia Sinica Balanced Corpus of Modern Chinese (also known as the Sinica Corpus) serves as the tagging target, and 56 full texts are extracted from it. The preparation for automatic sense tagging uses N-gram statistics and collocation information, combining machine learning techniques with a probabilistic model. To achieve highly precise results, the output of automatic sense tagging must then be revised manually.
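The abstract names collocation statistics and a probability model but gives no implementation details, so the following is only a minimal sketch of how such a tagger could work, not the authors' method. It assumes a Naive Bayes style scorer with add-one smoothing over context words in a fixed window; the function names `train` and `tag`, the window size, and the toy English data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(tagged_sentences, window=2):
    """Count senses per word and context words near each (word, sense) pair.

    Each sentence is a list of (word, sense) tuples; sense is None for
    words that carry no sense tag. This data layout is an assumption.
    """
    sense_counts = defaultdict(Counter)    # word -> Counter of senses
    context_counts = defaultdict(Counter)  # (word, sense) -> Counter of neighbours
    vocab = set()
    for sentence in tagged_sentences:
        words = [w for w, _ in sentence]
        vocab.update(words)
        for i, (word, sense) in enumerate(sentence):
            if sense is None:
                continue
            sense_counts[word][sense] += 1
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    context_counts[(word, sense)][words[j]] += 1
    return sense_counts, context_counts, vocab

def tag(words, i, model, window=2):
    """Return the highest-scoring sense for words[i], or None if unseen."""
    sense_counts, context_counts, vocab = model
    word = words[i]
    if word not in sense_counts:
        return None
    total = sum(sense_counts[word].values())
    lo, hi = max(0, i - window), min(len(words), i + window + 1)
    context = [words[j] for j in range(lo, hi) if j != i]
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts[word].items():
        ctx = context_counts[(word, sense)]
        denom = sum(ctx.values()) + len(vocab)  # add-one smoothing
        score = math.log(count / total)         # log prior of the sense
        for c in context:
            score += math.log((ctx[c] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Toy, hand-made training data (hypothetical, not from the Sinica Corpus).
training = [
    [("river", None), ("bank", "shore"), ("water", None)],
    [("bank", "finance"), ("loan", None), ("money", None)],
    [("money", None), ("bank", "finance"), ("account", None)],
]
model = train(training)
print(tag(["deposit", "money", "bank"], 2, model))
print(tag(["river", "bank"], 1, model))
```

In a workflow like the one the abstract describes, the senses proposed by such a scorer would be treated as candidates for manual revision rather than as final tags.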

Sue-jin Ker, Chu-Ren Huang, Jia-Fei Hong, Shi-Yin Liu, Hui-Ling Jian, I-Li Su, Shu-Kai Hsieh

Knowledge Management | Large-scale Chinese Sense | LKR 2008 | Sense Tagged Corpus | Word Sense Disambiguation |

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LKR
Authors	Sue-jin Ker, Chu-Ren Huang, Jia-Fei Hong, Shi-Yin Liu, Hui-Ling Jian, I-Li Su, Shu-Kai Hsieh
