As a sequence of two or more consecutive individual words inherent with contextual semantics of individual words, multi-word attracts much attention from statistical linguistics an...
In Chinese, phrases and named entities play a central role in information retrieval. Abbreviations, however, make keyword-based approaches less effective. This paper presents an em...
Improving the precision of information retrieval has been a challenging issue on Chinese Web. As exemplified by Chinese recipes on the Web, it is not easy/natural for people to us...
This paper proposes a chunking strategy to detect unknown words in Chinese word segmentation. First, a raw sentence is pre-segmented into a sequence of word atoms 1 using a maximum...
Statistical language models should improve as the size of the n-grams increases from 3 to 5 or higher. However, the number of parameters and calculations, and the storage requirem...
Le Quan Ha, Philip Hanna, Darryl Stewart, F. Jack ...