Sciweavers

IRAL
2000
ACM

On the use of words and n-grams for Chinese information retrieval

In the processing of Chinese documents and queries in information retrieval (IR), one has to identify the units that are used as index terms. Words and n-grams have both been used as indexes in several previous studies, which showed that the two kinds of indexes lead to comparable IR performance. In this study, we carry out further experiments on different ways to segment documents and queries, and to combine words with n-grams. Our experiments show that combining the longest-matching algorithm with single characters is the best choice.
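As a rough illustration of the indexing units discussed in the abstract, the following is a minimal sketch of forward longest-matching segmentation, character n-grams, and a combination of longest-match words with single characters. The abstract does not give implementation details, so the function names, the toy lexicon, the `max_len` limit, and the way words and characters are merged are all assumptions made for illustration, not the authors' method.

```python
def longest_match_segment(text, lexicon, max_len=4):
    """Forward longest matching: at each position, take the longest
    lexicon word; fall back to a single character if nothing matches."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def char_ngrams(text, n):
    """All overlapping character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def index_terms(text, lexicon, max_len=4):
    """Hypothetical combination: longest-match words plus single characters,
    with duplicates dropped and first-seen order preserved."""
    words = longest_match_segment(text, lexicon, max_len)
    chars = char_ngrams(text, 1)
    return list(dict.fromkeys(words + chars))


# Toy lexicon and text, chosen only for this example.
lexicon = {"中文", "信息", "检索", "信息检索"}
text = "中文信息检索"

print(longest_match_segment(text, lexicon))  # ['中文', '信息检索']
print(char_ngrams(text, 2))                  # ['中文', '文信', '信息', '息检', '检索']
print(index_terms(text, lexicon))
```

In this sketch the single characters act as a fallback: a query term such as "检索" that is swallowed by the longer word "信息检索" at indexing time can still partially match through its characters. Whether this is the reason the combination performs best is not stated in the abstract.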
Jian-Yun Nie, Jianfeng Gao, Jian Zhang, Ming Zhou
Added 01 Aug 2010
Updated 01 Aug 2010
Type Conference
Year 2000
Where IRAL
Authors Jian-Yun Nie, Jianfeng Gao, Jian Zhang, Ming Zhou