Abstract. We propose a new unsupervised training method for acquiring probability models that accurately segment Chinese character sequences into words. By constructing a core lexi...
We propose a self-supervised word-segmentation technique for Chinese information retrieval. This method combines the advantages of traditional dictionary based approaches with cha...
Fuchun Peng, Xiangji Huang, Dale Schuurmans, Nick ...
This report describes the English-Chinese crosslanguage retrieval experiments at Berkeley for TREC-9 Cross-Language Information Retrieval track. We present a simple and effective ...
: In the processing of Chinese documents and queries in information retrieval (IR), one has to identify the units that are used as indexes. Words and n-grams have been used as inde...
It is commonly believed that word segmentation accuracy is monotonically related to retrieval performance in Chinese information retrieval. In this paper we show that, for Chinese...
Fuchun Peng, Xiangji Huang, Dale Schuurmans, Nick ...