In this paper, we explore a CLIR-based approach to construct large-scale Chinese-English comparable corpora, which is valuable for translation knowledge mining. The initial source...
In this paper, we describe an empirical study of Chinese chunking on a corpus, which is extracted from UPENN Chinese Treebank-4 (CTB4). First, we compare the performance of the st...
: We propose a method for text retrieval from document images without the use of OCR. Documents are segmented into character objects. Image features, namely the Vertical Traverse D...
This paper reports on a study involving the automatic extraction of Chinese legal terms. We used a word segmented corpus of Chinese court judgments to extract salient legal expres...
We present a first known result of high precision rare word bilingual extraction from comparable corpora, using aligned comparable documents and supervised classification. We in...