In Chinese texts, words composed of single or multiple characters are not separated by spaces, unlike most western languages. Therefore Chinese word segmentation is considered an ...
The dominant practice of statistical machine translation (SMT) uses the same Chinese word segmentation specification in both alignment and translation rule induction steps in buil...
Ning Xi, Guangchao Tang, Xinyu Dai, Shujian Huang,...
Mining bilingual data (including bilingual sentences and terms1 ) from the Web can benefit many NLP applications, such as machine translation and cross language information retrie...
Long Jiang, Shiquan Yang, Ming Zhou, Xiaohua Liu, ...
We address the problem of extracting bilingual chunk pairs from parallel text to create training sets for statistical machine translation. We formulate the problem in terms of a s...
We propose a novel approach to crosslingual language model (LM) adaptation based on bilingual Latent Semantic Analysis (bLSA). A bLSA model is introduced which enables latent topi...