We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from compara...
In Chinese text, words composed of one or more characters are not separated by spaces, unlike in most Western languages. Therefore, Chinese word segmentation is considered an ...
Statistical machine translation (SMT) requires a large parallel corpus, which is available only for a limited set of language pairs and domains. To expand the language pairs and domains...
Current statistical machine translation (SMT) systems are trained on sentence-aligned and word-aligned parallel text collected from various sources. Translation model parameters ar...
Spyros Matsoukas, Antti-Veikko I. Rosti, Bing Zhan...