
EMNLP
2010

A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages

We propose a language-independent approach for improving statistical machine translation of morphologically rich languages using a hybrid morpheme-word representation, in which the basic unit of translation is the morpheme but word boundaries are respected at all stages of the translation process. Our model extends the classic phrase-based model by means of (1) word-boundary-aware morpheme-level phrase extraction, (2) minimum error-rate training of a morpheme-level translation model using word-level BLEU, and (3) joint scoring with morpheme- and word-level language models. Further improvements are achieved by combining our model with the classic one. Evaluation on English-to-Finnish translation using Europarl (714K sentence pairs; 15.5M English words) shows statistically significant improvements over the classic model, based on both BLEU and human judgments.
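The three ideas in the abstract all rely on being able to tell where words begin and end inside a morpheme sequence. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a hypothetical segmentation convention in which every non-final morpheme of a word carries a trailing "+" (e.g. the Finnish "talossa", "in the house", becomes "talo+ ssa"). The helper names, the toy unigram language model, and the equal 0.5/0.5 weights are all illustrative stand-ins: a real system would use alignment-based phrase extraction, n-gram language models, and weights tuned by MERT.

```python
import math

# Hypothetical segmentation convention (assumption, for illustration only):
# a morpheme that is NOT the last one in its word carries a trailing "+",
# e.g. "talossa" -> ["talo+", "ssa"].
CONT = "+"


def starts_word(tokens, i):
    """True if tokens[i] is the first morpheme of a word."""
    return i == 0 or not tokens[i - 1].endswith(CONT)


def ends_word(tokens, i):
    """True if tokens[i] is the last morpheme of a word."""
    return not tokens[i].endswith(CONT)


def boundary_aware_spans(tokens, max_len=4):
    """Enumerate morpheme spans [i, j) of at most max_len morphemes that
    start and end on word boundaries, so no span cuts a word in half.
    (Only the monolingual boundary constraint; alignments are ignored.)"""
    spans = []
    for i in range(len(tokens)):
        if not starts_word(tokens, i):
            continue
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            if ends_word(tokens, j - 1):
                spans.append((i, j))
    return spans


def to_words(tokens):
    """Rejoin morphemes into surface words, e.g. before computing
    word-level BLEU or querying a word-level language model."""
    words, current = [], ""
    for tok in tokens:
        if tok.endswith(CONT):
            current += tok[:-1]  # strip the marker and keep building the word
        else:
            words.append(current + tok)
            current = ""
    if current:  # dangling continuation marker: keep the partial word
        words.append(current)
    return words


def unigram_logprob(units, probs, floor=1e-6):
    """Toy unigram LM log-probability; a stand-in for a real n-gram LM."""
    return sum(math.log(probs.get(u, floor)) for u in units)


def joint_lm_score(tokens, morph_probs, word_probs, w_morph=0.5, w_word=0.5):
    """Log-linear combination of a morpheme-level and a word-level LM score
    for the same hypothesis (weights here are illustrative, not tuned)."""
    morph_score = unigram_logprob(tokens, morph_probs)
    word_score = unigram_logprob(to_words(tokens), word_probs)
    return w_morph * morph_score + w_word * word_score
```

For example, with tokens = ["talo+", "ssa", "on", "kissa"], to_words(tokens) returns ["talossa", "on", "kissa"], and boundary_aware_spans(tokens, max_len=3) returns [(0, 2), (0, 3), (2, 3), (2, 4), (3, 4)]: no span starts at "ssa" or ends at "talo+", because either would split the word "talossa". Scoring the rejoined words rather than the raw morphemes is what lets a morpheme-level system be tuned against word-level BLEU.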
Minh-Thang Luong, Preslav Nakov, Min-Yen Kan
Added 11 Feb 2011
Updated 11 Feb 2011
Type Conference
Year 2010
Where EMNLP