A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages


We propose a language-independent approach for improving statistical machine translation for morphologically rich languages using a hybrid morpheme-word representation where the basic unit of translation is the morpheme, but word boundaries are respected at all stages of the translation process. Our model extends the classic phrase-based model by means of (1) word boundary-aware morpheme-level phrase extraction, (2) minimum error-rate training for a morpheme-level translation model using word-level BLEU, and (3) joint scoring with morpheme- and word-level language models. Further improvements are achieved by combining our model with the classic one. The evaluation on English to Finnish using Europarl (714K sentence pairs; 15.5M English words) shows statistically significant improvements over the classic model based on BLEU and human judgments.
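As a rough illustration of what a hybrid morpheme-word representation can look like, the sketch below flattens segmented words into a morpheme stream that still records word boundaries, then rebuilds the words so that output could be scored at the word level. The trailing `+` marker, the function names `to_morphemes` and `to_words`, and the toy segmentation are assumptions made for this example; they are not taken from the paper.

```python
def to_morphemes(segmented_words):
    """Flatten per-word morpheme lists into one token stream.

    Every non-final morpheme of a word gets a trailing '+' (a hypothetical
    marker), so the original word boundaries stay recoverable.
    Assumes each word has at least one morpheme.
    """
    tokens = []
    for morphs in segmented_words:
        tokens.extend(m + "+" for m in morphs[:-1])
        tokens.append(morphs[-1])
    return tokens


def to_words(tokens):
    """Rebuild surface words from a '+'-marked morpheme stream."""
    words, current = [], ""
    for tok in tokens:
        if tok.endswith("+"):
            # Word-internal morpheme: keep accumulating.
            current += tok[:-1]
        else:
            # Word-final morpheme: close the current word.
            words.append(current + tok)
            current = ""
    if current:
        # Dangling non-final morpheme at the end of the stream.
        words.append(current)
    return words


# Toy segmentation chosen for illustration only.
segmented = [["auto", "i", "ssa"], ["on"]]
morphs = to_morphemes(segmented)
words = to_words(morphs)
```

Under these assumptions, translation would operate on `morphs`, while a word-level metric or a word-level language model would see `words`.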

Minh-Thang Luong, Preslav Nakov, Min-Yen Kan


Classic Phrase-based Model | EMNLP 2010 | Model | Morpheme-level Translation Model | Natural Language Processing


Added	11 Feb 2011
Updated	11 Feb 2011
Type	Conference
Year	2010
Where	EMNLP
Authors	Minh-Thang Luong, Preslav Nakov, Min-Yen Kan
