
NLDB
2005
Springer

Automatic Filtering of Bilingual Corpora for Statistical Machine Translation

Abstract. For many applications, such as machine translation and bilingual information retrieval, bilingual corpora play an important role in training the system. Because they are obtained through automatic or semi-automatic methods, they usually include noise: sentence pairs that are worthless or even harmful for training the system. We study the effect of different levels of corpus noise on an end-to-end statistical machine translation system. We also propose an efficient method for corpus filtering, which removes the noisy part of a corpus using state-of-the-art word alignment models. We show the efficiency of this method on the basis of the sentence misalignment rate of the filtered corpus and its positive effect on translation quality.
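The abstract does not say which alignment models or scoring function the authors use, so the following is only a minimal sketch of the general idea of filtering sentence pairs by a word alignment model score. It assumes an IBM Model 1-style, length-normalized log-probability. The names `TOY_LEXICON`, `FLOOR`, `model1_score`, and `filter_corpus`, the lexicon values, and the threshold are all hypothetical and chosen for illustration.

```python
import math

# Toy lexical translation probabilities t(target_word | source_word).
# Keys are (source_word, target_word). The values are invented for
# illustration only; a real table would be estimated from the corpus.
TOY_LEXICON = {
    ("das", "the"): 0.7,
    ("haus", "house"): 0.8,
    ("ist", "is"): 0.8,
    ("klein", "small"): 0.7,
}
FLOOR = 1e-6  # assumed probability for word pairs missing from the table


def model1_score(source, target, lexicon=TOY_LEXICON, floor=FLOOR):
    """Length-normalized IBM Model 1-style log-probability of target given source."""
    if not source or not target:
        return float("-inf")
    src = ["<null>"] + list(source)  # include the empty source word
    total = 0.0
    for t_word in target:
        # Average the lexical probability over all source positions.
        p = sum(lexicon.get((s_word, t_word), floor) for s_word in src) / len(src)
        total += math.log(p)
    return total / len(target)


def filter_corpus(pairs, threshold):
    """Keep only the sentence pairs whose score reaches the threshold."""
    return [(s, t) for s, t in pairs if model1_score(s, t) >= threshold]
```

In this sketch, a pair such as (`das haus ist klein`, `the house is small`) scores far higher than a pair whose target words have no support in the lexicon, so a threshold between the two scores removes the second pair. Scoring in one direction only is a simplification made for this example.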
Shahram Khadivi, Hermann Ney
Added 28 Jun 2010
Updated 28 Jun 2010
Type Conference
Year 2005
Where NLDB