
ACL 2004

Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora

The parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. We show that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training. Incorporating word-level alignments into the parameter estimation of the IBM models reduces alignment error rate and increases the Bleu score when compared to training the same models only on sentence-aligned data. On the Verbmobil data set, we attain a 38% reduction in the alignment error rate and a higher Bleu score with half as many training examples. We discuss how varying the ratio of word-aligned to sentence-aligned data affects the expected performance gain.
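The abstract does not say exactly how the word-level alignments enter parameter estimation. What follows is a minimal sketch under stated assumptions, not the authors' implementation. It assumes IBM Model 1 trained with EM. Word-aligned sentence pairs contribute fixed counts read directly off their links, and sentence-aligned pairs contribute the usual expected counts. The two count sources are interpolated with a mixing weight. The function name `train_ibm1_mixed`, the `weight` parameter, the `<null>` token, and the rule of splitting a many-to-many link count evenly are all hypothetical choices made for illustration.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical empty-word token used by Model 1


def train_ibm1_mixed(sent_pairs, aligned_pairs, weight=0.5, iterations=5):
    """Sketch of IBM Model 1 EM that mixes two kinds of training data.

    sent_pairs:    list of (f_words, e_words) with no word alignments.
    aligned_pairs: list of (f_words, e_words, links), where links is a set of
                   (j, i) pairs meaning f_words[j] is aligned to e_words[i].
    weight:        hypothetical interpolation weight for the word-aligned
                   counts (assumption; must satisfy 0 <= weight < 1).
    Returns a dict mapping (f, e) to t(f | e).
    """
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")

    f_vocab = {fw for f, _ in sent_pairs for fw in f}
    f_vocab |= {fw for f, _, _ in aligned_pairs for fw in f}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation of t(f | e)

    # Fixed counts from the word-aligned data, computed once.
    aligned_count = defaultdict(float)
    for f, e, links in aligned_pairs:
        for j, fw in enumerate(f):
            targets = [i for (jj, i) in links if jj == j]
            if not targets:
                # Unlinked source words are credited to the NULL word.
                aligned_count[(fw, NULL)] += 1.0
            else:
                # Split one unit of count across multiple links.
                for i in targets:
                    aligned_count[(fw, e[i])] += 1.0 / len(targets)

    for _ in range(iterations):
        # E-step: expected counts from the sentence-aligned data only.
        expected = defaultdict(float)
        for f, e in sent_pairs:
            e_null = [NULL] + e
            for fw in f:
                z = sum(t[(fw, ew)] for ew in e_null)
                for ew in e_null:
                    expected[(fw, ew)] += t[(fw, ew)] / z

        # Interpolate expected and fixed counts.
        combined = defaultdict(float)
        for key, c in expected.items():
            combined[key] += (1.0 - weight) * c
        for key, c in aligned_count.items():
            combined[key] += weight * c

        # M-step: normalise per target word e.
        total = defaultdict(float)
        for (fw, ew), c in combined.items():
            total[ew] += c
        t = defaultdict(float)
        for (fw, ew), c in combined.items():
            if total[ew] > 0:
                t[(fw, ew)] = c / total[ew]

    return dict(t)
```

For example, on a toy corpus where one of the two sentence pairs also has gold links, the links push `t(maison | house)` above `t(la | house)` from the first iteration onward. In this sketch, raising `weight` makes the model trust the annotated links more relative to the EM estimates. That is one simple way to vary the contribution of word-aligned data relative to sentence-aligned data, as discussed in the abstract.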
Chris Callison-Burch, David Talbot, Miles Osborne
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2004
Where ACL
Authors Chris Callison-Burch, David Talbot, Miles Osborne