
ICMLC
2010
Springer

Approaches to improving corpus quality for statistical machine translation

Abstract: The performance of a statistical machine translation (SMT) system depends heavily on the quantity and quality of its bilingual language resources. However, previous work has mainly focused on quantity, trying to collect more bilingual data. In this paper, we aim to optimize the bilingual corpus to improve the performance of the translation system. We propose methods that process the bilingual data by filtering noise and selecting the more informative sentences from the training corpus and the development corpus. The experimental results show that we can obtain competitive performance with less data than when using all available data.
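The abstract does not spell out the filtering or selection criteria, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes whitespace-tokenized sentence pairs, a length-ratio heuristic for noise, and greedy selection by unseen source-side n-grams; the function names, thresholds, and both heuristics are hypothetical choices made for illustration.

```python
def is_noisy(src, tgt, max_ratio=3.0, max_len=100):
    """Assumed noise heuristic: empty, overlong, or badly length-mismatched pairs."""
    s, t = src.split(), tgt.split()
    if not s or not t:
        return True
    if len(s) > max_len or len(t) > max_len:
        return True
    # Ratio of the longer side to the shorter side, always >= 1.
    return max(len(s), len(t)) / min(len(s), len(t)) > max_ratio


def ngrams(tokens, max_n):
    """Return the set of all n-grams of order 1..max_n as tuples."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def select_informative(pairs, budget, max_n=2):
    """Assumed selection heuristic: greedily keep the pair whose source side
    contributes the most n-grams not yet covered, up to `budget` pairs."""
    seen = set()
    remaining = list(pairs)
    chosen = []
    while remaining and len(chosen) < budget:
        best, best_gain = None, 0
        for pair in remaining:
            gain = len(ngrams(pair[0].split(), max_n) - seen)
            if gain > best_gain:
                best, best_gain = pair, gain
        if best is None:
            # No remaining pair adds anything new; stop early.
            break
        chosen.append(best)
        seen |= ngrams(best[0].split(), max_n)
        remaining.remove(best)
    return chosen


# Toy corpus of (source, target) pairs, invented for this example.
corpus = [
    ("a b c", "x y z"),
    ("a b c", "x y z"),
    ("d e", "u v"),
    ("a", "x x x x x"),
    ("", "x"),
]
clean = [pair for pair in corpus if not is_noisy(*pair)]
subset = select_informative(clean, budget=2)
```

On this toy corpus the filter drops the empty and length-mismatched pairs, and the selection step skips the duplicate pair because it adds no new n-grams. A real system would likely rely on stronger signals, such as alignment scores, which this sketch leaves out.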
Peng Liu, Yu Zhou, Chengqing Zong
Added 26 Jan 2011
Updated 26 Jan 2011
Type Journal
Year 2010
Where ICMLC