ACL
2007

Boosting Statistical Machine Translation by Lemmatization and Linear Interpolation

Data sparseness is one of the factors that degrade statistical machine translation (SMT). Existing work has shown that using morphosyntactic information is an effective remedy for data sparseness. However, less effort has been devoted to Chinese-to-English SMT that uses English morphosyntactic analysis. We found that although English is a language with relatively little inflection, using English lemmas in training can significantly improve the quality of word alignment, which in turn yields better translation performance. We carried out comprehensive experiments on training data sets of varied sizes to verify this. We also proposed a new, effective linear interpolation method for integrating multiple homologous features of translation models.
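As a rough illustration of the general idea of linearly interpolating homologous features, the sketch below combines phrase translation probabilities from two tables, one assumed to come from surface-form alignment and one from lemma-based alignment. It is not the authors' method: the function name `interpolate`, the toy tables, and the fixed weights are all hypothetical, and the abstract does not say how the paper chooses its weights or which features it combines.

```python
def interpolate(tables, weights):
    """Linearly interpolate translation probabilities from several tables.

    tables  -- list of dicts mapping (source, target) pairs to probabilities
    weights -- one non-negative weight per table; they must sum to 1
    Both arguments are illustrative assumptions, not the paper's interface.
    """
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

    # A pair missing from one table contributes probability 0 from that table.
    pairs = set().union(*(t.keys() for t in tables))
    return {
        pair: sum(w * t.get(pair, 0.0) for w, t in zip(weights, tables))
        for pair in pairs
    }


# Toy tables (made-up numbers): the same feature estimated from two alignments.
surface_table = {("shu", "book"): 0.6, ("shu", "books"): 0.4}
lemma_table = {("shu", "book"): 0.9, ("shu", "books"): 0.1}

combined = interpolate([surface_table, lemma_table], [0.5, 0.5])
for pair, prob in sorted(combined.items()):
    print(pair, round(prob, 3))
```

With equal weights, each interpolated value is simply the average of the two estimates. In practice the weights would be tuned on held-out data rather than fixed by hand.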
Ruiqiang Zhang, Eiichiro Sumita
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where ACL
Authors Ruiqiang Zhang, Eiichiro Sumita