Boosting Statistical Machine Translation by Lemmatization and Linear Interpolation

Download www.mt-archive.info

Data sparseness is one of the factors that degrade statistical machine translation (SMT). Existing work has shown that using morphosyntactic information is an effective solution to data sparseness. However, little effort has been devoted to using English morphosyntactic analysis in Chinese-to-English SMT. We found that although English is a weakly inflected language, using English lemmas in training can significantly improve the quality of word alignment, which in turn yields better translation performance. We carried out comprehensive experiments on training data of varied sizes to verify this. We also propose a new, effective linear interpolation method for integrating multiple homologous features of translation models.
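The abstract does not give the interpolation formula, so the following is only a minimal sketch of what linearly interpolating a homologous feature could look like. It assumes a plain convex combination of the same feature (for example a phrase translation probability) taken from a table trained on word-form alignments and a table trained on lemma alignments. The function name `interpolate`, the table layout, the equal weights, and the toy values are all hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

PhrasePair = Tuple[str, str]            # (source phrase, target phrase)
FeatureTable = Dict[PhrasePair, float]  # phrase pair -> feature value, e.g. p(e|f)


def interpolate(tables: List[FeatureTable], weights: List[float]) -> FeatureTable:
    """Linearly interpolate the same feature across several tables.

    Assumption: weights are non-negative and sum to 1 (a convex combination).
    """
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

    merged: FeatureTable = {}
    for table, weight in zip(tables, weights):
        for pair, value in table.items():
            # A pair absent from one table contributes 0 from that table.
            merged[pair] = merged.get(pair, 0.0) + weight * value
    return merged


# Toy values, invented purely for illustration.
word_based = {("mao", "cat"): 0.6, ("mao", "cats"): 0.4}
lemma_based = {("mao", "cat"): 0.9, ("mao", "kitten"): 0.1}

combined = interpolate([word_based, lemma_based], [0.5, 0.5])
for pair, value in sorted(combined.items()):
    print(pair, round(value, 3))
```

In this sketch a phrase pair found in only one table still receives a (down-weighted) score, which is one way a lemma-based model could back off for sparse word forms. How the weights are actually chosen in the paper is not stated in the abstract.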

Ruiqiang Zhang, Eiichiro Sumita

ACL 2007 | Computational Linguistics | Data Sparseness | Effective Linear Interpolation | Statistical Machine Translation

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	ACL
Authors	Ruiqiang Zhang, Eiichiro Sumita
