Combining Unsupervised and Supervised Alignments for MT: An Empirical Study

Word alignment plays a central role in statistical MT (SMT), since almost all SMT systems extract translation rules from word-aligned parallel training data. While most SMT systems use unsupervised algorithms (e.g., GIZA++) to train word alignments, supervised methods, which exploit a small amount of human-aligned data, have become increasingly popular. This work empirically studies the performance of these two classes of alignment algorithms and explores strategies for combining them to improve overall system performance. We used two unsupervised aligners, GIZA++ and HMM, and one supervised aligner, ITG, in this study. To avoid language- and genre-specific conclusions, we ran experiments on test sets covering two language pairs (Chinese-to-English and Arabic-to-English) and two genres (newswire and weblog). Results show that the two classes of algorithms achieve the same level of MT performance. Modest improvements were achieved by taking the union of the translation gramma...

Jinxi Xu, Antti-Veikko I. Rosti

Algorithms | EMNLP 2010 | Natural Language Processing | SMT Systems | Word Alignment

Added	11 Feb 2011
Updated	11 Feb 2011
Type	Conference
Year	2010
Where	EMNLP
Authors	Jinxi Xu, Antti-Veikko I. Rosti
