Assessing Phrase-Based Translation Models with Oracle Decoding

Extant Statistical Machine Translation (SMT) systems are very complex pieces of software that embed multiple layers of heuristics and involve very large numbers of numerical parameters. As a result, it is difficult to analyze output translations, and there is a real need for tools that could help developers better understand the various causes of errors. In this study, we take a step in that direction and present an attempt to evaluate the quality of the phrase-based translation model. In order to identify those translation errors that stem from deficiencies in the phrase table (PT), we propose to compute the oracle BLEU-4 score, that is, the best score that a system based on this PT can achieve on a reference corpus. By casting the computation of the oracle BLEU-1 as an Integer Linear Programming (ILP) problem, we show that it is possible to efficiently compute accurate lower bounds of this score, and we report measurements performed on several standard benchmarks. Various other applications of th...
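To make the notion of an oracle score concrete, here is a minimal sketch that is not the ILP formulation from the paper. It finds, by brute-force enumeration, the hypothesis with the highest clipped unigram precision that a toy phrase table can produce for one sentence. The phrase table, the function names, the restriction to monotone segmentations, and the omission of the brevity penalty are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical toy phrase table: source phrase -> candidate translations.
PHRASE_TABLE = {
    ("la",): ["the"],
    ("maison",): ["house", "home"],
    ("bleue",): ["blue"],
    ("maison", "bleue"): ["blue house", "blue home"],
}

def segmentations(words):
    """Yield every monotone segmentation of `words` into known phrases."""
    if not words:
        yield []
        return
    for end in range(1, len(words) + 1):
        head = tuple(words[:end])
        if head in PHRASE_TABLE:
            for rest in segmentations(words[end:]):
                yield [head] + rest

def unigram_precision(hypothesis, reference):
    """Clipped unigram precision (assumption: no brevity penalty)."""
    hyp, ref = Counter(hypothesis), Counter(reference)
    matched = sum(min(count, ref[word]) for word, count in hyp.items())
    return matched / max(len(hypothesis), 1)

def oracle(source, reference):
    """Return (best score, best hypothesis) reachable with PHRASE_TABLE.

    Returns (-1.0, None) if the source cannot be segmented at all.
    """
    best_score, best_hyp = -1.0, None
    src, ref = source.split(), reference.split()
    for seg in segmentations(src):
        # Try every combination of one translation per source phrase.
        for choice in product(*(PHRASE_TABLE[p] for p in seg)):
            hyp = " ".join(choice).split()
            score = unigram_precision(hyp, ref)
            if score > best_score:
                best_score, best_hyp = score, " ".join(hyp)
    return best_score, best_hyp
```

Calling `oracle("la maison bleue", "the blue house")` searches both segmentations of the source. Exhaustive enumeration like this grows exponentially with sentence length, which is one reason an optimization formulation such as the ILP mentioned in the abstract is attractive.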
Added 11 Feb 2011
Updated 11 Feb 2011
Type Journal
Year 2010
Authors Guillaume Wisniewski, Alexandre Allauzen, François Yvon