Sciweavers

ACL
2009

Robust Machine Translation Evaluation with Entailment Features

13 years 2 months ago
Robust Machine Translation Evaluation with Entailment Features
Existing evaluation metrics for machine translation lack crucial robustness: their correlations with human quality judgments vary considerably across languages and genres. We believe that the main reason is their inability to properly capture meaning: A good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that evaluates MT output based on a rich set of features motivated by textual entailment, such as lexical-semantic (in-)compatibility and argument structure overlap. We compare this metric against a combination metric of four state-of-theart scores (BLEU, NIST, TER, and METEOR) in two different settings. The combination metric outperforms the individual scores, but is bested by the entailment-based metric. Combining the entailment and traditional features yields further improvements.
Sebastian Padó, Michel Galley, Daniel Juraf
Added 16 Feb 2011
Updated 16 Feb 2011
Type Journal
Year 2009
Where ACL
Authors Sebastian Padó, Michel Galley, Daniel Jurafsky, Christopher D. Manning
Comments (0)