Sciweavers

NAACL
2010

The Best Lexical Metric for Phrase-Based Statistical MT System Optimization

13 years 2 months ago
The Best Lexical Metric for Phrase-Based Statistical MT System Optimization
Translation systems are generally trained to optimize BLEU, but many alternative metrics are available. We explore how optimizing toward various automatic evaluation metrics (BLEU, METEOR, NIST, TER) affects the resulting model. We train a state-of-the-art MT system using MERT on many parameterizations of each metric and evaluate the resulting models on the other metrics and also using human judges. In accordance with popular wisdom, we find that it's important to train on the same metric used in testing. However, we also find that training to a newer metric is only useful to the extent that the MT model's structure and features allow it to take advantage of the metric. Contrasting with TER's good correlation with human judgments, we show that people tend to prefer BLEU and NIST trained models to those trained on edit distance based metrics like TER or WER. Human preferences for METEOR trained models varies depending on the source language. Since using BLEU or NIST prod...
Daniel Cer, Christopher D. Manning, Daniel Jurafsk
Added 14 Feb 2011
Updated 14 Feb 2011
Type Journal
Year 2010
Where NAACL
Authors Daniel Cer, Christopher D. Manning, Daniel Jurafsky
Comments (0)