The Best Lexical Metric for Phrase-Based Statistical MT System Optimization

Translation systems are generally trained to optimize BLEU, but many alternative metrics are available. We explore how optimizing toward various automatic evaluation metrics (BLEU, METEOR, NIST, TER) affects the resulting model. We train a state-of-the-art MT system using MERT on many parameterizations of each metric and evaluate the resulting models on the other metrics and also using human judges. In accordance with popular wisdom, we find that it's important to train on the same metric used in testing. However, we also find that training to a newer metric is only useful to the extent that the MT model's structure and features allow it to take advantage of the metric. Contrasting with TER's good correlation with human judgments, we show that people tend to prefer BLEU- and NIST-trained models to those trained on edit-distance-based metrics like TER or WER. Human preferences for METEOR-trained models vary depending on the source language. Since using BLEU or NIST prod...

Daniel Cer, Christopher D. Manning, Daniel Jurafsky

BLEU | Computational Linguistics | Human Judgments | Metric | NAACL 2010

Post Info

Added	14 Feb 2011
Updated	14 Feb 2011
Type	Conference
Year	2010
Where	NAACL
Authors	Daniel Cer, Christopher D. Manning, Daniel Jurafsky
