PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning

Many machine translation (MT) evaluation metrics have been shown to correlate better with human judgment than BLEU. In principle, tuning on these metrics should yield better systems than tuning on BLEU. However, due to issues such as speed, requirements for linguistic resources, and optimization difficulty, they have not been widely adopted for tuning. This paper presents PORT, a new MT evaluation metric which combines precision, recall and an ordering metric and which is primarily designed for tuning MT systems. PORT does not require external resources and is quick to compute. It has a better correlation with human judgment than BLEU. We compare PORT-tuned MT systems to BLEU-tuned baselines in five experimental conditions involving four language pairs. PORT tuning achieves consistently better performance than BLEU tuning, according to four automated metrics (including BLEU) and to human evaluation: in comparisons of outputs from 300 source sentences, human judges preferred the PORT...

Boxing Chen, Roland Kuhn, Samuel Larkin

ACL 2012 | Computational Linguistics | Evaluation Metrics | Human Judgment | Language Pairs |

Added	29 Sep 2012
Updated	29 Sep 2012
Type	Journal
Year	2012
Where	ACL
Authors	Boxing Chen, Roland Kuhn, Samuel Larkin
