ACL
2009

Correlating Human and Automatic Evaluation of a German Surface Realiser

We examine correlations between native speaker judgements of automatically generated German text and automatic evaluation metrics. We look at a number of metrics from the machine translation (MT) and Summarisation communities and find that for a relative ranking task, most automatic metrics perform equally well and correlate fairly strongly with the human judgements. In contrast, on a naturalness judgement task, the General Text Matcher (GTM) tool correlates best overall, although in general, correlation between the human judgements and the automatic metrics was quite weak.
Aoife Cahill
Added 16 Feb 2011
Updated 16 Feb 2011
Type Journal
Year 2009
Where ACL
Authors Aoife Cahill