
TASLP
2010

Exploring Correlation Between ROUGE and Human Evaluation on Meeting Summaries

Abstract: Automatic summarization evaluation is very important to the development of summarization systems. In text summarization, ROUGE has been shown to correlate well with human evaluation when measuring the match of content units. However, many characteristics of the multiparty meeting domain may pose problems for ROUGE. The goal of this paper is to examine how well ROUGE scores correlate with human evaluation for extractive meeting summarization, and to explore the meeting-domain-specific factors that affect this correlation. This study extends the analysis in our previous work [1]. Our experiments show that the correlation between ROUGE and human evaluation is generally not high; however, when several meeting-specific characteristics, such as disfluencies, speaker information, and stopwords, are accounted for in the ROUGE setting, better correlation can be achieved, especially on the system summaries. We also found th...
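The kind of analysis the abstract describes can be illustrated with a minimal sketch. It is not the authors' setup: the unigram-recall function is only a simplified ROUGE-1-style score, the stopword and filler lists are hypothetical, and the choice of Spearman rank correlation is an assumption, since the abstract does not say which coefficient was used.

```python
from collections import Counter

# Hypothetical word lists, for illustration only (not from the paper).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "we", "in"}
FILLERS = {"uh", "um"}


def tokens(text, drop_stopwords=False, drop_fillers=False):
    """Lowercase, whitespace-split, and optionally filter the words."""
    words = text.lower().split()
    if drop_fillers:
        words = [w for w in words if w not in FILLERS]
    if drop_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    return words


def unigram_recall(candidate, reference, **opts):
    """Simplified ROUGE-1-style recall: clipped unigram overlap / reference length."""
    cand = Counter(tokens(candidate, **opts))
    ref = Counter(tokens(reference, **opts))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(n, cand[w]) for w, n in ref.items())
    return overlap / total


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant input; 0.0 is this sketch's convention
    return cov / (sx * sy)
```

To compare settings, one would score each summary with `unigram_recall` under different options (for example `drop_fillers=True`) and pass the resulting scores, together with the corresponding human scores, to `spearman`.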
Feifan Liu, Yang Liu
Added 30 Jan 2011
Updated 30 Jan 2011
Type Journal
Year 2010
Where TASLP
Authors Feifan Liu, Yang Liu