Exploring Correlation Between ROUGE and Human Evaluation on Meeting Summaries

13 years 2 months ago

Download www.hlt.utdallas.edu

Abstract—Automatic summarization evaluation is very important to the development of summarization systems. In text summarization, ROUGE has been shown to correlate well with human evaluation when measuring match of content units. However, there are many characteristics of the multiparty meeting domain, which may pose potential problems to ROUGE. The goal of this paper is to examine how well the ROUGE scores correlate with human evaluation for extractive meeting summarization, and explore different meeting domain speciﬁc factors that have an impact on the correlation. More analysis than those in our previous work [1] has been conducted in this study. Our experiments show that generally the correlation between ROUGE and human evaluation is not great; however, when accounting for several unique meeting characteristics, such as disﬂuencies, speaker information, and stopwords in the ROUGE setting, better correlation can be achieved, especially on the system summaries. We also found th...

Feifan Liu, Yang Liu

Real-time Traffic