A Hybrid Hierarchical Model for Multi-Document Summarization

Scoring sentences in documents given human-written abstract summaries is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two-step learning problem: building a generative model for pattern discovery and a regression model for inference. We calculate scores for sentences in document clusters based on their latent characteristics using a hierarchical topic model. Then, using these scores, we train a regression model based on the lexical and structural characteristics of the sentences, and use the model to score sentences of new documents to form a summary. Our system advances the current state of the art, improving ROUGE scores by 7%. Generated summaries are less redundant and more coherent based upon manual quality evaluations.
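The two-step setup described in the abstract can be illustrated with a rough Python sketch. It is not the authors' method: step 1 substitutes a simple word-overlap score against the human summary for the paper's hierarchical topic model, and step 2 substitutes plain least-squares regression fitted by gradient descent. All function names and the three features (sentence length, relative position, average word frequency) are assumptions made for illustration.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]

def overlap_score(sentence, summary_tokens):
    """Step 1 stand-in (NOT the paper's topic model): fraction of the
    sentence's tokens that also occur in the human summary."""
    toks = tokenize(sentence)
    if not toks:
        return 0.0
    return sum(1 for t in toks if t in summary_tokens) / len(toks)

def build_features(sentences):
    """Hypothetical lexical/structural features, each scaled to [0, 1]."""
    freq = Counter(t for s in sentences for t in tokenize(s))
    max_f = max(freq.values()) if freq else 1
    rows = []
    for i, s in enumerate(sentences):
        toks = tokenize(s)
        length = min(len(toks) / 20.0, 1.0)
        rel_pos = i / max(len(sentences) - 1, 1)
        avg_freq = sum(freq[t] for t in toks) / (len(toks) * max_f) if toks else 0.0
        rows.append([1.0, length, rel_pos, avg_freq])  # leading 1.0 is the bias
    return rows

def fit_linear(X, y, lr=0.1, epochs=2000):
    """Least-squares regression fitted by batch gradient descent."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += 2 * err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def train(documents, human_summary):
    """Step 2: regress the step-1 scores on sentence features."""
    summary_tokens = set(tokenize(human_summary))
    X, y = [], []
    for sentences in documents:
        X += build_features(sentences)
        y += [overlap_score(s, summary_tokens) for s in sentences]
    return fit_linear(X, y)

def summarize(sentences, w, k=2):
    """Score the sentences of a new document and keep the top k in order."""
    scores = [sum(wj * xj for wj, xj in zip(w, x)) for x in build_features(sentences)]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

To use it, call `train` on a list of documents (each a list of sentences) together with a human summary, then pass the returned weights and the sentences of a new document to `summarize`. The point of the split is that human summaries are needed only at training time; new documents are scored from their own features alone.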
Asli Çelikyilmaz, Dilek Hakkani-Tur
Added 10 Feb 2011
Updated 10 Feb 2011
Type Journal
Year 2010
Where ACL