Latent Dirichlet Allocation Based Multi-Document Summarization

Extraction-based multi-document summarization algorithms choose sentences from the documents using some weighting mechanism and combine them into a summary. In this article we use Latent Dirichlet Allocation to capture the events covered by the documents, and we form the summary from sentences that represent these different events. Our approach differs from existing approaches in that we use mixture models to capture the topics and select sentences without attending to the grammar or structure of the documents. Finally, we evaluate the algorithms on the DUC 2002 corpus multi-document summarization tasks, scoring the summaries with the ROUGE evaluator. Compared to the DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.
Categories and Subject Descriptors I.2.7 [Artificial Intelligence]: Natural Language Processing--Text analysis, Multi-Document Summarization
Keywords Latent Dirichlet Allocation...
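The abstract does not say how sentences are weighted against the topics, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a small collapsed Gibbs sampler for LDA, assumes one sentence is picked per topic by average word probability under that topic, and uses a simplified ROUGE-1 recall (clipped unigram overlap divided by reference length, with no stemming or stopword handling). The names `fit_lda`, `summarize`, and `rouge1_recall`, and all hyperparameter values, are hypothetical.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns a function word_prob(k, w) giving the smoothed probability
    of word w under topic k. Hyperparameters are illustrative guesses.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    def word_prob(k, w):
        return (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)

    return word_prob


def summarize(documents, num_topics=2, seed=0):
    """Pick one sentence per topic (assumed selection rule).

    `documents` is a list of documents, each a list of sentence strings.
    """
    sentences = [s for doc in documents for s in doc]
    docs_tokens = [[w for s in doc for w in tokenize(s)] for doc in documents]
    word_prob = fit_lda(docs_tokens, num_topics, seed=seed)

    summary = []
    for k in range(num_topics):
        def score(sentence, k=k):
            toks = tokenize(sentence)
            return sum(word_prob(k, w) for w in toks) / max(len(toks), 1)

        candidates = [s for s in sentences if s not in summary]
        if candidates:
            summary.append(max(candidates, key=score))
    return summary


def rouge1_recall(candidate, reference):
    """Simplified ROUGE-1 recall: clipped unigram overlap / reference length."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum(min(cand[w], n) for w, n in ref.items())
    return overlap / max(sum(ref.values()), 1)
```

Calling `summarize(documents, num_topics=k)` returns up to `k` distinct sentences, and `rouge1_recall(" ".join(summary), reference_summary)` scores them against a reference summary. Note that this sketch ignores sentence order and grammar entirely, in line with the abstract's description of selecting sentences without attending to document structure.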
Rachit Arora, Balaraman Ravindran
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2008
Authors Rachit Arora, Balaraman Ravindran