The task of identifying redundant information in documents that are generated from multiple sources provides a significant challenge for summarization and QA systems. Traditional ...
The Dirichlet compound multinomial (DCM) distribution, also called the multivariate Polya distribution, is a model for text documents that takes into account burstiness: the fact ...
Fast changing knowledge on the Internet can be acquired more efficiently with the help of automatic document summarization and updating techniques. This paper describes a novel app...
The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents ...
Eric J. Glover, Kostas Tsioutsiouliklis, Steve Law...
In this paper, we present a method for structuring a document according to the information present in its Table of Contents. The detection of the ToC as well as the determination ...