A typical tabular business report contains a set of cells. The cells may contain raw numeric values, character labels, and formulas. This paper will present a bottom-up algorithm f...
Existing methods for single document summarization usually make use of only the information contained in the specified document. This paper proposes the technique of document expa...
This paper presents a dynamic approach to document page segmentation based on inter-component relationships and their local features. State-of-the-art page segmentation algorithms...
We present our hybrid system for the PAN challenge at CLEF 2010. Our system performs plagiarism detection for translated and non-translated externally as well as intrinsically plag...
Markus Muhr, Roman Kern, Mario Zechner, Michael Gr...
The graph-based ranking algorithm has recently been exploited for multi-document summarization by making use of only the sentence-to-sentence relationships in the documents, under...