Because of the complexity of documents and the variety of applications which must be supported, document understanding requires the integration of image understanding with text un...
Suzanne Liebowitz Taylor, Deborah A. Dahl, Mark Li...
A large amount of information on the Web is stored in document databases and is not indexed by general-purpose search engines (e.g., Google and Yahoo). Such information is dyna...
Yih-Ling Hedley, Muhammad Younas, Anne E. James, M...
This paper presents a novel method of generating extractive summaries for multiple documents. Given a cluster of documents, we first construct a graph where each vertex repre...
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to...
Successful applications of digital libraries require structured access to sources of information. This paper presents an approach to extract the logical structure of text document...