In this paper we describe a Cross Document Summarizer, XDoX, designed specifically to summarize large document sets (50-500 documents and more). Such sets of documents are typically...
In this paper, we report on our experience with the creation of an automated, human-assisted process to extract metadata from documents in a large (>100,000), dynamically growi...
Jianfeng Tang, Kurt Maly, Steven J. Zeil, Mohammad...
Query-independent features (also called document priors), such as the number of incoming links to a document, its PageRank, or the type of its associated URL, have been successfu...
A method for locating mathematical expressions in document images without the use of optical character recognition is presented. An index of document regions is produced from re...
Desktop search is an important part of personal information management (PIM). However, research in this area has been limited by the lack of shareable test collections, making cum...