Previews and overviews of large, heterogeneous information resources help users comprehend the scope of collections and focus on particular subsets of interest. For narrative docu...
In standard text retrieval systems, the documents are gathered and indexed on a single server. In distributed information retrieval (DIR), the documents are held in multiple colle...
Large-scale digitization projects aimed at periodicals often have as input streams of completely unlabeled document images. In such situations, the results produced by the automat...
Iuliu Vasile Konya, Christoph Seibert, Sebastian G...
A hierarchical browsing interface to a document collection can be constructed by identifying the phrases that recur in the full text of the documents and structuring them into a h...
Supporting entity extraction from large document collections is important for enabling a variety of important data analysis tasks. In this paper, we introduce the "ad-hoc&quo...