Sciweavers

241 search results - page 2 / 49
» Detecting Co-Derivative Documents in Large Text Collections
SIGIR
2002
ACM
13 years 4 months ago
Detecting and Browsing Events in Unstructured Text
Previews and overviews of large, heterogeneous information resources help users comprehend the scope of collections and focus on particular subsets of interest. For narrative docu...
David A. Smith
ADC
2007
Springer
13 years 11 months ago
Distributed Text Retrieval From Overlapping Collections
In standard text retrieval systems, the documents are gathered and indexed on a single server. In distributed information retrieval (DIR), the documents are held in multiple colle...
Milad Shokouhi, Justin Zobel, Yaniv Bernstein
ICPR
2008
IEEE
13 years 11 months ago
A robust front page detection algorithm for large periodical collections
Large-scale digitization projects aimed at periodicals often have as input streams of completely unlabeled document images. In such situations, the results produced by the automat...
Iuliu Vasile Konya, Christoph Seibert, Sebastian G...
ERCIMDL
2001
Springer
13 years 9 months ago
A Combined Phrase and Thesaurus Browser for Large Document Collections
A hierarchical browsing interface to a document collection can be constructed by identifying the phrases that recur in the full text of the documents and structuring them into a h...
Gordon W. Paynter, Ian H. Witten
PVLDB
2008
13 years 4 months ago
Scalable ad-hoc entity extraction from text collections
Supporting entity extraction from large document collections is important for enabling a variety of important data analysis tasks. In this paper, we introduce the "ad-hoc"...
Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaud...