Sciweavers

241 search results for "Detecting Co-Derivative Documents in Large Text Collections"
DRR 2008
Segmentation-based retrieval of document images from diverse collections
We describe a methodology for retrieving document images from large, extremely diverse collections. First we perform content extraction, that is the location and measurement of reg...
Michael A. Moll, Henry S. Baird

DGO 2006
Next steps in near-duplicate detection for eRulemaking
Large volume public comment campaigns and web portals that encourage the public to customize form letters produce many near-duplicate documents, which increases processing and sto...
Hui Yang, Jamie Callan, Stuart W. Shulman

QSIC 2007 (IEEE)
Automatic Quality Assessment of SRS Text by Means of a Decision-Tree-Based Text Classifier
The success of a software project is largely dependent upon the quality of the Software Requirements Specification (SRS) document, which serves as a medium to communicate user req...
Ishrar Hussain, Olga Ormandjieva, Leila Kosseim

SIGIR 2008 (ACM)
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...

COLING 2000
Experiments in Automated Lexicon Building for Text Searching
This paper describes experiments in the automatic construction of lexicons that would be useful in searching large document collections for text fragments that address a specific ...
Barry Schiffman, Kathleen McKeown