Sciweavers

241 search results - page 4 / 49
» Detecting Co-Derivative Documents in Large Text Collections
CHI
1997
ACM
15 years 1 month ago
Computational Models of Information Scent-Following in a Very Large Browsable Text Collection
An ecological-cognitive framework of analysis and a model-tracing architecture are presented and used in the analysis of data recorded from users browsing a large document collect...
Peter Pirolli
SIGIR
2004
ACM
15 years 3 months ago
Constructing a text corpus for inexact duplicate detection
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
Jack G. Conrad, Cindy P. Schriber
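Inexact (near-)duplicate detection is often illustrated with word shingles compared by Jaccard similarity. The sketch below is a generic version of that idea, not the method or corpus described in this paper. The function names, the shingle size k=3 and the 0.5 threshold are hypothetical choices made only for illustration.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word k-grams ("shingles") of a lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts: treat the whole token list as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc1: str, doc2: str, k: int = 3, threshold: float = 0.5) -> bool:
    # Hypothetical cut-off; a real system would tune it on labelled document pairs.
    return jaccard(shingles(doc1, k), shingles(doc2, k)) >= threshold
```

Changing a single word alters up to k shingles. That is why one edit in "the quick brown fox jumps over the lazy dog" drops the similarity to 0.4 even though the two sentences are nearly identical.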
SIGIR
2010
ACM
14 years 4 months ago
Efficient partial-duplicate detection based on sequence matching
With the ever-increasing growth of the Internet, numerous copies of documents have become a serious problem for search engines, opinion mining and many other web applications. Since parti...
Qi Zhang, Yue Zhang, Haomin Yu, Xuanjing Huang
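Partial duplicates share only some passages, so one simple way to locate them is to match word sequences between two documents. The sketch below uses Python's standard difflib.SequenceMatcher as a stand-in sequence matcher. It is not the algorithm from this paper, and the helper name shared_passages and the min_words=4 cut-off are hypothetical.

```python
from difflib import SequenceMatcher


def shared_passages(doc_a: str, doc_b: str, min_words: int = 4) -> list[str]:
    """Return word runs of at least min_words tokens that occur in both documents."""
    a, b = doc_a.split(), doc_b.split()
    # autojunk=False so that frequent words in long documents are not ignored.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        " ".join(a[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words  # also drops the final zero-length sentinel block
    ]
```

For two news snippets that reuse one sentence, only the reused run is reported. The surrounding text that differs is discarded, which is what separates a partial duplicate from a full one.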
ECIR
2009
Springer
15 years 6 months ago
Revisiting N-Gram Based Models for Retrieval in Degraded Large Collections
The traditional retrieval models based on term matching are not effective in collections of degraded documents (output of OCR or ASR systems for instance). This paper presents a n...
Javier Parapar, Ana Freire, Alvaro Barreiro
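The snippet above notes that exact term matching breaks down on OCR or ASR output. A common workaround is to compare overlapping character n-grams, so that a misrecognised character spoils only a few n-grams rather than the whole term. The sketch below scores a query against a document with a Dice coefficient over character trigrams. It is a generic illustration, not the n-gram model proposed in this paper, and the names char_ngrams and dice are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the space-normalised, padded text."""
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice(query: str, doc: str, n: int = 3) -> float:
    """Dice coefficient between the character n-gram multisets of query and doc."""
    q, d = char_ngrams(query, n), char_ngrams(doc, n)
    total = sum(q.values()) + sum(d.values())
    if total == 0:
        return 0.0
    return 2 * sum((q & d).values()) / total
```

The OCR-garbled token "retrieva1" shares no whole word with the query "retrieval", yet 7 of their 9 trigrams match, giving a Dice score of 14/18.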
CIKM
2011
ACM
13 years 9 months ago
Partial duplicate detection for large book collections
A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
Ismet Zeki Yalniz, Ethem F. Can, R. Manmatha
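The abstract above is truncated, so the book representation this paper uses is not shown here. As a hypothetical illustration of alignment-based partial duplicate detection that tolerates OCR noise, the sketch keeps only words occurring exactly once in each text, treats them as anchors, and aligns the two anchor sequences with a longest-common-subsequence (LCS) computation. An OCR error then removes or alters an anchor instead of breaking the whole match. The function names and the scoring rule are assumptions, not the authors' framework.

```python
from collections import Counter


def unique_word_sequence(text: str) -> list[str]:
    """Words that occur exactly once in the text, kept in reading order."""
    words = text.lower().split()
    counts = Counter(words)
    return [w for w in words if counts[w] == 1]


def lcs_length(a: list[str], b: list[str]) -> int:
    """Standard O(len(a) * len(b)) dynamic-programming LCS length."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def overlap_score(book_a: str, book_b: str) -> float:
    """Fraction of the shorter anchor sequence that aligns, in order, with the other."""
    ua, ub = unique_word_sequence(book_a), unique_word_sequence(book_b)
    shorter = min(len(ua), len(ub))
    return lcs_length(ua, ub) / shorter if shorter else 0.0
```

A score near 1.0 suggests that one text largely reproduces the other in the same order. Intermediate scores point to partial overlap worth inspecting.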