Sciweavers

Search results for "Collection statistics for fast duplicate document detection" (48 results, page 1 of 10)

TOIS 2002
Collection statistics for fast duplicate document detection
Abdur Chowdhury, Ophir Frieder, David A. Grossman,...

CIKM 2003, Springer
Online duplicate document detection: signature reliability in a dynamic retrieval environment
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. Few users wish to retri...
Jack G. Conrad, Xi S. Guo, Cindy P. Schriber

LREC 2008
Detecting Co-Derivative Documents in Large Text Collections
We have analyzed the SPEX algorithm by Bernstein and Zobel (2004) for detecting co-derivative documents using duplicate n-grams. Although we totally agree with the claim that not ...
Jan Pomikálek, Pavel Rychlý

ADC 2007, Springer
Distributed Text Retrieval From Overlapping Collections
In standard text retrieval systems, the documents are gathered and indexed on a single server. In distributed information retrieval (DIR), the documents are held in multiple colle...
Milad Shokouhi, Justin Zobel, Yaniv Bernstein

SIGIR 2004, ACM
Constructing a text corpus for inexact duplicate detection
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
Jack G. Conrad, Cindy P. Schriber