Search Sciweavers | Sciweavers

103 search results - page 1 / 21

ICDAR
1999
IEEE

Document Analysis

Models and Algorithms for Duplicate Document Detection

Download www.cse.lehigh.edu

This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each with a corresponding algorithm ...

Daniel P. Lopresti

SIGMOD
2005
ACM

Database

DogmatiX Tracks down Duplicates in XML

Download www.hpi.uni-potsdam.de

Duplicate detection is the problem of detecting different entries in a data source representing the same real-world entity. While research abounds in the realm of duplicate detect...

Melanie Weis, Felix Naumann

LREC
2008

Education

Detecting Co-Derivative Documents in Large Text Collections

Download www.lrec-conf.org

We have analyzed the SPEX algorithm by Bernstein and Zobel (2004) for detecting co-derivative documents using duplicate n-grams. Although we totally agree with the claim that not ...

Jan Pomikálek, Pavel Rychlý
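The snippet above refers to SPEX detecting co-derivative documents through duplicate n-grams. The key pruning idea behind SPEX is that an n-gram can only be duplicated if both of its (n-1)-gram sub-sequences are also duplicated, so candidates can be filtered one length at a time. The sketch below illustrates that idea under simplifying assumptions: exact in-memory counting instead of the compact hash-based counters a scalable system would use, whitespace tokenization, and "duplicate" meaning "occurs in at least two documents". The function names `ngrams` and `duplicate_ngrams` are hypothetical.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def duplicate_ngrams(docs, n):
    """Return the n-grams that occur in at least two documents.

    Simplified SPEX-style pruning (exact counts, an assumption): a k-gram
    is counted only if both its (k-1)-gram prefix and suffix were found
    in two or more documents in the previous pass.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    tokenized = [doc.split() for doc in docs]  # naive whitespace tokenizer
    survivors = None  # duplicate (k-1)-grams from the previous pass
    for k in range(1, n + 1):
        seen_in = defaultdict(set)  # k-gram -> ids of documents containing it
        for doc_id, tokens in enumerate(tokenized):
            for gram in ngrams(tokens, k):
                # Skip k-grams containing a sub-gram that is already known to be unique.
                if survivors is None or (gram[:-1] in survivors and gram[1:] in survivors):
                    seen_in[gram].add(doc_id)
        survivors = {g for g, ids in seen_in.items() if len(ids) >= 2}
        if not survivors:
            break  # no duplicated k-grams, so no longer ones can be duplicated
    return survivors


docs = [
    "the quick brown fox jumps",
    "a quick brown fox sleeps",
    "nothing here matches",
]
print(duplicate_ngrams(docs, 3))  # {('quick', 'brown', 'fox')}
```

Pruning pays off because most n-grams in a large collection are unique, so each pass discards the bulk of candidates before the next, longer pass is counted.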

SIGIR
2004
ACM

Information Technology

Constructing a text corpus for inexact duplicate detection

Download www.conradweb.org

As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...

Jack G. Conrad, Cindy P. Schriber

ICDE
2010
IEEE

Database

ProbClean: A probabilistic duplicate detection system

Download www.cs.uwaterloo.ca

One of the most prominent data quality problems is the existence of duplicate records. Current data cleaning systems usually produce one clean instance (repair) of the input da...

George Beskales, Mohamed A. Soliman, Ihab F. Ilyas...
