ICDAR
1999
IEEE

Models and Algorithms for Duplicate Document Detection

This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each with a corresponding algorithm for its solution derived from the realm of approximate string matching. The robustness of these techniques is demonstrated through a set of experiments using data reflecting real-world degradation effects.
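The abstract says only that the algorithms are derived from approximate string matching; it does not describe the four models themselves. As a rough illustration of that general family, and not the paper's method, the sketch below computes a classic edit (Levenshtein) distance between two text strings and turns it into a similarity score. The function names and the 0.8 threshold are assumptions chosen for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping two rows."""
    # Keep the shorter string on the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def is_probable_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: the threshold is illustrative only."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    clean = "duplicate document detection"
    degraded = "dup1icate docurnent detection"  # simulated OCR-style errors
    print(edit_distance(clean, degraded))
    print(is_probable_duplicate(clean, degraded))
```

A whole-string comparison like this tolerates character-level noise of the kind degraded copies introduce, but it is only one possible notion of "duplicate"; partial or reordered duplicates would need a different formulation.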
Daniel P. Lopresti
Added 03 Aug 2010
Updated 03 Aug 2010
Type Conference
Year 1999
Where ICDAR
Authors Daniel P. Lopresti