Sciweavers

174 search results - page 1 / 35
WCRE
1995
IEEE
13 years 8 months ago
On Finding Duplication and Near-Duplication in Large Software Systems
This paper describes how a program called dup can be used to locate instances of duplication or near-duplication in a software system. Dup reports both textually identical sectio...
Brenda S. Baker
ICPR
2010
IEEE
13 years 2 months ago
Beyond "Near Duplicates": Learning Hash Codes for Efficient Similar-Image Retrieval
Finding similar images in a large database is an important, but often computationally expensive, task. In this paper, we present a two-tier similar-image retrieval system with the...
Shumeet Baluja, Michele Covell
ICPR
2010
IEEE
13 years 8 months ago
Beyond "Near-Duplicates": Learning Hash Codes for Efficient Similar-Image Retrieval
Finding similar images in a large database is an important, but often computationally expensive, task. In this paper, we present a two-tier similar-image retrieval system with the...
Shumeet Baluja, Michele Covell
SIGIR
2008
ACM
13 years 4 months ago
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
ICCS
2009
Springer
13 years 11 months ago
Frequent Itemset Mining for Clustering Near Duplicate Web Documents
A vast amount of documents in the Web have duplicates, which is a challenge for developing efficient methods that would compute clusters of similar documents. In this paper we use ...
Dmitry I. Ignatov, Sergei O. Kuznetsov