Sciweavers

26 search results - page 1 / 6
CIKM
2011
ACM
12 years 5 months ago
Partial duplicate detection for large book collections
A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
Ismet Zeki Yalniz, Ethem F. Can, R. Manmatha
LREC
2008
13 years 6 months ago
Detecting Co-Derivative Documents in Large Text Collections
We have analyzed the SPEX algorithm by Bernstein and Zobel (2004) for detecting co-derivative documents using duplicate n-grams. Although we totally agree with the claim that not ...
Jan Pomikálek, Pavel Rychlý
SIGIR
2008
ACM
13 years 4 months ago
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
SIGIR
2010
ACM
12 years 11 months ago
Efficient partial-duplicate detection based on sequence matching
With the ever-increasing growth of the Internet, numerous copies of documents have become a serious problem for search engines, opinion mining, and many other web applications. Since parti...
Qi Zhang, Yue Zhang, Haomin Yu, Xuanjing Huang
ICMCS
2006
IEEE
13 years 11 months ago
Large-Scale Duplicate Detection for Web Image Search
Finding visually identical images in large image collections is important for many applications, such as intellectual property protection and search result presentation. Several a...
Bin Wang, Zhiwei Li, Mingjing Li, Wei-Ying Ma