Sciweavers

2 search results - page 1 / 1
» Probabilistic near-duplicate detection using simhash
P2P
2010
IEEE
13 years 3 months ago
Optimizing Near Duplicate Detection for P2P Networks
In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To thi...
Odysseas Papapetrou, Sukriti Ramesh, Stefan Siersd...
CIKM
2011
Springer
12 years 4 months ago
Probabilistic near-duplicate detection using simhash
This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Sadhan Sood, Dmitri Loguinov