Sciweavers

2 search results - page 1 / 1
» Probabilistic near-duplicate detection using simhash
P2P
2010
IEEE
14 years 10 months ago
Optimizing Near Duplicate Detection for P2P Networks
In this paper, we propose a probabilistic algorithm for detecting near-duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To thi...
Odysseas Papapetrou, Sukriti Ramesh, Stefan Siersd...
CIKM
2011
Springer
13 years 11 months ago
Probabilistic near-duplicate detection using simhash
This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Sadhan Sood, Dmitri Loguinov