Sciweavers

2 search results - page 1 / 1
» Probabilistic near-duplicate detection using simhash
P2P
2010
IEEE
14 years 10 months ago
Optimizing Near Duplicate Detection for P2P Networks
In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To thi...
Odysseas Papapetrou, Sukriti Ramesh, Stefan Siersd...
CIKM
2011
Springer
14 years 8 days ago
Probabilistic near-duplicate detection using simhash
This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Sadhan Sood, Dmitri Loguinov