WWW 2007, ACM
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other only in a very small portion, for example the part that displays advertisements. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma