Search Sciweavers | Sciweavers

SIGIR
2008
ACM

176 views, Information Technology

SpotSigs: robust and efficient near duplicate detection in large web collections

13 years 4 months ago

Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...

Martin Theobald, Jonathan Siddharth, Andreas Paepc...

LAWEB
2003
IEEE

96 views, Internet Technology

On the Evolution of Clusters of Near-Duplicate Web Pages

13 years 9 months ago

Download research.microsoft.com

This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...

Dennis Fetterly, Mark Manasse, Marc Najork

WWW
2008
ACM

189 views, Internet Technology

Detecting image spam using visual features and near duplicate detection

14 years 5 months ago

Download www2008.org

Email spam is a much-studied topic, but even though current email spam detection software has been gaining a competitive edge against text-based email spam, new advances in spam g...

Bhaskar Mehta, Saurabh Nangia, Manish Gupta 0002, ...

ICMCS
2007
IEEE

149 views, Multimedia

SICO: A System for Detection of Near-Duplicate Images During Search

13 years 10 months ago

Download goanna.cs.rmit.edu.au

Duplicate and near-duplicate digital image matching is beneficial for image search in terms of collection management, digital content protection, and search efficiency. In this ...

Jun Jie Foo, Ranjan Sinha, Justin Zobel

WWW
2008
ACM

214 views, Internet Technology

Efficient similarity joins for near duplicate detection

14 years 5 months ago

Download www2008.org

With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...

Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
