Search Sciweavers | Sciweavers

103 search results - page 2 / 21

» Models and Algorithms for Duplicate Document Detection

ADC
2007
Springer

Distributed Text Retrieval From Overlapping Collections

13 years 11 months ago

Download crpit.com

In standard text retrieval systems, the documents are gathered and indexed on a single server. In distributed information retrieval (DIR), the documents are held in multiple colle...

Milad Shokouhi, Justin Zobel, Yaniv Bernstein

SIGIR
2008
ACM

SpotSigs: robust and efficient near duplicate detection in large web collections

13 years 5 months ago

Download ilpubs.stanford.edu

Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...

Martin Theobald, Jonathan Siddharth, Andreas Paepc...

OOPSLA
2005
Springer

SDD: high performance code clone detection system for large scale source code

13 years 10 months ago

Download www.cs.toronto.edu

Code clones in software increase maintenance cost and lower software quality. We have devised a new algorithm to detect duplicated parts of source code in large software. Our algo...

Seunghak Lee, Iryoung Jeong

DGO
2006

Next steps in near-duplicate detection for eRulemaking

13 years 6 months ago

Download www.cs.cmu.edu

Large volume public comment campaigns and web portals that encourage the public to customize form letters produce many near-duplicate documents, which increases processing and sto...

Hui Yang, Jamie Callan, Stuart W. Shulman

SIGIR
2010
ACM

Efficient partial-duplicate detection based on sequence matching

13 years 22 hour ago

Download homepage.fudan.edu.cn

With the ever-increasing growth of the Internet, numerous copies of documents have become a serious problem for search engines, opinion mining, and many other web applications. Since parti...

Qi Zhang, Yue Zhang, Haomin Yu, Xuanjing Huang
