Search Sciweavers | Sciweavers

52 search results - page 1 / 11

» Identifying and Filtering Near-Duplicate Documents

CPM
2000
Springer

177 views, Combinatorics

Identifying and Filtering Near-Duplicate Documents

13 years 9 months ago

Download www.cs.princeton.edu

Abstract. The mathematical concept of document resemblance captures well the informal notion of syntactic similarity. The resemblance can be estimated using a fixed size “sketch...

Andrei Z. Broder

SIGIR
2008
ACM

176 views, Information Technology

SpotSigs: robust and efficient near duplicate detection in large web collections

13 years 5 months ago

Download ilpubs.stanford.edu

Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...

Martin Theobald, Jonathan Siddharth, Andreas Paepc...

SIGIR
2004
ACM

136 views, Information Technology

Constructing a text corpus for inexact duplicate detection

13 years 10 months ago

Download www.conradweb.org

As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...

Jack G. Conrad, Cindy P. Schriber

ICSE
2011
IEEE-ACM

407 views, Software Engineering

StakeSource2.0: using social networks of stakeholders to identify and prioritise requirements

12 years 8 months ago

Download soolinglim.files.wordpress.com

Software projects typically rely on system analysts to conduct requirements elicitation, an approach potentially costly for large projects with many stakeholders and requirements....

Soo Ling Lim, Daniela Damian, Anthony Finkelstein

KDD
2004
ACM

195 views, Data Mining

Improved robustness of signature-based near-replica detection via lexicon randomization

14 years 5 months ago

Download ir.iit.edu

Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...

Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
