Search Sciweavers | Sciweavers

139 search results, page 2 of 28

An Approach to Identify Duplicated Web Pages


KDD 2006 · ACM · Data Mining

Understanding Content Reuse on the Web: Static and Dynamic Analyses


Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents in the Web. The static and dynamic studies involve the analysis of similar c...

Ricardo A. Baeza-Yates, Álvaro R. Pereira J...


CPM 2000 · Springer · Combinatorics

Identifying and Filtering Near-Duplicate Documents


Abstract. The mathematical concept of document resemblance captures well the informal notion of syntactic similarity. The resemblance can be estimated using a fixed size “sketch...

Andrei Z. Broder
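Broder's abstract describes estimating resemblance from a fixed-size sketch. The Python block below is a minimal illustration under stated assumptions, not the paper's exact construction: it takes resemblance to be the Jaccard coefficient of word-shingle sets, |S(A) ∩ S(B)| / |S(A) ∪ S(B)|, and approximates min-wise random permutations with salted BLAKE2b hashes. The helper names (`shingles`, `minhash_sketch`, `exact_resemblance`, `estimated_resemblance`) and the defaults `k=4` and `num_hashes=128` are hypothetical choices made for this example.

```python
import hashlib


def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Return the set of contiguous k-word shingles of `text` (hypothetical helper)."""
    words = text.lower().split()
    if len(words) < k:
        # Very short documents: treat the whole word sequence as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(salt: int, shingle: tuple[str, ...]) -> int:
    # Assumption: a salted 64-bit hash stands in for one random permutation.
    data = f"{salt}:{' '.join(shingle)}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(sh: set[tuple[str, ...]], num_hashes: int = 128) -> list[int]:
    """Fixed-size sketch: the minimum hash value under each of `num_hashes` salted hashes."""
    if not sh:
        raise ValueError("cannot sketch an empty shingle set")
    return [min(_hash(salt, s) for s in sh) for salt in range(num_hashes)]


def exact_resemblance(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B| computed directly from the full shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def estimated_resemblance(sk_a: list[int], sk_b: list[int]) -> float:
    """Fraction of sketch positions whose minima agree; estimates the Jaccard coefficient."""
    if len(sk_a) != len(sk_b):
        raise ValueError("sketches must have the same length")
    return sum(x == y for x, y in zip(sk_a, sk_b)) / len(sk_a)


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps over the lazy dog near the river bank"
    doc2 = "the quick brown fox jumps over the lazy cat near the river bank"
    a, b = shingles(doc1), shingles(doc2)
    print("exact:", exact_resemblance(a, b))
    print("estimate:", estimated_resemblance(minhash_sketch(a), minhash_sketch(b)))
```

For the two sample sentences, which differ in a single word, 6 of the 14 distinct 4-shingles are shared, so the exact resemblance is 6/14; the sketch estimate should be close to that, and its variance shrinks as `num_hashes` grows. Because each sketch has a fixed length, pairs of documents can be compared without retaining their full shingle sets.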


SIGIR 2008 · ACM · Information Technology

SpotSigs: robust and efficient near duplicate detection in large web collections


Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...

Martin Theobald, Jonathan Siddharth, Andreas Paepc...


IEAAIE 2003 · Springer · Artificial Intelligence

Applying Semantic Links for Classifying Web Pages


Automatic hypertext classification is an essential technique for organizing the vast number of Internet Web pages or HTML documents. One of the problems in classifying Web pages is tha...

Ben Choi, Qing Guo


ECAI 2000 · Springer · Artificial Intelligence

An Instance-based Approach for Identifying Candidate Ontology Relations within a Multi-Agent System


Discovering related concepts in a multi-agent system among agents with diverse ontologies is difficult using existing knowledge representation languages and approaches. We describ...

Andrew B. Williams, Costas Tsatsoulis
