Search Sciweavers | Sciweavers

299 search results - page 49 / 60

» User-centric Web crawling

CIDR
2009

129 views, Algorithms

Extracting and Querying a Comprehensive Web Database

15 years 6 months ago

Download turing.cs.washington.edu

Recent research in domain-independent information extraction holds the promise of an automatically-constructed structured database derived from the Web. A query system based on th...

Michael J. Cafarella

SIGIR
2008
ACM

176 views, Information Technology

SpotSigs: robust and efficient near duplicate detection in large web collections

15 years 5 months ago

Download ilpubs.stanford.edu

Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...

Martin Theobald, Jonathan Siddharth, Andreas Paepc...

ICIP
2000
IEEE

141 views, Image Processing

Efficient Video Similarity Measurement and Search

16 years 6 months ago

Download www.vis.uky.edu

We consider the use of meta-data and/or video-domain methods to detect similar videos on the web. Meta-data is extracted from the textual and hyperlink information associated with...

Sen-Ching S. Cheung, Avideh Zakhor

WWW
2004
ACM

179 views, Internet Technology

Combining link and content analysis to estimate semantic similarity

16 years 6 months ago

Download www.informatics.indiana.edu

Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The correlations between similarity measures based on these cues and on semantic ass...

Filippo Menczer

CIKM
2009
Springer

121 views, Information Technology

Graph-based seed selection for web-scale crawlers

15 years 12 months ago

Download clgiles.ist.psu.edu

One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in webscal...

Shuyi Zheng, Pavel Dmitriev, C. Lee Giles
