Search Sciweavers | Sciweavers

563 search results - page 1 / 113

» Crawling the web for structured documents

CIKM 2010 (Springer)

166 views, Information Technology

Crawling the web for structured documents

13 years 1 month ago

Download www.mendeley.com

Structured Information Retrieval has been gaining a lot of interest in recent years, as this kind of information is becoming an invaluable asset for professional communities such as Sof...

Julián Urbano, Juan Loréns, Yorgos A...

ICDM 2008 (IEEE)

186 views, Data Mining

xCrawl: A High-Recall Crawling Method for Web Mining

13 years 11 months ago

Download ls13-www.cs.uni-dortmund.de

Web Mining Systems exploit the redundancy of data published on the Web to automatically extract information from existing web documents. The first step in the Information Extract...

Kostyantyn M. Shchekotykhin, Dietmar Jannach, Gerh...

WWW 2007 (ACM)

162 views, Internet Technology

Detecting near-duplicates for web crawling

14 years 5 months ago

Download infolab.stanford.edu

Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...

Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma

WWW 2007 (ACM)

110 views, Internet Technology

Random web crawls

14 years 5 months ago

Download www2007.org

This paper proposes a random Web crawl model. A Web crawl is a (biased and partial) image of the Web. This paper deals with the hyperlink structure, i.e. a Web crawl is a graph, w...

Toufik Bennouas, Fabien de Montgolfier

ADBIS 2003 (Springer)

173 views, Database

UCYMICRA: Distributed Indexing of the Web Using Migrating Crawlers

13 years 9 months ago

Download www.l3s.de

Due to the tremendous growth rate and the high change frequency of Web documents, maintaining an up-to-date index for search purposes (search engines) is becoming a challenge....

Odysseas Papapetrou, Stavros Papastavrou, George S...
