Sciweavers

178 search results - page 2 / 36
» Scheduling Algorithms for Web Crawling
STOC
2002
ACM
14 years 5 months ago
Crawling on web graphs
Colin Cooper, Alan M. Frieze
WWW
2006
ACM
14 years 6 months ago
Effective web-scale crawling through website analysis
The web crawler space is often delimited into two general areas: full-web crawling and focused crawling. We present netSifter, a crawler system which integrates features from thes...
Iván González, Adam Marcus 0002, Daniel N. M...
IADIS
2004
13 years 6 months ago
Crawling the client-side hidden web
There is a great amount of information on the web that cannot be accessed by conventional crawler engines. This portion of the web is usually called hidden web data. To be able t...
Manuel Álvarez, Alberto Pan, Juan Raposo, ...
WWW
2004
ACM
14 years 6 months ago
Distributed community crawling
The massive distribution of the crawling task can lead to inefficient exploration of the same portion of the Web. We propose a technique to guide crawlers' exploration based on the...
Fabrizio Costa, Paolo Frasconi
WWW
2007
ACM
14 years 6 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma