Sciweavers

2237 search results - page 1 / 448
» Architectural design and evaluation of an efficient Web-craw...
Sort
View
WWW
2003
ACM
14 years 5 months ago
Efficient URL caching for world wide web crawling
Crawling the web is deceptively simple: the basic algorithm is (a) Fetch a page (b) Parse it to extract all linked URLs (c) For all the URLs not seen before, repeat (a)?(c). Howev...
Andrei Z. Broder, Marc Najork, Janet L. Wiener
PVLDB
2008
124views more  PVLDB 2008»
13 years 4 months ago
Google's Deep Web crawl
The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structu...
Jayant Madhavan, David Ko, Lucja Kot, Vignesh Gana...
WWW
2007
ACM
14 years 5 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
WWW
2005
ACM
14 years 5 months ago
User-centric Web crawling
Search engines are the primary gateways of information access on the Web today. Behind the scenes, search engines crawl the Web to populate a local indexed repository of Web pages...
Sandeep Pandey, Christopher Olston
JSS
2002
138views more  JSS 2002»
13 years 4 months ago
Architectural design and evaluation of an efficient Web-crawling system
This paper presents an architectural design and evaluation result of an efficient Web-crawling system. The design involves a fully distributed architecture, a URL allocating algor...
Hongfei Yan, Jianyong Wang, Xiaoming Li, Lin Guo