Sciweavers

WWW
2003
ACM
14 years 5 months ago
Efficient URL caching for world wide web crawling
Crawling the web is deceptively simple: the basic algorithm is (a) Fetch a page (b) Parse it to extract all linked URLs (c) For all the URLs not seen before, repeat (a)–(c). Howev...
Andrei Z. Broder, Marc Najork, Janet L. Wiener
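The three-step loop quoted in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `crawl`, `LinkExtractor`, `fake_fetch`, and the `example.test` pages are hypothetical, the "seen" test is a plain in-memory set rather than any caching scheme from the paper, and fetching is simulated from a dictionary so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, limit=100):
    """Breadth-first crawl from `seed`; `fetch` maps a URL to its HTML text."""
    seen = {seed}              # URLs already discovered (assumption: fits in memory)
    frontier = deque([seed])   # URLs discovered but not yet fetched
    order = []                 # URLs in the order they were fetched
    while frontier and len(order) < limit:
        url = frontier.popleft()
        html = fetch(url)                  # (a) fetch a page
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)                  # (b) parse it to extract linked URLs
        for href in parser.links:
            link = urljoin(url, href)      # resolve relative links
            if link not in seen:           # (c) only URLs not seen before
                seen.add(link)
                frontier.append(link)
    return order


# Hypothetical three-page "web" used in place of real HTTP requests.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a">A</a>',
}


def fake_fetch(url):
    return PAGES.get(url, "")
```

Calling `crawl("http://example.test/", fake_fetch)` visits each of the three pages exactly once, even though they link to one another in a cycle. The membership check on `seen` is the step that a real crawler has to perform at very large scale.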
WWW
2001
ACM
14 years 5 months ago
Intelligent crawling on the World Wide Web with arbitrary predicates
The enormous growth of the world wide web in recent years has made it important to perform resource discovery efficiently. Consequently, several new ideas have been proposed in rece...
Charu C. Aggarwal, Fatima Al-Garawi, Philip S. Yu
SIGIR
2003
ACM
13 years 9 months ago
Apoidea: A Decentralized Peer-to-Peer Architecture for Crawling the World Wide Web
This paper describes a decentralized peer-to-peer model for building a Web crawler. Most of the current systems use a centralized client-server model, in which the crawl is done by...
Aameek Singh, Mudhakar Srivatsa, Ling Liu, Todd Mi...
WWW
2002
ACM
14 years 5 months ago
Aliasing on the world wide web: prevalence and performance implications
Aliasing occurs in Web transactions when requests containing different URLs elicit replies containing identical data payloads. Conventional caches associate stored data with URLs ...
Terence Kelly, Jeffrey C. Mogul
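The definition of aliasing given in the abstract above (different URLs, identical payloads) can be illustrated with a short Python sketch. It assumes a log of `(url, payload)` pairs and groups URLs by a SHA-256 digest of the payload. The function name `find_aliases` and the sample URLs are hypothetical, and the digest-based grouping is a choice made for this example, not necessarily the measurement method used in the paper.

```python
import hashlib
from collections import defaultdict


def find_aliases(transactions):
    """Return groups of distinct URLs whose replies carried identical payloads.

    `transactions` is an iterable of (url, payload_bytes) pairs.
    """
    urls_by_digest = defaultdict(set)
    for url, payload in transactions:
        digest = hashlib.sha256(payload).hexdigest()
        urls_by_digest[digest].add(url)
    # A payload is aliased only if more than one distinct URL returned it.
    return [sorted(urls) for urls in urls_by_digest.values() if len(urls) > 1]


# Hypothetical transaction log: two URLs return the same bytes.
log = [
    ("http://a.test/x", b"hello"),
    ("http://b.test/y", b"hello"),
    ("http://a.test/z", b"other"),
]
aliases = find_aliases(log)
```

A cache keyed only by URL would store the `b"hello"` payload twice in this log, which is the kind of redundancy the abstract attributes to conventional caches.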
SIGCOMM
1996
ACM
13 years 8 months ago
Removal Policies in Network Caches for World-Wide Web Documents
World-Wide Web proxy servers that cache documents can potentially reduce three quantities: the number of requests that reach popular servers, the volume of network traffic resulting ...
Marc Abrams, Charles R. Standridge, Ghaleb Abdulla...