Sciweavers

SIGIR
2003
ACM

Apoidea: A Decentralized Peer-to-Peer Architecture for Crawling the World Wide Web

This paper describes a decentralized peer-to-peer model for building a Web crawler. Most current systems use a centralized client-server model: the crawl itself is done by one or more tightly coupled machines, but the distribution of crawling jobs and the collection of crawled results are managed centrally through a single URL repository. Centralized solutions are known to suffer from link congestion, a single point of failure, and expensive administration. They also require both horizontal and vertical scalability solutions to manage Network File Systems (NFS) and to load-balance DNS and HTTP requests. In this paper, we present the architecture of a completely distributed and decentralized Peer-to-Peer (P2P) crawler called Apoidea, which is self-managing and uses the geographical proximity of web resources to the peers for a better and faster crawl. We use Distributed Hash Table (DHT) based protocols to perform the critical URL-duplicate and content-...
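To make the DHT idea in the abstract concrete, the following is a minimal single-process sketch of how a hash ring could route each URL to exactly one responsible peer, so that the duplicate check becomes a local set lookup on that peer. It is an illustration only, not the protocol from the paper: the class name `Ring`, the method names `owner` and `offer`, the use of SHA-1, and hashing the full URL string are all assumptions made here.

```python
import hashlib
from bisect import bisect_right


def ring_hash(key: str) -> int:
    """Map a string to a point on the hash ring (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical in-memory stand-in for a DHT of crawler peers."""

    def __init__(self, peer_ids):
        if not peer_ids:
            raise ValueError("at least one peer is required")
        # Each peer sits at the ring position given by the hash of its id.
        self._points = sorted((ring_hash(p), p) for p in peer_ids)
        self._keys = [k for k, _ in self._points]
        # Per-peer set of URLs already seen; in a real system this state
        # would live on the peer itself, not in one shared object.
        self.seen = {p: set() for p in peer_ids}

    def owner(self, url: str) -> str:
        """Return the peer responsible for url: the first peer clockwise."""
        i = bisect_right(self._keys, ring_hash(url)) % len(self._keys)
        return self._points[i][1]

    def offer(self, url: str) -> bool:
        """Return True if url is new and should be crawled, else False."""
        peer = self.owner(url)
        if url in self.seen[peer]:
            return False
        self.seen[peer].add(url)
        return True


ring = Ring(["peer-a", "peer-b", "peer-c"])
first = ring.offer("http://example.org/")   # new URL
second = ring.offer("http://example.org/")  # same URL offered again
```

Because every peer computes the same owner for a given URL, no central URL repository is needed to decide whether a URL has already been scheduled.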
Added 05 Jul 2010
Updated 05 Jul 2010
Type Conference
Year 2003
Where SIGIR
Authors Aameek Singh, Mudhakar Srivatsa, Ling Liu, Todd Miller