Bulk-Synchronous On-Line Crawling on Clusters of Computers

This paper describes the design of a crawler devised to perform the periodic retrieval of Web documents for a search engine able to accept on-line updates in a concurrent manner. On-line updates come in the form of insertions of new documents or updates of existing ones, all of them mixed with the usual user queries. The search engine is bulk-synchronous, which allows it to deal efficiently with the concurrency control problem. The crawler is also bulk-synchronous so that it can be integrated into the same P-processor cluster executing the search engine. This paper describes and evaluates the practical feasibility of such a crawler.
Mauricio Marín, Carolina Bonacic
Added 01 Jun 2010
Updated 01 Jun 2010
Type Conference
Year 2008
Where PDP