
PDP
2008
IEEE

Bulk-Synchronous On-Line Crawling on Clusters of Computers

This paper describes the design of a crawler devised to perform the periodic retrieval of Web documents for a search engine that accepts on-line updates concurrently. On-line updates come in the form of insertions of new documents or updates of existing ones, all of them mixed with the usual user queries. The search engine is bulk-synchronous, which allows it to deal efficiently with the concurrency control problem. The crawler is also bulk-synchronous, so that it can be integrated into the same P-processor cluster that executes the search engine. The paper also evaluates the practical feasibility of such a crawler.
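The abstract's claim that a bulk-synchronous design simplifies concurrency control can be illustrated with a toy model. The sketch below is not the authors' implementation; the class `BSPSearchEngine`, the `Op` record, the CRC32-based routing of documents to partitions, and the rule that all updates of a superstep are applied before its queries are illustrative assumptions. Operations (including pages fetched by the crawler, submitted as upserts) are only buffered while a superstep is open, and are applied in two ordered phases when it closes, so no operation ever interleaves with another and no locks are needed.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Op:
    kind: str          # hypothetical: "upsert" (insert or update a document) or "query"
    doc_id: str = ""
    text: str = ""
    term: str = ""


class BSPSearchEngine:
    """Hypothetical toy model: P index partitions advance in lock-step supersteps."""

    def __init__(self, num_partitions: int):
        self.P = num_partitions
        # Each partition maps doc_id -> set of lower-cased terms.
        self.partitions = [dict() for _ in range(num_partitions)]
        self.pending = []  # operations received during the current superstep

    def owner(self, doc_id: str) -> int:
        # Assumed routing rule: deterministic hash of the document id.
        return zlib.crc32(doc_id.encode()) % self.P

    def submit(self, op: Op) -> None:
        # Operations are only buffered; the index is never touched mid-superstep.
        self.pending.append(op)

    def superstep(self) -> list:
        batch, self.pending = self.pending, []
        # Phase 1: each partition applies the insertions/updates it owns.
        for op in batch:
            if op.kind == "upsert":
                terms = set(op.text.lower().split())
                self.partitions[self.owner(op.doc_id)][op.doc_id] = terms
        # Barrier (implicit in this sequential model): every update of this
        # superstep is finished before any query of the same superstep runs.
        # Phase 2: each query is broadcast to all partitions; hits are merged.
        answers = []
        for op in batch:
            if op.kind == "query":
                hits = sorted(
                    doc
                    for part in self.partitions
                    for doc, terms in part.items()
                    if op.term.lower() in terms
                )
                answers.append((op.term, hits))
        return answers


if __name__ == "__main__":
    engine = BSPSearchEngine(4)
    engine.submit(Op("upsert", doc_id="a.html", text="bulk synchronous crawler"))
    engine.submit(Op("query", term="crawler"))
    print(engine.superstep())  # [('crawler', ['a.html'])]
    engine.submit(Op("upsert", doc_id="a.html", text="updated page"))
    engine.submit(Op("query", term="crawler"))
    print(engine.superstep())  # [('crawler', [])]
```

Ordering updates before queries within a superstep is only one possible policy; a real system could instead let queries see the state as of the previous barrier. Either way, the point of the bulk-synchronous model is that the order is fixed at the barrier rather than negotiated with locks.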
Mauricio Marín, Carolina Bonacic
Added 01 Jun 2010
Updated 01 Jun 2010
Type Conference
Year 2008
Where PDP
Authors Mauricio Marín, Carolina Bonacic