Bulk-Synchronous On-Line Crawling on Clusters of Computers


This paper describes the design of a crawler devised to perform the periodic retrieval of Web documents for a search engine able to accept on-line updates in a concurrent manner. On-line updates come in the form of insertions of new documents or updates to existing ones, all of them mixed with the usual user queries. The search engine is bulk-synchronous, which allows it to deal efficiently with the concurrency control problem. The crawler is also bulk-synchronous so that it can be integrated into the same P-processor cluster that executes the search engine. This paper describes and evaluates the practical feasibility of such a crawler.
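As a rough illustration of why bulk-synchronous processing sidesteps concurrency control, the sketch below simulates supersteps sequentially in Python. It is not the paper's algorithm: the function names (`owner`, `run_supersteps`), the modulo partitioning rule, and the whitespace-based term matching are all assumptions made for this example. The idea it tries to show is that operations are routed in one superstep, a barrier follows, and each processor then applies its own batch in arrival order, so no locks are needed on the local index.

```python
from collections import defaultdict


def owner(doc_id, p):
    """Hypothetical partitioning rule: document doc_id lives on processor doc_id mod p."""
    return doc_id % p


def run_supersteps(operations, p):
    """Sequentially simulate a bulk-synchronous run over p processors.

    operations: list of (kind, doc_id, text), kind in {"insert", "update", "query"}.
    For a query, doc_id is ignored and text is a single query term.
    Returns (indexes, results); results holds one sorted doc-id list per query,
    in the order the queries arrived.
    """
    indexes = [dict() for _ in range(p)]   # one local document store per processor
    inboxes = [[] for _ in range(p)]       # messages delivered at the barrier

    # Superstep 1: route every operation, tagged with its arrival sequence number.
    for seq, (kind, doc_id, text) in enumerate(operations):
        if kind == "query":
            for q in range(p):             # queries are broadcast to all processors
                inboxes[q].append((seq, kind, doc_id, text))
        else:                              # inserts and updates go to the owner only
            inboxes[owner(doc_id, p)].append((seq, kind, doc_id, text))
    # (barrier: every message is delivered before the next superstep begins)

    # Superstep 2: each processor applies its batch in arrival order, lock-free.
    partial = defaultdict(list)
    for q in range(p):
        for seq, kind, doc_id, text in sorted(inboxes[q], key=lambda m: m[0]):
            if kind == "query":
                hits = [d for d, t in indexes[q].items() if text in t.split()]
                partial[seq].extend(hits)
            else:
                indexes[q][doc_id] = text  # insert and update both overwrite
    # (barrier)

    # Superstep 3: merge the per-processor partial answers for each query.
    results = [sorted(partial[seq]) for seq in sorted(partial)]
    return indexes, results
```

For example, with two processors and the sequence insert 1 ("web crawler"), insert 2 ("search engine"), query "crawler", update 1 ("parallel search"), query "search", the first query sees only the original text of document 1 and the second sees the updated text, because every processor replays its batch in arrival order.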

Mauricio Marín, Carolina Bonacic

Distributed And Parallel Computing | On-line Updates | PDP 2008 | Search Engine | Usual User Queries |

Post Info

Added	01 Jun 2010
Updated	01 Jun 2010
Type	Conference
Year	2008
Where	PDP
Authors	Mauricio Marín, Carolina Bonacic
