Distributing Google

We consider the problem of wide-area, large-scale text search over a peer-to-peer infrastructure. A wide-area search infrastructure with billions of documents and millions of search terms presents unique challenges in terms of the amount of state that must be maintained and updated. Distributing such a system would require tens of thousands of hosts, leading to the usual problems associated with node failures, churn, and data migration. Localities inherent in query patterns will cause load imbalances and hot spots that can severely impair performance. In this paper, we describe an architecture for constructing a scalable search infrastructure that is designed to cope with the challenges of scale described above. Our architecture consists of a data store layer, which is used to reliably store and recompute indexes over a slow timescale, and a caching layer, which is used to respond to most queries. Our primary insight is that the problem of efficiently retrieving a small number of relevant r...
Added 11 Jun 2010
Updated 11 Jun 2010
Type Conference
Year 2006
Where ICDE
Authors Vijay Gopalakrishnan, Bobby Bhattacharjee, Peter J. Keleher