
SIGIR
2000
ACM

Partial collection replication versus caching for information retrieval systems

Abstract: The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. In this paper, we compare two mechanisms to improve IR system performance: partial collection replication and caching. When queries have locality, both mechanisms return results more quickly than sending queries to the original collection(s). A cache returns results only when a query exactly matches a previous one. A partial replica is a form of cache that returns results when the IR technology determines that the query is a good match for the replica. Caches are simpler and faster, but replicas can increase locality by detecting similarity between queries that are not exactly the same. We use real traces from THOMAS and Excite to measure query locality and similarity. With a very restrictive definition of query similarity, similarity improves query locality up to 15% over exact match. We use a validated simulator to compare their performance, an...
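The difference between exact-match locality and similarity-based locality can be illustrated with a minimal sketch. The similarity rule here (two queries are similar when they contain the same set of lowercased terms) is an assumption chosen for illustration and is not the definition used in the paper; the function names and the toy trace are likewise hypothetical.

```python
def term_set(query):
    """Hypothetical similarity key: the set of lowercased terms,
    so word order and letter case are ignored (an assumption,
    not the paper's definition of similarity)."""
    return frozenset(query.lower().split())


def locality(trace, key=lambda q: q):
    """Fraction of queries in the trace whose key was already seen.

    With the default key this measures exact-match locality, which is
    what a simple result cache can exploit. With a looser key it
    measures how many queries match an earlier, similar query.
    """
    seen = set()
    hits = 0
    for query in trace:
        k = key(query)
        if k in seen:
            hits += 1
        else:
            seen.add(k)
    return hits / len(trace) if trace else 0.0


# Toy trace, invented for illustration only.
trace = [
    "tax reform bill",
    "health care",
    "bill tax reform",
    "tax reform bill",
    "Health Care",
]

exact_locality = locality(trace)
similar_locality = locality(trace, key=term_set)
```

On this toy trace only one query repeats an earlier one character for character, while three queries share a term set with an earlier query, so the looser key reports higher locality. Real traces such as those from THOMAS and Excite would of course give different figures.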
Zhihong Lu, Kathryn S. McKinley
Type Conference
Year 2000
Where SIGIR
Authors Zhihong Lu, Kathryn S. McKinley