Sciweavers

SIGMOD
2008
ACM

Query-based partitioning of documents and indexes for information lifecycle management

Regulations require businesses to archive many electronic documents for extended periods of time. Given the sheer volume of documents and the response time requirements, documents that are unlikely to ever be accessed should be stored on an inexpensive device (such as tape), while documents that are likely to be accessed should be placed on a more expensive, higher-performance device. Unfortunately, traditional data partitioning techniques either require substantial manual involvement, or are not suitable for read-rarely workloads. In this paper, we present a novel technique to address this problem. We estimate the future access likelihood for a document based on past workloads of keyword queries and the click-through behavior for top-K query answers, then use this information to drive partitioning decisions. Our overall best scheme, the document-split inverted index, does not require any parameter tuning and yet performs close to the optimal partitioning strategy. Experiments show th...
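The idea of turning past click-through behavior into a partitioning decision can be illustrated with a small sketch. This is not the paper's document-split inverted index or its estimator; it is a naive frequency-based stand-in. The function names (`estimate_access_likelihood`, `partition`), the click-log layout, the smoothing constant, and the fixed `fast_capacity` are all assumptions made for illustration.

```python
from collections import Counter

def estimate_access_likelihood(doc_ids, click_log, smoothing=1.0):
    """Estimate per-document access likelihood from a past query workload.

    click_log is a hypothetical list of (query_keywords, clicked_doc_ids)
    pairs, one per past query. A document counts at most once per query.
    Add-one style smoothing keeps never-clicked documents above zero.
    """
    clicks = Counter()
    for _query_keywords, clicked_doc_ids in click_log:
        clicks.update(set(clicked_doc_ids))
    denominator = len(click_log) + 2 * smoothing
    return {d: (clicks[d] + smoothing) / denominator for d in doc_ids}

def partition(likelihood, fast_capacity):
    """Place the most likely documents on the fast tier, the rest on the slow tier.

    fast_capacity (assumed) is the number of documents the fast device holds.
    Ties are broken by document id so the result is deterministic.
    """
    ranked = sorted(likelihood, key=lambda d: (-likelihood[d], d))
    return set(ranked[:fast_capacity]), set(ranked[fast_capacity:])

# Toy workload: three past keyword queries and the answers users clicked.
docs = ["d1", "d2", "d3", "d4"]
log = [
    ({"tax", "audit"}, ["d1", "d2"]),
    ({"audit"}, ["d1"]),
    ({"merger"}, ["d3"]),
]
likelihood = estimate_access_likelihood(docs, log)
fast, slow = partition(likelihood, fast_capacity=2)
```

In this toy setup the frequently clicked documents land on the fast tier and the never-clicked one on the slow tier. A real scheme would also have to decide where index postings live, which this sketch does not attempt.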
Soumyadeb Mitra, Marianne Winslett, Windsor W. Hsu
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2008
Where SIGMOD