Reducing Storage Costs for Federated Search of Text Databases

In environments containing many text search engines, a federated search system provides people with a single point of access. When search engines are managed by independent organizations, two key problems are discovering and representing the contents of each text database. Query-based sampling is a recent technique for discovering the contents of uncooperative databases in order to create database resource descriptions that support a variety of necessary capabilities. However, when the documents obtained by query-based sampling are very long, as is common in some government environments, disk storage costs can be surprisingly large. This paper investigates methods of pruning sampled documents to reduce storage costs. The experimental results demonstrate that disk storage costs can be reduced by 54-93% while causing only minor losses in federated search accuracy.
Jie Lu, Jamie Callan
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003
Where DGO