Sciweavers

SIGMOD
2004
ACM

When one Sample is not Enough: Improving Text Database Selection Using Shrinkage

14 years 4 months ago
When one Sample is not Enough: Improving Text Database Selection Using Shrinkage
Database selection is an important step when searching over large numbers of distributed text databases. The database selection task relies on statistical summaries of the database contents, which are not typically exported by databases. Previous research has developed algorithms for constructing an approximate content summary of a text database from a small document sample extracted via querying. Unfortunately, Zipf's law practically guarantees that content summaries built this way for any relatively large database will fail to cover many low-frequency words. Incomplete content summaries might negatively affect the database selection process, especially for short queries with infrequent words. To improve the coverage of approximate content summaries, we build on the observation that topically similar databases tend to have related vocabularies. Therefore, the approximate content summaries of topically related databases can complement each other and increase their coverage. Speci...
Panagiotis G. Ipeirotis, Luis Gravano
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2004
Where SIGMOD
Authors Panagiotis G. Ipeirotis, Luis Gravano
Comments (0)