Sciweavers

VLDB
2002
ACM

Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection

13 years 4 months ago
Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection
Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over many such databases at once through a unified query interface. A critical task for a metasearcher to process a query efficiently and effectively is the selection of the most promising databases for the query, a task that typically relies on statistical summaries of the database contents. Unfortunately, webaccessible text databases do not generally export content summaries. In this paper, we present an algorithm to derive content summaries from "uncooperative" databases by using "focused query probes," which adaptively zoom in on and extract documents that are representative of the topic coverage of the databases. Our content summaries are the first to include absolute document frequency estimates for the database words. We also present a novel database selection algorithm that exploits both the ...
Panagiotis G. Ipeirotis, Luis Gravano
Added 23 Dec 2010
Updated 23 Dec 2010
Type Journal
Year 2002
Where VLDB
Authors Panagiotis G. Ipeirotis, Luis Gravano
Comments (0)