
TOIS
2008

Classification-aware hidden-web text database selection

Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over multiple such "hidden-web" text databases at once through a unified query interface. An important step in the metasearching process is database selection: determining which databases are the most relevant for a given user query. State-of-the-art database selection techniques rely on statistical summaries of the database contents, generally including the database vocabulary and the associated word frequencies. Unfortunately, hidden-web text databases typically do not export such summaries, so previous research has developed algorithms for constructing approximate content summaries from document samples extracted from the databases via querying. We present a novel "focused probing" sampling algorithm that detects the topics covered in a database and adaptively extracts documents that are rep...
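To make the idea of summary-based database selection concrete, here is a minimal sketch, assuming a content summary is simply the fraction of sampled documents that contain each word, and assuming a naive independence-based score for ranking. This is not the paper's focused probing or its classification-aware selection method; the functions `build_content_summary`, `score_database`, and `select_databases` and the example databases are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical type: a content summary maps each word to the estimated
# fraction of a database's documents that contain that word.
Summary = Dict[str, float]


def build_content_summary(sample_docs: List[str]) -> Summary:
    """Approximate a content summary from a document sample
    (e.g. documents retrieved by sending queries to a hidden-web database)."""
    if not sample_docs:
        return {}
    doc_freq: Counter = Counter()
    for doc in sample_docs:
        # Count each word at most once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    n = len(sample_docs)
    return {word: count / n for word, count in doc_freq.items()}


def score_database(summary: Summary, query: str) -> float:
    """Estimate the fraction of documents that contain every query word,
    assuming (as a simplification) that words occur independently."""
    score = 1.0
    for word in query.lower().split():
        score *= summary.get(word, 0.0)
    return score


def select_databases(summaries: Dict[str, Summary], query: str, k: int) -> List[str]:
    """Return the names of the k databases with the highest estimated score."""
    ranked = sorted(
        summaries,
        key=lambda name: score_database(summaries[name], query),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical sampled documents for two databases.
    summaries = {
        "health_db": build_content_summary(
            ["cancer treatment study", "heart disease risk", "cancer screening guidelines"]
        ),
        "sports_db": build_content_summary(["football match report", "tennis final result"]),
    }
    print(select_databases(summaries, "cancer screening", 1))  # ['health_db']
```

One weakness of this naive estimate is that any word absent from the sample drives a database's score to zero, even if the database does contain documents with that word; with small samples this is common, which is one reason the quality of the sampled summaries matters so much for selection.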
Panagiotis G. Ipeirotis, Luis Gravano
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal