Improving text collection selection with coverage and overlap statistics

In an environment of distributed text collections, the first step in the information retrieval process is to identify which of the available collections are most relevant to a given query and should therefore be accessed to answer it. We address the challenge of collection selection when there is full or partial overlap between the available text collections, a scenario that has not been examined previously despite its real-world applications. To that end, we present COSCO, a collection selection approach that uses collection-specific coverage and overlap statistics. Our experimental results show that the approach displays the desired behavior of retrieving more new results early in the collection order, and that it performs consistently and significantly better than CORI, previously considered one of the best collection selection systems. Poster recommended by WWW2005 Program Committee. Categories and Subject Descriptors: H.3.3 [Information Stor...
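To make the idea of "retrieving more new results early on in the collection order" concrete, here is a minimal Python sketch. It is not the COSCO algorithm from the poster. It is a simple greedy ordering that assumes the exact result set of every collection is already known for the query, whereas the approach described above works from coverage and overlap statistics. The function name `greedy_collection_order`, the collection names, and the document IDs are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_collection_order(results: Dict[str, Set[str]], k: int) -> List[str]:
    """Order up to k collections so each pick adds the most not-yet-seen documents.

    `results` maps a (hypothetical) collection name to the set of document IDs
    it would return for the query. Knowing these sets exactly is an assumption
    made for illustration only.
    """
    seen: Set[str] = set()
    order: List[str] = []
    remaining = dict(results)
    while remaining and len(order) < k:
        # Residual coverage: documents a collection adds beyond those already
        # retrieved. Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - seen))
        if not remaining[best] - seen:
            break  # every remaining collection only repeats known results
        order.append(best)
        seen |= remaining.pop(best)
    return order


# Toy data: B is fully contained in A, while C is disjoint from both.
toy = {
    "A": {"d1", "d2", "d3", "d4"},
    "B": {"d1", "d2", "d3"},
    "C": {"d5", "d6"},
}
print(greedy_collection_order(toy, 2))
```

On the toy data, a ranking by result count alone would call A and then B, even though B contributes nothing new after A. The overlap-aware ordering skips B in favor of C.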
Thomas Hernandez, Subbarao Kambhampati
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2005
Where WWW