ADC 2007, Springer

Distributed Text Retrieval From Overlapping Collections

In standard text retrieval systems, the documents are gathered and indexed on a single server. In distributed information retrieval (DIR), the documents are held in multiple collections; answers to queries are produced by selecting the collections to query and then merging results from these collections. However, in most prior research in the area, collections are assumed to be disjoint. In this paper, we investigate the effectiveness of different combinations of server selection and result merging algorithms in the presence of duplicates. We also test our hash-based method for efficiently detecting duplicates and near-duplicates in the lists of documents returned by collections. Our results, based on two different designs of test data, indicate that some DIR methods are more likely to return duplicate documents, and show that removing such redundant documents can have a significant impact on the final search effectiveness.
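To make the idea of merging results from overlapping collections concrete, the following is a minimal sketch in Python. It is not the authors' method: the abstract does not describe their hash function or merging algorithm, so the names `fingerprint` and `merge_without_duplicates`, the `(doc_id, score, text)` tuple layout, and the assumption that scores are comparable across collections are all hypothetical. The sketch only catches exact duplicates after case and whitespace normalization; near-duplicate detection would need a different fingerprinting scheme.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical result layout: (doc_id, score, text).
Result = Tuple[str, float, str]


def fingerprint(text: str) -> str:
    """Return a hash of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def merge_without_duplicates(result_lists: List[List[Result]]) -> List[Result]:
    """Merge per-collection result lists, keeping one copy of each document.

    Assumes scores are already comparable across collections, which a real
    result-merging algorithm would have to establish first.
    """
    best: Dict[str, Result] = {}
    for results in result_lists:
        for doc_id, score, text in results:
            key = fingerprint(text)
            # Keep the highest-scoring copy of each distinct document.
            if key not in best or score > best[key][1]:
                best[key] = (doc_id, score, text)
    # Rank the surviving documents by descending score.
    return sorted(best.values(), key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    collection_a = [
        ("a1", 0.9, "Distributed retrieval  from collections"),
        ("a2", 0.4, "Query processing"),
    ]
    collection_b = [
        ("b7", 0.7, "distributed retrieval from collections"),
        ("b9", 0.5, "Index compression"),
    ]
    for doc_id, score, _ in merge_without_duplicates([collection_a, collection_b]):
        print(doc_id, score)
```

In this example, documents `a1` and `b7` differ only in case and spacing, so they share a fingerprint and only the higher-scoring copy appears in the merged list.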

Milad Shokouhi, Justin Zobel, Yaniv Bernstein

ADC 2007 | Database | Documents | Multiple Collections | Standard Text Retrieval

Related Content

» Improving text collection selection with coverage and overlap statistics

» QVI Querybased virtual index for distributed information retrieval

» Latent space domain transfer between high dimensional overlapping distributions

» A TopicBased Measure of Resource Description Quality for Distributed Information Retrieval

» TileBars Visualization of Term Distribution Information in Full Text Information Access

» Improving collection selection with overlap awareness in P2P search engines

» Controlling overlap in contentoriented XML retrieval

» Indexing Structures Derived from Syntax in TREC3 System Description

» Predicting accuracy of extracting information from unstructured text collections

Post Info

Added	06 Jun 2010
Updated	06 Jun 2010
Type	Conference
Year	2007
Where	ADC
Authors	Milad Shokouhi, Justin Zobel, Yaniv Bernstein
