LCA-based selection for XML document collections



In this paper, we address the problem of database selection for XML document collections: given a set of collections and a user query, how to rank the collections by their goodness with respect to the query. Goodness is determined by the relevance of the documents in the collection to the query. We consider keyword queries and support Lowest Common Ancestor (LCA) semantics for defining query results, where the relevance of each document to a query is determined by properties of the LCA of those nodes in the XML document that contain the query keywords. To avoid evaluating queries against each document in a collection, we propose maintaining, in a preprocessing phase, information about the LCAs of all pairs of keywords in a document and using it to approximate the properties of the LCA-based results of a query. To improve storage and processing efficiency, we use appropriate summaries of the LCA information based on Bloom filters. We address both a boolean and a weighted version of th...
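The idea of summarizing pairwise LCA information in a Bloom filter can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which LCA properties are stored, so the choice of recording the depth of the deepest common ancestor per keyword pair, the key format `a|b|depth`, and the names `BloomFilter`, `deepest_lca_depths`, `summarize`, and `may_have_lca` are all assumptions made for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET
from itertools import combinations


class BloomFilter:
    """Fixed-size Bloom filter: false positives possible, false negatives not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def deepest_lca_depths(xml_text):
    """Map each keyword pair to the depth of the deepest element containing both.

    Assumption: keywords are lowercase whitespace-separated tokens of element text.
    """
    depths = {}

    def visit(elem, depth):
        words = set((elem.text or "").lower().split())
        for child in elem:
            words |= visit(child, depth + 1)
            # Tail text follows the child but belongs to the current element.
            words |= set((child.tail or "").lower().split())
        for pair in combinations(sorted(words), 2):
            depths[pair] = max(depths.get(pair, 0), depth)
        return words

    visit(ET.fromstring(xml_text), 0)
    return depths


def summarize(collection):
    """Preprocessing phase: one Bloom filter per collection (assumed granularity)."""
    summary = BloomFilter()
    for xml_text in collection:
        for (a, b), depth in deepest_lca_depths(xml_text).items():
            summary.add(f"{a}|{b}|{depth}")
    return summary


def may_have_lca(summary, kw1, kw2, depth):
    """Probe the summary without touching the documents themselves."""
    a, b = sorted((kw1.lower(), kw2.lower()))
    return f"{a}|{b}|{depth}" in summary


docs = ["<lib><book><title>xml search</title><author>smith</author></book></lib>"]
summary = summarize(docs)
print(may_have_lca(summary, "xml", "search", 2))
```

At query time, only the summary is probed. A positive answer may be a false positive, which is the usual trade-off of a Bloom filter: storage is saved at the cost of approximate answers.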

Georgia Koloniari, Evaggelia Pitoura
