Top-k queries on uncertain data: on score distribution and typical answers



Uncertain data arises in a number of domains, including data integration and sensor networks. Top-k queries that rank results according to some user-defined score are an important tool for exploring large uncertain data sets. As several recent papers have observed, the semantics of top-k queries on uncertain data can be ambiguous because of the tradeoff between reporting high-scoring tuples and reporting tuples with a high probability of being in the resulting data set. In this paper, we demonstrate the need to present the score distribution of top-k vectors so that the user can choose between results along this score-probability dimension. One option would be to display the complete distribution of all potential top-k tuple vectors, but this set is too large to compute. Instead, we propose to provide a number of typical vectors that effectively sample this distribution. We propose efficient algorithms to compute these vectors. We also extend the semantics and algorithms to the scenario of score ties...
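To make the notion of a "score distribution of top-k vectors" concrete, the following is a minimal brute-force sketch, not the paper's algorithm. It assumes a simple data model that the abstract does not spell out: each tuple carries a score and an independent existence probability. The function name `topk_score_distribution` and the input layout are hypothetical. The sketch enumerates every possible world, which is exponential in the number of tuples and illustrates why the complete distribution is too large to compute in practice.

```python
from collections import defaultdict
from itertools import product


def topk_score_distribution(tuples, k):
    """Brute-force distribution of the total score of the top-k vector.

    tuples: list of (score, probability) pairs; tuples are ASSUMED to
            exist independently of one another (illustrative model only).
    Returns a dict mapping total top-k score -> probability mass.
    Worlds containing fewer than k tuples contribute no top-k vector.
    """
    dist = defaultdict(float)
    # Each possible world is one True/False existence choice per tuple.
    for world in product([False, True], repeat=len(tuples)):
        prob = 1.0
        present = []
        for (score, p), exists in zip(tuples, world):
            prob *= p if exists else (1.0 - p)
            if exists:
                present.append(score)
        if len(present) < k:
            continue
        total = sum(sorted(present, reverse=True)[:k])
        dist[total] += prob
    return dict(dist)


if __name__ == "__main__":
    # Three hypothetical tuples: two uncertain high scorers, one certain low scorer.
    example = [(10, 0.5), (8, 0.5), (5, 1.0)]
    for total, prob in sorted(topk_score_distribution(example, 2).items(), reverse=True):
        print(total, prob)
```

In this toy input the highest-scoring top-2 vector is no more probable than the lower-scoring alternatives, which is the score-probability tradeoff the abstract describes. Selecting a few representative points from such a distribution is the role of the typical vectors.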

Tingjian Ge, Stanley B. Zdonik, Samuel Madden


Database | Large Uncertain Data | SIGMOD 2009 | Uncertain Data | Uncertain Data Arises |


» Optimizing scoring functions and indexes for proximity search in type-annotated corpora

» Supporting top-k join queries in relational databases

» Probabilistic Ranking Queries on Gaussians

Post Info

Added	05 Dec 2009
Updated	05 Dec 2009
Type	Conference
Year	2009
Where	SIGMOD
Authors	Tingjian Ge, Stanley B. Zdonik, Samuel Madden
