SIGMOD
2005
ACM

A Disk-Based Join With Probabilistic Guarantees

One of the most common operations in analytic query processing is the application of an aggregate function to the result of a relational join. We describe an algorithm for computing the answer to such a query over large, disk-based input tables. The key innovation of our algorithm is that at all times, it provides an online, statistical estimator for the eventual answer to the query, as well as probabilistic confidence bounds. Thus, a user can monitor the progress of the join throughout its execution and stop the join when satisfied with the estimate's accuracy, or run the algorithm to completion with a total time requirement that is not much longer than other common join algorithms. This contrasts with other online join algorithms, which either do not offer such statistical guarantees or can only offer guarantees so long as the input data can fit into core memory.
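The idea of an online estimator with confidence bounds can be illustrated with a small sketch. This is not the algorithm from the paper: it is an in-memory toy that samples tuple pairs uniformly with replacement from the cross product of two tables, scales each matching pair's contribution by the cross-product size to get an unbiased estimate of a SUM over the join, and reports a normal-approximation (central limit theorem) confidence interval as sampling proceeds. The function names, parameters, and the `z=1.96` default (roughly a 95% interval) are assumptions made for illustration.

```python
import math
import random


def online_join_sum(R, S, key_r, key_s, value, rounds, report_every,
                    z=1.96, seed=0):
    """Yield (samples_seen, estimate, half_width) for SUM(value) over R JOIN S.

    Illustrative only: samples pairs from R x S with replacement, so it
    assumes both tables fit in memory (unlike the disk-based setting above).
    """
    rng = random.Random(seed)
    scale = len(R) * len(S)          # size of the cross product R x S
    n, mean, m2 = 0, 0.0, 0.0        # Welford running mean / sum of squares
    for _ in range(rounds):
        r = rng.choice(R)
        s = rng.choice(S)
        # A non-matching pair contributes 0; a matching pair contributes its
        # value scaled up, so the sample mean is unbiased for the join SUM.
        x = scale * value(r, s) if key_r(r) == key_s(s) else 0.0
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0:
            var = m2 / (n - 1) if n > 1 else 0.0
            yield n, mean, z * math.sqrt(var / n)


def exact_join_sum(R, S, key_r, key_s, value):
    """Nested-loop reference answer, used only to check the estimate."""
    return sum(value(r, s) for r in R for s in S if key_r(r) == key_s(s))


if __name__ == "__main__":
    # Hypothetical toy tables of (join_key, amount) tuples.
    R = [(i % 10, 1.0 + (i % 3)) for i in range(200)]
    S = [(j % 10, 2.0) for j in range(300)]
    key = lambda t: t[0]
    amount = lambda r, s: r[1] * s[1]

    for n, est, hw in online_join_sum(R, S, key, key, amount,
                                      rounds=20000, report_every=2000):
        print(f"after {n} samples: {est:.0f} +/- {hw:.0f}")
        if hw < 0.05 * abs(est):     # user-chosen stopping rule
            break
    print("exact:", exact_join_sum(R, S, key, key, amount))
```

The loop in the demo mirrors the usage pattern described in the abstract: the caller watches the interval shrink and stops once it is narrow enough, or lets the sampling run on. Uniform pair sampling is a deliberately simple stand-in here; it becomes inefficient when few pairs match, which is one reason practical online join algorithms use more elaborate sampling schemes.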
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2005
Where SIGMOD
Authors Chris Jermaine, Alin Dobra, Subramanian Arumugam, Shantanu Joshi, Abhijit Pol