Join queries on uncertain data: Semantics and efficient processing

Uncertain data is common in a variety of modern database applications. At the same time, the join is one of the most important but expensive operations in SQL. However, join queries on uncertain data have not been adequately addressed thus far. In this paper, we study the SQL join operation on uncertain attributes. We observe and formalize two kinds of join operations on such data, namely v-join and d-join, each useful for different applications. Using probability theory, we then devise efficient query processing algorithms for these join operations. Specifically, we use probability bounds based on the moments of random variables to accept or reject a candidate v-join result tuple early. We also devise an indexing mechanism and an algorithm called Two-End Zigzag Join to further save I/O costs. For d-join, we first observe that it can be reduced to a special form of similarity join in a multidimensional space. We then design an ...
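The abstract does not state the exact v-join predicate or which moment-based bounds the paper uses. As an illustration only, the sketch below assumes a hypothetical predicate P(|X - Y| <= eps) >= tau for independent uncertain values X and Y, and applies Chebyshev's inequality to their means and variances. The names `Moments`, `classify_pair`, `eps`, and `tau` are assumptions introduced here, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moments:
    """First two moments of an uncertain attribute (hypothetical helper type)."""
    mean: float
    var: float


def classify_pair(x: Moments, y: Moments, eps: float, tau: float) -> str:
    """Try to decide a candidate pair from moments alone.

    Assumed predicate (not from the paper): P(|X - Y| <= eps) >= tau,
    with X and Y independent. Returns "accept", "reject", or "undecided".
    """
    # D = X - Y has mean mu_x - mu_y and, by independence, variance var_x + var_y.
    mu = abs(x.mean - y.mean)
    var = x.var + y.var

    if mu > eps:
        # |D| <= eps forces |D - E[D]| >= mu - eps, so Chebyshev gives
        # an upper bound on P(|D| <= eps).
        gap = mu - eps
        upper = min(1.0, var / gap ** 2)
        if upper < tau:
            return "reject"
    elif mu < eps:
        # |D| > eps forces |D - E[D]| > eps - mu, so Chebyshev gives
        # a lower bound on P(|D| <= eps).
        gap = eps - mu
        lower = max(0.0, 1.0 - var / gap ** 2)
        if lower >= tau:
            return "accept"

    # The bounds are too loose; an exact probability computation would be needed.
    return "undecided"


far_apart = classify_pair(Moments(0.0, 1.0), Moments(10.0, 1.0), eps=1.0, tau=0.5)
close_tight = classify_pair(Moments(0.0, 0.01), Moments(0.1, 0.01), eps=1.0, tau=0.9)
borderline = classify_pair(Moments(0.0, 1.0), Moments(1.0, 1.0), eps=1.0, tau=0.5)
```

Only pairs that come back "undecided" would need the full distributions, which is the general motivation for early accept and early reject tests; the bounds actually used in the paper may be tighter or of a different form.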
Tingjian Ge
Added 21 Aug 2011
Updated 21 Aug 2011
Type Journal
Year 2011
Where ICDE
Authors Tingjian Ge