Efficient Processing of Distributed Iceberg Semi-joins



The Iceberg SemiJoin (ISJ) of two datasets R and S returns the tuples in R which join with at least k tuples of S. The ISJ operator is essential in many practical applications including OLAP, Data Mining and Information Retrieval. In this paper we consider the distributed evaluation of Iceberg SemiJoins, where R and S reside on remote servers. We developed an efficient algorithm which employs Bloom filters. The novelty of our approach is that we interleave the evaluation of the Iceberg set in server S with the pruning of unmatched tuples in server R. Therefore, we are able to (i) eliminate unnecessary tuples early, and (ii) extract accurate Bloom filters from the intermediate hash tables which are constructed during the generation of the Iceberg set. Compared to conventional two-phase approaches, our experiments demonstrate that our method transmits up to 80% less data through the network, while reducing the disk I/O cost.
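To make the operator definition concrete, here is a minimal single-process sketch in Python. It is not the paper's interleaved algorithm, which the abstract does not spell out. It only illustrates (a) the ISJ definition and (b) how a Bloom filter built from the iceberg set on the S side could prune tuples on the R side. The names `iceberg_semijoin`, `BloomFilter` and `bloom_pruned_isj` are hypothetical. So are the filter size, the hash scheme, and the assumption that each tuple is a `(key, payload)` pair joined on `key`.

```python
from collections import Counter
import hashlib


def iceberg_semijoin(r, s, k):
    """Reference definition: tuples of r whose key matches >= k tuples of s."""
    counts = Counter(key for key, _ in s)
    return [t for t in r if counts[t[0]] >= k]


class BloomFilter:
    """Tiny Bloom filter; sizes and hashing are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        # May return a false positive, never a false negative.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def bloom_pruned_isj(r, s, k):
    """Simulates both 'servers' in one process (an assumption of this sketch)."""
    # At S: count join keys and keep those that reach the threshold k.
    counts = Counter(key for key, _ in s)
    bloom = BloomFilter()
    for key, count in counts.items():
        if count >= k:
            bloom.add(key)
    # At R: drop tuples whose key is definitely not in the iceberg set.
    candidates = [t for t in r if t[0] in bloom]
    # Back at S: discard Bloom-filter false positives with the exact counts.
    return [t for t in candidates if counts[t[0]] >= k]


if __name__ == "__main__":
    r = [("a", 1), ("b", 2), ("c", 3)]
    s = [("a", 10), ("a", 11), ("b", 12), ("d", 13), ("d", 14)]
    print(iceberg_semijoin(r, s, 2))
    print(bloom_pruned_isj(r, s, 2))
```

With `k = 2`, only key `a` appears in R and at least twice in S, so both functions return the single tuple `("a", 1)`. The final exact check matters because a Bloom filter can report false positives, so the pruned candidate list may be a superset of the true result.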

Mohammed Kasim Imthiyaz, Dong Xiaoan, Panos Kalnis


Bloom Filters | Database | DEXA 2004 | Iceberg Semijoins | Iceberg Set |


Post Info

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2004
Where	DEXA
Authors	Mohammed Kasim Imthiyaz, Dong Xiaoan, Panos Kalnis


