In this work we tackle the open problem of self-join size (SJS) estimation in a large-scale Distributed Data System, where tuples of a relation are distributed over data nodes whic...
Abstract. Estimating the sizes of query results, and intermediate results, is crucial to many aspects of query processing. In particular, it is necessary for effective query optimi...
We present a new technique for using samples to estimate join cardinalities. This technique, which we term "end-biased samples," is inspired by recent work in network tr...
Join techniques deploying approximate match predicates are fundamental data cleaning operations. A variety of predicates have been utilized to quantify approximate match in such o...
Sudipto Guha, Nick Koudas, Divesh Srivastava, Xiao...