Estimating the number of distinct values is a well-studied problem, due to its frequent occurrence in queries and its importance in selecting good query plans. Previous work has sh...
The result size of a query that involves multiple attributes from the same relation depends on these attributes’ joint data distribution, i.e., the frequencies of all combination...
Join techniques deploying approximate match predicates are fundamental data cleaning operations. A variety of predicates have been utilized to quantify approximate match in such o...
Sudipto Guha, Nick Koudas, Divesh Srivastava, Xiao...
We introduce a benchmark called TEXTURE (TEXT Under RElations) to measure the relative strengths and weaknesses of combining text processing with a relational workload in an RDBMS...
Vuk Ercegovac, David J. DeWitt, Raghu Ramakrishnan
Text documents often embed data that is structured in nature, and we can expose this structured data using information extraction technology. By processing a text database with inf...