The increasing prominence of data streams arising in a wide range of advanced applications such as fraud detection and trend learning has led to the study of online mining of freq...
Most information extraction (IE) approaches have considered only static text corpora, over which we apply IE only once. Many real-world text corpora however are dynamic. They evol...
Fei Chen 0002, Byron J. Gao, AnHai Doan, Jun Yang ...
This paper introduces the Tuple Graph (TuG) synopses, a new class of data summaries that enable accurate selectivity estimates for complex relational queries. The proposed summari...
—Sampling is used as a universal method to reduce the running time of computations – the computation is performed on a much smaller sample and then the result is scaled to comp...
The similarity join is an important operation for mining high-dimensional feature spaces. Given two data sets, the similarity join computes all tuples (x, y) that are within a dis...