It has long been noted that many data mining algorithms can be built on top of join algorithms. This has led to a wealth of recent work on efficiently supporting such joins with ...
Lexiang Ye, Xiaoyue Wang, Dragomir Yankov, Eamonn ...
Abstract—Most objects and data in the real world are interconnected, forming complex, heterogeneous but often semistructured information networks. However, many database research...
Distance function computation is a key subtask in many data mining algorithms and applications. The most effective form of the distance function can only be expressed in the conte...
Skewed distributions appear very often in practice. Unfortunately, the traditional Zipf distribution often fails to model them well. In this paper, we propose a new probability di...
Query optimization in data integration requires source coverage and overlap statistics. Gathering and storing the required statistics presents many challenges, not the least of wh...