Many data-management applications require integrating data from a variety of sources, where different sources may refer to the same real-world entity in different ways and some ma...
Correlation Clustering was defined by Bansal, Blum, and Chawla as the problem of clustering a set of elements based on a possibly inconsistent binary similarity function between e...
Distance functions are an important component in many learning applications. However, the correct function is context dependent, so it is advantageous to learn a distance f...
There is an increasing quantity of data with uncertainty arising from applications such as sensor network measurements, record linkage, and the output of mining algorithms. This un...
This paper gives an overview of two middleware systems that have been developed over the last 6 years to address the challenges involved in developing parallel and distributed imp...