MapReduce is a computing paradigm that has gained a lot of attention in recent years from industry and research. Unlike parallel DBMSs, MapReduce allows non-expert users to run co...
Entity Resolution (ER) is the process of identifying groups of records that refer to the same real-world entity. Various measures (e.g., pairwise F1, cluster F1) have been used fo...
David Menestrina, Steven Whang, Hector Garcia-Moli...
An increasing amount of personal data is automatically gathered and stored on servers by administrations, hospitals, insurance companies, etc. Citizen themselves often count on in...
Tristan Allard, Nicolas Anciaux, Luc Bouganim, Yan...
Recently, there has been increasing interest in extending relational query processing to include data obtained from unstructured sources. A common approach is to use stand-alone I...
Daisy Zhe Wang, Michael J. Franklin, Minos N. Garo...
Data ambiguity is inherent in applications such as data integration, location-based services, and sensor monitoring. In many situations, it is possible to “clean”, or remove, ...
Reynold Cheng, Eric Lo, Xuan Yang, Ming-Hay Luk, X...