In this report, we provide a summary1 of the First Int'l VLDB Workshop on Clean Databases (CleanDB 2006), which took place at Seoul, Korea, on September 11, 2006, in conjunct...
The paper presents a data cleansing technique for string databases. We propose and evaluate an algorithm that identifies a group of strings that consists of (multiple) occurrence...
We consider the Entity Resolution (ER) problem (also known as deduplication, or merge-purge), in which records determined to represent the same real-world entity are successively ...
David Menestrina, Omar Benjelloun, Hector Garcia-M...
We propose the Multiple Join Path (MJP) framework for obtaining high quality information by linking fields across multiple databases, when the underlying databases have poor qual...