Central to a data cleaning system are record matching and data repairing. Matching aims to identify tuples that refer to the same real-world object, and repairing is to make a dat...
Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Weny...
Abstract--With the growing computer networks, accessible data is becoming increasing distributed. Understanding and integrating remote and unfamiliar data sources are important dat...
Violations of functional dependencies (FDs) are common in practice, often arising in the context of data integration or Web data extraction. Resolving these violations is known to...
A string similarity join finds similar pairs between two collections of strings. It is an essential operation in many applications, such as data integration and cleaning, and has ...
— Uncertainties in data arise for a number of reasons: when the data set is incomplete, contains conflicting information or has been deliberately perturbed or coarsened to remov...
Graham Cormode, Divesh Srivastava, Entong Shen, Ti...