We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integratin...
Efficient and accurate data cleaning is an essential task for the successful deployment of RFID systems. Although important advances have been made in tag detection rates, it is s...
In this paper we analyze a very large junk e-mail corpus which was generated by a hundred thousand volunteer users of the Hotmail e-mail service. We describe how the corpus is bei...
Geoff Hulten, Joshua T. Goodman, Robert Rounthwait...
Matching dependencies (MDs) are used to declaratively specify the identification (or matching) of certain attribute values in pairs of database tuples when some similarity conditi...
Jaffer Gardezi, Leopoldo E. Bertossi, Iluju Kiring...
Email can be considered as a virtual working environment in which users are constantly struggling to manage the vast amount of exchanged data. Although most of this data belongs t...
Simon Scerri, Gerhard Gossen, Brian Davis, Siegfri...