Robust and Efficient Fuzzy Match for Online Data Cleaning



To ensure high data quality, data warehouses must validate and cleanse incoming data tuples from external sources. In many situations, clean tuples must match acceptable tuples in reference tables. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A significant challenge in such a scenario is to implement an efficient and accurate fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any tuple in the reference relation. In this paper, we propose a new similarity function which overcomes limitations of commonly used similarity functions, and develop an efficient fuzzy match algorithm. We demonstrate the effectiveness of our techniques by evaluating them on real datasets.
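The matching scenario described in the abstract can be illustrated with a minimal sketch. This is not the similarity function proposed in the paper; it is a simple IDF-weighted token-overlap baseline, shown only to make the setting concrete. The function names, the threshold value, and the sample product names are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_idf(reference):
    """Weight each token by how rare it is across the reference tuples."""
    n = len(reference)
    df = Counter()
    for row in reference:
        df.update(set(tokenize(row)))
    return {tok: math.log(1 + n / count) for tok, count in df.items()}


def similarity(a, b, idf, default_weight):
    """IDF-weighted Jaccard similarity between two strings (assumed baseline)."""
    ta, tb = set(tokenize(a)), set(tokenize(b))

    def weight(tok):
        # Tokens never seen in the reference table get the default weight.
        return idf.get(tok, default_weight)

    union = sum(weight(t) for t in ta | tb)
    if union == 0:
        return 0.0
    return sum(weight(t) for t in ta & tb) / union


def fuzzy_match(record, reference, idf, threshold=0.4):
    """Return (best reference tuple, score), or None if nothing clears the threshold."""
    default_weight = max(idf.values(), default=1.0)
    best, best_score = None, 0.0
    for row in reference:
        score = similarity(record, row, idf, default_weight)
        if score > best_score:
            best, best_score = row, score
    return (best, best_score) if best_score >= threshold else None


# Hypothetical reference relation and one misspelled incoming record.
reference = [
    "Acme Laser Printer 2000",
    "Acme Inkjet Printer 300",
    "Zenith Flat Monitor 19",
]
idf = build_idf(reference)
print(fuzzy_match("Acme Lazer Printer 2000", reference, idf))
print(fuzzy_match("Unknown Gadget", reference, idf))
```

In this sketch the misspelled token "Lazer" contributes nothing to the overlap, so the score drops even though the intended product is obvious to a human reader. Pure token-overlap measures have this kind of limitation, which is one reason a more robust similarity function is attractive for this task.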

Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani


Accurate Fuzzy Match | Database | Efficient Fuzzy Match | Product Reference Relation | SIGMOD 2003


» Incorporating string transformations in record matching

» A Type-2 Self-Organizing Neural Fuzzy System and Its FPGA Implementation

» Learning metadata from the evidence in an online citation matching scheme

» User Adaptation for Online Sketchy Shape Recognition

» Text joins in an RDBMS for web data integration

» Matchmaking: Distributed Resource Management for High Throughput Computing

Post Info

Added	08 Dec 2009
Updated	08 Dec 2009
Type	Conference
Year	2003
Where	SIGMOD
Authors	Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani
