Automatic Data Fusion with HumMer

Heterogeneous and dirty data is abundant. It is stored under different, often opaque schemata; it represents identical real-world objects multiple times, causing duplicates; and it has missing and conflicting values. The Humboldt Merger (HumMer) is a tool that allows ad-hoc, declarative fusion of such data using a simple extension to SQL. Guided by a query against multiple tables, HumMer proceeds in three fully automated steps. First, instance-based schema matching bridges the schematic heterogeneity of the tables by aligning corresponding attributes. Next, duplicate detection techniques find multiple representations of identical real-world objects. Finally, data fusion and conflict resolution merge duplicates into a single, consistent, and clean representation.

1 Fusing Heterogeneous, Duplicate, and Conflicting Data

The task of fusing data involves the solution of many different problems, each one in itself formidable: Apart from the technical challenges of accessing remote...
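To make the three steps named in the abstract concrete, here is a minimal Python sketch of such a pipeline on toy data. It is not HumMer's implementation: the function names, the sample tables, the Jaccard value overlap used for matching, the `difflib` string similarity used for duplicate detection, and the "first non-null value" resolution rule are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def match_schemas(rows_a, rows_b):
    """Step 1 (illustrative): map each attribute of B to the attribute of A
    whose set of values overlaps most (Jaccard similarity on instances)."""
    def values(rows, attr):
        return {str(r[attr]).lower() for r in rows if r[attr] is not None}

    mapping = {}
    for attr_b in rows_b[0]:
        best, best_score = None, 0.0
        for attr_a in rows_a[0]:
            va, vb = values(rows_a, attr_a), values(rows_b, attr_b)
            union = va | vb
            score = len(va & vb) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = attr_a, score
        if best is not None:
            mapping[attr_b] = best
    return mapping


def find_duplicates(rows, key, threshold=0.85):
    """Step 2 (illustrative): cluster rows whose `key` values are similar,
    using pairwise string similarity and a small union-find."""
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            a, b = str(rows[i][key]).lower(), str(rows[j][key]).lower()
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                parent[find(j)] = find(i)

    clusters = defaultdict(list)
    for i, row in enumerate(rows):
        clusters[find(i)].append(row)
    return list(clusters.values())


def fuse(cluster, resolvers=None):
    """Step 3 (illustrative): merge a cluster into one record; conflicts are
    resolved per attribute, defaulting to the first non-null value."""
    resolvers = resolvers or {}
    fused_row = {}
    for attr in cluster[0]:
        vals = [r[attr] for r in cluster if r.get(attr) is not None]
        resolve = resolvers.get(attr, lambda vs: vs[0])
        fused_row[attr] = resolve(vals) if vals else None
    return fused_row


# Hypothetical source tables with different schemata, a duplicate, and gaps.
table_a = [
    {"name": "Anna Schmidt", "age": 24, "city": "Berlin"},
    {"name": "Ben Meyer", "age": 31, "city": None},
    {"name": "Clara Wolf", "age": 27, "city": "Potsdam"},
]
table_b = [
    {"full_name": "Anna Schmidt", "years": 24, "town": "Berlin"},
    {"full_name": "Ben Meier", "years": 31, "town": "Hamburg"},
    {"full_name": "Dora Klein", "years": 45, "town": "Berlin"},
]

mapping = match_schemas(table_a, table_b)
renamed_b = [{mapping[k]: v for k, v in r.items() if k in mapping} for r in table_b]
clusters = find_duplicates(table_a + renamed_b, key="name")
fused = [fuse(c, resolvers={"age": max}) for c in clusters]

for row in fused:
    print(row)
```

The per-attribute `resolvers` argument loosely mirrors the idea of declaring a conflict resolution function for each column; the all-pairs comparison in `find_duplicates` is quadratic and only reasonable for small inputs like these.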
Added 28 Jun 2010
Updated 28 Jun 2010
Type Conference
Year 2005
Where VLDB
Authors Alexander Bilke, Jens Bleiholder, Christoph Böhm, Karsten Draba, Felix Naumann, Melanie Weis