Duplicate detection in probabilistic data



Abstract— Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, the probabilistic data has to be integrated. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML); to date, there has been no work on integrating uncertain (especially probabilistic) source data. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities. Furthermore, to increase the efficiency of the duplicate detection process, we introduce search space reduction methods adapted to probabilistic data.
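The two ingredients named in the abstract, comparing probabilistic representations of entities and reducing the search space, can be illustrated with a minimal Python sketch. It is not the authors' method. The data model (each tuple is a list of mutually exclusive alternatives with probabilities), the Jaccard attribute similarity, the expected-similarity score, the blocking key and the threshold are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product

# Hypothetical data model (an assumption, not the paper's): a probabilistic
# tuple is a list of (alternative, probability) pairs. The alternatives are
# mutually exclusive; each one is a dict mapping attribute names to strings.

def attr_similarity(a, b):
    """Token-based Jaccard similarity of two attribute values (assumed choice)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def alt_similarity(x, y):
    """Mean attribute similarity over the attributes both alternatives share."""
    shared = x.keys() & y.keys()
    if not shared:
        return 0.0
    return sum(attr_similarity(x[k], y[k]) for k in shared) / len(shared)

def expected_similarity(t1, t2):
    """Expected similarity over all pairs of alternatives, assuming the two
    tuples are independent. Dividing by the joint probability mass compares
    'maybe' tuples (probabilities summing to < 1) as if both exist."""
    weighted, mass = 0.0, 0.0
    for (x, p), (y, q) in product(t1, t2):
        weighted += p * q * alt_similarity(x, y)
        mass += p * q
    return weighted / mass if mass else 0.0

def blocking_key(alt):
    """Hypothetical blocking key: first three letters of the 'name' attribute."""
    return alt.get("name", "")[:3].lower()

def candidate_pairs(db):
    """Search space reduction by blocking: a tuple joins the block of every
    alternative's key, so an uncertain tuple may land in several blocks."""
    blocks = defaultdict(set)
    for tid, alternatives in db.items():
        for alt, _prob in alternatives:
            blocks[blocking_key(alt)].add(tid)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

def detect_duplicates(db, threshold):
    """Score only candidate pairs; keep those at or above the assumed threshold."""
    matches = []
    for a, b in sorted(candidate_pairs(db)):
        score = expected_similarity(db[a], db[b])
        if score >= threshold:
            matches.append((a, b, score))
    return matches

db = {
    "t1": [({"name": "John Smith", "city": "Enschede"}, 0.7),
           ({"name": "Jon Smith", "city": "Enschede"}, 0.3)],
    "t2": [({"name": "John Smith", "city": "Hengelo"}, 0.6),
           ({"name": "Johan Smit", "city": "Enschede"}, 0.4)],
    "t3": [({"name": "Mary Jones", "city": "Delft"}, 1.0)],
}

print(detect_duplicates(db, threshold=0.4))
```

In this toy database, blocking places t3 in a block of its own, so it is never compared with t1 or t2. The single candidate pair (t1, t2) gets an expected similarity of 0.44, which passes the assumed threshold of 0.4 but would fail at 0.5. Letting each alternative contribute its own blocking key is one simple way to avoid missing an uncertain tuple when only some of its alternatives share a key with another tuple.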

Fabian Panse, Maurice van Keulen, Ander de Keijzer, Norbert Ritter

Database | ICDE 2010 | Probabilistic | Probabilistic Data | Probabilistic Databases |

» Optimizing Near Duplicate Detection for P2P Networks

» A hit-miss model for duplicate detection in the WHO drug safety database

» Efficient Semantic-Aware Detection of Near Duplicate Resources

» Clean Answers over Dirty Databases: A Probabilistic Approach

» DogmatiX Tracks down Duplicates in XML

» Visual Detection of Duplicated Code

» Matching Algorithms within a Duplicate Detection System

» Scaling up duplicate detection in graph data

Post Info

Added	06 Dec 2010
Updated	06 Dec 2010
Type	Conference
Year	2010
Where	ICDE
Authors	Fabian Panse, Maurice van Keulen, Ander de Keijzer, Norbert Ritter

Sciweavers