Sciweavers

363 search results - page 1 / 73
IDEAL
2005
Springer
13 years 10 months ago
Probabilistic Data Generation for Deduplication and Data Linkage
Abstract. In many data mining projects the data to be analysed contains personal information, like names and addresses. Cleaning and preprocessing of such data likely involves dedu...
Peter Christen
KDD
2008
ACM
176 views, Data Mining
14 years 4 months ago
Febrl: an open source data cleaning, deduplication and record linkage system with a graphical user interface
Matching records that refer to the same entity across databases is becoming an increasingly important part of many data mining projects, as often data from multiple sources needs ...
Peter Christen
PVLDB
2010
98 views
13 years 2 months ago
On-the-Fly Entity-Aware Query Processing in the Presence of Linkage
Entity linkage is central to almost every data integration and data cleaning scenario. Traditional techniques use some computed similarity among data structure to perform merges a...
Ekaterini Ioannou, Wolfgang Nejdl, Claudia Nieder&...
EDBT
2012
ACM
224 views, Database
11 years 7 months ago
Aggregate queries on probabilistic record linkages
Record linkage analysis, which matches records referring to the same real world entities from different data sets, is an important task in data integration. Uncertainty often exi...
Ming Hua, Jian Pei
DMKD
2004
ACM
139 views, Data Mining
13 years 10 months ago
Iterative record linkage for cleaning and integration
Record linkage, the problem of determining when two records refer to the same entity, has applications for both data cleaning (deduplication) and for integrating data from multipl...
Indrajit Bhattacharya, Lise Getoor