Sciweavers

380 search results - page 1 / 76
IDEAL
2005
Springer
10 years 3 months ago
Probabilistic Data Generation for Deduplication and Data Linkage
Abstract. In many data mining projects the data to be analysed contains personal information, like names and addresses. Cleaning and preprocessing of such data likely involves dedu...
Peter Christen
KDD
2008
ACM
10 years 10 months ago
Febrl - an open source data cleaning, deduplication and record linkage system with a graphical user interface
Matching records that refer to the same entity across databases is becoming an increasingly important part of many data mining projects, as often data from multiple sources needs ...
Peter Christen
PVLDB
2010
9 years 8 months ago
On-the-Fly Entity-Aware Query Processing in the Presence of Linkage
Entity linkage is central to almost every data integration and data cleaning scenario. Traditional techniques use some computed similarity among data structure to perform merges a...
Ekaterini Ioannou, Wolfgang Nejdl, Claudia Nieder&...
BTW
2015
Springer
4 years 6 months ago
Ddup - towards a deduplication framework utilising Apache Spark
This paper is about a new framework called DeduPlication (DduP). DduP aims to solve large scale deduplication problems on arbitrary data tuples. DduP tries to bridge the gap betw...
Niklas Wilcke
DMKD
2004
ACM
10 years 3 months ago
Iterative record linkage for cleaning and integration
Record linkage, the problem of determining when two records refer to the same entity, has applications for both data cleaning (deduplication) and for integrating data from multipl...
Indrajit Bhattacharya, Lise Getoor