Sciweavers

260 search results - page 16 / 52
» Industry-scale duplicate detection
SIGMOD
2010
ACM
14 years 10 months ago
MapDupReducer: detecting near duplicates over massive datasets
Chaokun Wang, Jianmin Wang, Xuemin Lin, Wei Wang, ...
TOIS
2002
14 years 9 months ago
Collection statistics for fast duplicate document detection
Abdur Chowdhury, Ophir Frieder, David A. Grossman,...
VLDB
2002
ACM
14 years 9 months ago
Eliminating Fuzzy Duplicates in Data Warehouses
The duplicate elimination problem of detecting multiple tuples that describe the same real-world entity is an important data cleaning problem. Previous domain independent solut...
Rohit Ananthakrishna, Surajit Chaudhuri, Venkatesh...
DEXA
2004
Springer
15 years 3 months ago
A Flexible Fuzzy Expert System for Fuzzy Duplicate Elimination in Data Cleaning
Data cleaning deals with the detection and removal of errors and inconsistencies in data, gathered from distributed sources. This process is essential for drawing correct conclusio...
Hamid Haidarian Shahri, Ahmad Abdollahzadeh Barfor...