Sciweavers

» Data Cleaning: Problems and Current Approaches
KDD
2005
ACM
125 views, Data Mining
14 years 5 months ago
Email data cleaning
This paper addresses the issue of 'email data cleaning' for text mining. Many text mining applications need to take emails as input. Email data is usually noisy and thus i...
Jie Tang, Hang Li, Yunbo Cao, ZhaoHui Tang
ICDE
2010
IEEE
204 views, Database
14 years 10 days ago
ProbClean: A probabilistic duplicate detection system
One of the most prominent data quality problems is the existence of duplicate records. Current data cleaning systems usually produce one clean instance (repair) of the input da...
George Beskales, Mohamed A. Soliman, Ihab F. Ilyas...
DEXA
2004
Springer
147 views, Database
13 years 11 months ago
A Flexible Fuzzy Expert System for Fuzzy Duplicate Elimination in Data Cleaning
Data cleaning deals with the detection and removal of errors and inconsistencies in data gathered from distributed sources. This process is essential for drawing correct conclusio...
Hamid Haidarian Shahri, Ahmad Abdollahzadeh Barfor...
KDD
2008
ACM
135 views, Data Mining
14 years 5 months ago
DiMaC: a disguised missing data cleaning tool
In some applications such as filling in a customer information form on the web, some missing values may not be explicitly represented as such, but instead appear as potentially va...
Ming Hua, Jian Pei
SIGMOD
2008
ACM
167 views, Database
14 years 5 months ago
DiMaC: a system for cleaning disguised missing data
In some applications such as filling in a customer information form on the web, some missing values may not be explicitly represented as such, but instead appear as potentially va...
Ming Hua, Jian Pei