Sciweavers

4660 search results - page 525 / 932
» Learning from imperfect data
SIGIR
2003
ACM
15 years 9 months ago
ReCoM: reinforcement clustering of multi-type interrelated data objects
Most existing clustering algorithms cluster highly related data objects such as Web pages and Web users separately. The interrelation among different types of data objects is eith...
Jidong Wang, Hua-Jun Zeng, Zheng Chen, Hongjun Lu,...
KDD
2008
ACM
176 views, Data Mining
16 years 4 months ago
Febrl - an open source data cleaning, deduplication and record linkage system with a graphical user interface
Matching records that refer to the same entity across databases is becoming an increasingly important part of many data mining projects, as often data from multiple sources needs ...
Peter Christen
SDM
2008
SIAM
177 views, Data Mining
15 years 5 months ago
Practical Private Computation and Zero-Knowledge Tools for Privacy-Preserving Distributed Data Mining
In this paper we explore private computation built on vector addition and its applications in privacy-preserving data mining. Vector addition is a surprisingly general tool for imp...
Yitao Duan, John F. Canny
KDD
2005
ACM
125 views, Data Mining
16 years 4 months ago
Email data cleaning
Addressed in this paper is the issue of 'email data cleaning' for text mining. Many text mining applications need to take emails as input. Email data is usually noisy and thus i...
Jie Tang, Hang Li, Yunbo Cao, ZhaoHui Tang
KDD
1999
ACM
128 views, Data Mining
15 years 8 months ago
Towards Automated Synthesis of Data Mining Programs
Code synthesis is routinely used in industry to generate GUIs, form-filling applications, and database support code and is even used with COBOL. In this paper we consider the questi...
Wray L. Buntine, Bernd Fischer 0002, Thomas Pressb...