With the growing demand on cluster analysis for categorical data, a handful of categorical clustering algorithms have been developed. Surprisingly, to our knowledge, none has sati...
Duplicate detection is the problem of detecting different entries in a data source representing the same real-world entity. While research abounds in the realm of duplicate detect...
The appropriate choice of a method for imputation of missing data becomes especially important when the fraction of missing values is large and the data are of mixed type. The prop...
Vadim V. Ayuyev, Joseph Jupin, Philip W. Harris, Z...
While scalable data mining methods are expected to cope with massive Web data, coping with evolving trends in noisy data in a continuous fashion, and without any unnecessary stopp...
We consider the task of summarizing a cluster of related sentences with a short sentence which we call multi-sentence compression and present a simple approach based on shortest p...