The typical task of unsupervised learning is to organize data, for example into clusters, typically disjoint clusters (eg. the K-means algorithm). One would expect (for example) a...
Mark K. Goldberg, Mykola Hayvanovych, Malik Magdon...
In this paper, we present a general guideline to find a better distance measure for similarity estimation based on statistical analysis of distribution models and distance function...
Jie Yu, Jaume Amores, Nicu Sebe, Petia Radeva, Qi ...
The problem of identifying approximately duplicate records in databases is an essential step for data cleaning and data integration processes. Most existing approaches have relied...
—The measure of similarity normally utilized in statistical signal processing is based on second order moments. In this paper, we reveal the probabilistic meaning of correntropy ...
It is well known that many hard tasks considered in machine learning and data mining can be solved in an rather simple and robust way with an instance- and distance-based approach....