Most clustering algorithms are partitional in nature, assigning each data point to exactly one cluster. However, several real-world datasets have inherently overlapping clusters i...
Current outlier detection schemes typically output a numeric score representing the degree to which a given observation is an outlier. We argue that converting the scores into wel...
Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privac...
Entity resolution is the problem of determining which records in a database refer to the same entities, and is a crucial and expensive step in the data mining process. Interest in...
The secure multi-party computation (SMC) model provides means for balancing the use and confidentiality of distributed data. This is especially important in the field of privacy...