Abstract. We propose a new class of distance measures (metrics) designed for multisets, which are a recurrent theme in many data mining applications. One particular instanc...
A huge amount of gene expression data has been generated as a result of the Human Genome Project. Clustering has been used extensively in mining these gene expression data to find...
We address the problem of similarity metric selection in pairwise affinity clustering. Traditional techniques employ standard algebraic context-independent sample-distance measur...
Although most time-series data mining research has concentrated on providing solutions for a single distance function, in this work we motivate the need for a single index structu...
We initiate the study of sparse recovery problems under the Earth-Mover Distance (EMD). Specifically, we design a distribution over m × n matrices A such that for any x, given A...