Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This p...
We initiate the study of sparse recovery problems under the Earth-Mover Distance (EMD). Specifically, we design a distribution over m × n matrices A such that for any x, given A...
We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set P of size n, our goal is to find a set C of size k such t...
Clustering methods for data-mining problems must be extremely scalable. In addition, several data mining applications demand that the clusters obtained be balanced, i.e., be of ap...
We prove the existence of small core-sets for solving approximate k-center clustering and related problems. The size of these core-sets is considerably smaller than the previously...