The problem of biclustering consists of the simultaneous clustering of rows and columns of a matrix such that each of the submatrices induced by a pair of row and column clusters ...
Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a cluster...
There is an increasing quantity of data with uncertainty arising from applications such as sensor network measurements, record linkage, and as output of mining algorithms. This un...
In k-means clustering we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, to mi...
Tapas Kanungo, David M. Mount, Nathan S. Netanyahu...
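The k-means problem statement in the abstract above can be made concrete with a small sketch. It assumes the standard objective, the mean squared Euclidean distance from each data point to its nearest center, because the abstract is cut off before it names the quantity being minimized. The names `squared_distance` and `kmeans_cost` are hypothetical helpers for illustration, not taken from the paper, and the code only evaluates the cost of a given set of centers; it does not search for good ones.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Mean squared distance from each point to its nearest center.

    Assumption: this is the standard k-means objective; the truncated
    abstract does not spell it out.
    """
    if not points or not centers:
        raise ValueError("points and centers must be non-empty")
    total = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return total / len(points)


# Four points in R^2 (n = 4, d = 2) and two candidate sets of centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
two_centers = [(0.0, 1.0), (10.0, 1.0)]  # k = 2
one_center = [(5.0, 1.0)]                # k = 1

cost_two = kmeans_cost(points, two_centers)
cost_one = kmeans_cost(points, one_center)
```

With these inputs the two-center solution places each point at squared distance 1 from its nearest center, while the single center leaves every point at squared distance 26, which illustrates why the choice of centers matters.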
We give the first optimal algorithm for estimating the number of distinct elements in a data stream, closing a long line of theoretical research on this problem begun by Flajolet...