Practical clustering algorithms require multiple data scans to achieve convergence. For large databases, these scans become prohibitively expensive. We present a scalable clusteri...
This paper offers a local distributed algorithm for expectation maximization in large peer-to-peer environments. The algorithm can be used for a variety of well-known data mining...
In this paper, we propose GAD (General Activity Detection) for fast clustering on large-scale data. Within this framework we design a set of algorithms for different scenarios: (...
Jiawei Han, Liangliang Cao, Sangkyum Kim, Xin Jin,...
In the current trend of software engineering, software systems are viewed as clusters of overlapping structures representing various concerns, covering heterogeneous artifacts lik...
Both public and private organizations have been accumulating large volumes of electronically available text documents over the past years. However, to turn text archives into profi...