Sampling is a widely used technique to increase efficiency in database and data mining applications operating on large dataset. In this paper we present a scalable sampling imple...
We present a parallel version of BIRCH with the objective of enhancing the scalability without compromising on the quality of clustering. The incoming data is distributed in a cyc...
A wide variety of machine learning problems can be described as minimizing a regularized risk functional, with different algorithms using different notions of risk and different r...
Choon Hui Teo, Alex J. Smola, S. V. N. Vishwanatha...
The scalability problem in data mining involves the development of methods for handling large databases with limited computational resources. In this paper, we present a two-phase...