Sciweavers


KDD
1998
ACM

Scaling Clustering Algorithms to Large Databases

Download www.aaai.org

Practical clustering algorithms require multiple data scans to achieve convergence. For large databases, these scans become prohibitively expensive. We present a scalable clustering framework applicable to a wide class of iterative clustering algorithms. The framework requires at most one scan of the database. In this work, it is instantiated and numerically justified with the popular K-Means clustering algorithm. The method is based on identifying regions of the data that are compressible, regions that must be maintained in memory, and regions that are discardable. The algorithm operates within the confines of a limited memory buffer. Empirical results demonstrate that the scalable scheme outperforms a sampling-based approach. In our scheme, data resolution is preserved to the extent possible based upon the size of the allocated memory buffer and the fit of the current clustering model to the data. The framework is naturally extended to update multiple clustering models simultaneously. We empiricall...
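
The idea of folding some points into summaries while keeping others in memory can be illustrated with a rough sketch. The Python below is not the authors' implementation. It is a simplified variant under stated assumptions: points within a fixed `radius` of their nearest center are treated as discardable and folded into per-cluster sufficient statistics (a count and a vector sum), and all other points are retained at full resolution. The names `single_scan_kmeans`, `chunk_size`, `radius`, and `iters` are hypothetical, the fixed-radius rule stands in for whatever criteria the paper actually uses, the separate compression of dense sub-clusters is omitted, and the retained set is not capped to a buffer size.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def update_centers(points, centers, counts, sums, iters):
    """Run a few K-Means iterations over in-memory points plus cluster summaries."""
    dim = len(centers[0])
    for _ in range(iters):
        n = list(counts)             # start from the summarized counts
        s = [list(v) for v in sums]  # and from the summarized vector sums
        for p in points:
            j = nearest(p, centers)
            n[j] += 1
            for d in range(dim):
                s[j][d] += p[d]
        # A cluster with no mass keeps its previous center.
        centers = [[s[j][d] / n[j] for d in range(dim)] if n[j] else list(centers[j])
                   for j in range(len(centers))]
    return centers


def single_scan_kmeans(stream, init_centers, chunk_size=100, radius=1.0, iters=5):
    """Cluster `stream` in one pass, keeping summaries plus the 'far' points."""
    centers = [list(c) for c in init_centers]
    k, dim = len(centers), len(centers[0])
    counts = [0] * k                        # number of discarded points per cluster
    sums = [[0.0] * dim for _ in range(k)]  # vector sum of discarded points
    retained = []                           # points kept at full resolution
    chunk = []                              # current buffer fill

    def flush():
        nonlocal centers, retained
        points = retained + chunk
        centers = update_centers(points, centers, counts, sums, iters)
        retained = []
        for p in points:
            j = nearest(p, centers)
            if sq_dist(p, centers[j]) <= radius ** 2:
                # Assumed rule: close points are summarized and dropped.
                counts[j] += 1
                for d in range(dim):
                    sums[j][d] += p[d]
            else:
                retained.append(p)
        chunk.clear()

    for point in stream:  # the only pass over the data
        chunk.append(point)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()
    return centers, counts, retained
```

Calling `single_scan_kmeans(points, initial_centers)` returns the final centers, the per-cluster counts of summarized points, and the points still held in memory. In this sketch a summarized point stays attached to the cluster it was folded into, so the result depends on the initial centers and on `radius`.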

Paul S. Bradley, Usama M. Fayyad, Cory Reina

Clustering Algorithm | Clustering Models | Data Mining | KDD 1998 | Scalable Clustering Framework

Related Content

» CURE: An Efficient Clustering Algorithm for Large Databases

» Large-Scale Discovery of Spatially Related Images

» Large scale clustering of protein sequences with FORCE: A layout based heuristic for weight...

» GPU-accelerated Chemical Similarity Assessment for Large Scale Databases

» Genes, Themes and Microarrays: Using Information Retrieval for Large-Scale Gene Analysis

» Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Re...

» PBIRCH: A Scalable Parallel Clustering algorithm for Incremental Data

» A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases

» Design and analysis of a multidimensional data sampling service for large scale data analy...

Post Info

Added	06 Aug 2010
Updated	06 Aug 2010
Type	Conference
Year	1998
Where	KDD
Authors	Paul S. Bradley, Usama M. Fayyad, Cory Reina