Generalized Projected Clustering in High-Dimensional Data Streams

Clustering identifies densely populated subgroups in data, while correlation analysis finds dependencies between the attributes of a data set. In this paper, we combine the two techniques in the domain of data streams, i.e., we seek dense subgroups of data points that share a strong correlation. Such correlation connected clusters [11] are meaningful in many areas; e.g., in E-business, positive correlations indicate sets of similar purchase patterns. However, detecting such clusters in streaming data is difficult: in high-dimensional streams, the inherent sparsity means that correlations are local to subgroups, and the correlation itself can have an arbitrarily complex direction, that is, one set of attributes can depend on another set. We present a novel method, ACID, to overcome these problems in detecting correlation connected clusters in data streams. The method incorporates principal component analysis (PCA), streaming cluster feature vectors (SCF), and the SCF-Tree (a variant of CFTree)...
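To illustrate how PCA can be combined with an incrementally maintained cluster summary, here is a minimal Python sketch. It is not the ACID method and does not reproduce the paper's SCF definition, which the abstract does not give. The class `StreamingStats` and its fields (count, linear sum, sum of outer products) are hypothetical, chosen only because such additive statistics suffice to recover a covariance matrix and hence the principal axes of a group of points without storing the points themselves.

```python
import numpy as np


class StreamingStats:
    """Hypothetical additive summary of a group of points (not the paper's SCF)."""

    def __init__(self, dim):
        self.n = 0                             # number of points absorbed
        self.linear_sum = np.zeros(dim)        # sum of x
        self.outer_sum = np.zeros((dim, dim))  # sum of x x^T

    def add(self, x):
        """Absorb one arriving point in O(dim^2) time."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.linear_sum += x
        self.outer_sum += np.outer(x, x)

    def merge(self, other):
        """Combine two summaries; every field is a plain sum, so this is exact."""
        self.n += other.n
        self.linear_sum += other.linear_sum
        self.outer_sum += other.outer_sum

    def covariance(self):
        """Population covariance recovered from the stored sums."""
        mean = self.linear_sum / self.n
        return self.outer_sum / self.n - np.outer(mean, mean)

    def principal_axes(self):
        """Eigenvalues (descending) and matching eigenvectors (as columns)."""
        eigvals, eigvecs = np.linalg.eigh(self.covariance())
        order = np.argsort(eigvals)[::-1]
        return eigvals[order], eigvecs[:, order]
```

For points lying on a line such as y = 2x, the smallest eigenvalue is numerically zero and the leading eigenvector is parallel to (1, 2), which is the kind of local linear dependency between attributes that the abstract describes. Because the fields are sums, two summaries can be merged without revisiting the raw data, which is the property that makes this style of summary usable in a stream.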
Ting Wang
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2006
Authors Ting Wang