
CIKM
2006
Springer

Finding highly correlated pairs efficiently with powerful pruning

We consider the problem of finding highly correlated pairs in a large data set. That is, given a threshold that is not too small, we wish to report all pairs of items (or binary attributes) whose (Pearson) correlation coefficients exceed the threshold. Correlation analysis is an important step in many statistical and knowledge-discovery tasks. Normally, the number of highly correlated pairs is quite small compared with the total number of pairs, so identifying them naively by computing the correlation coefficient of every pair is wasteful. With massive data sets, where the total number of pairs may exceed main-memory capacity, the computational cost of the naive method is prohibitive. In their KDD'04 paper [15], Hui Xiong et al. address this problem with the TAPER algorithm, which goes through the data set in two passes. It uses the first pass to generate a set of candidate pairs whose correlation coefficients are then computed ...
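For two binary attributes, the Pearson correlation coefficient reduces to the phi coefficient, which depends only on how many transactions contain each item and how many contain both. Because a pair can co-occur no more often than its rarer item, phi has an upper bound computable from single-item counts alone, and that bound is what makes a two-pass, candidate-then-verify scheme possible. The following is a minimal sketch of such a two-pass scheme in the spirit of TAPER, assuming transactions are given as sets of sortable item labels; the helpers `phi`, `upper_bound`, `first_pass`, and `correlated_pairs` are hypothetical names, and this is neither the exact TAPER procedure nor the pruning method proposed by Zhang and Feigenbaum.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def phi(n, n_a, n_b, n_ab):
    """Pearson (phi) correlation of two binary attributes from counts:
    n transactions, n_a / n_b containing each item, n_ab containing both."""
    den = sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    return (n * n_ab - n_a * n_b) / den if den else 0.0


def upper_bound(n, n_hi, n_lo):
    """Upper bound on phi when n_hi >= n_lo: phi grows with n_ab, and a pair
    cannot co-occur more often than its rarer item, so set n_ab = n_lo."""
    return sqrt(n_lo * (n - n_hi) / (n_hi * (n - n_lo)))


def first_pass(transactions, theta):
    """Pass 1 (hypothetical helper): count single items only, then keep the
    pairs whose upper bound still reaches the threshold."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    # Items present in every (or no) transaction have undefined correlation.
    items = sorted(i for i, c in counts.items() if 0 < c < n)
    candidates = []
    for a, b in combinations(items, 2):
        n_hi, n_lo = max(counts[a], counts[b]), min(counts[a], counts[b])
        if upper_bound(n, n_hi, n_lo) >= theta:
            candidates.append((a, b))
    return n, counts, candidates


def correlated_pairs(transactions, theta):
    """Pass 2 (hypothetical helper): count co-occurrences for candidate pairs
    only, then report the pairs whose exact phi exceeds the threshold."""
    n, counts, candidates = first_pass(transactions, theta)
    wanted = set(candidates)
    pair_counts = Counter()
    for t in transactions:
        for p in combinations(sorted(set(t)), 2):
            if p in wanted:
                pair_counts[p] += 1
    result = {}
    for a, b in candidates:
        r = phi(n, counts[a], counts[b], pair_counts[(a, b)])
        if r > theta:
            result[(a, b)] = r
    return result


tx = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}, {"c", "d"}, {"d"}]
print(correlated_pairs(tx, 0.8))  # {('a', 'b'): 1.0}
```

In this toy data every pair involving `d` has an upper bound of about 0.707, so it is discarded after the first pass without its co-occurrences ever being counted. Note that this sketch still evaluates the bound for every item pair, which is quadratic in the number of items; the saving is that the costly co-occurrence counting in the second pass is restricted to candidates.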
Jian Zhang, Joan Feigenbaum
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2006
Where CIKM
Authors Jian Zhang, Joan Feigenbaum