
SIGMOD
2010
ACM

Fast approximate correlation for massive time-series data

We consider the problem of computing all-pair correlations in a warehouse containing a large number (e.g., tens of thousands) of time series (or signals). The problem arises in the automatic discovery of patterns and anomalies in data-intensive applications such as data center management, environmental monitoring, and scientific experiments. However, with existing techniques, solving the problem for a large stream warehouse is extremely expensive because of its inherent quadratic I/O and CPU complexity. We propose novel algorithms, based on the Discrete Fourier Transform (DFT) and graph partitioning, to reduce the end-to-end response time of an all-pair correlation query. To minimize I/O cost, we partition a massive set of input signals into smaller batches such that caching the signals one batch at a time maximizes data reuse and minimizes disk I/O. To reduce CPU cost, we propose two approximation algorithms. Our first algorithm efficiently computes approximate correlatio...
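The general idea of approximating a correlation from a few DFT coefficients can be illustrated with a minimal sketch. This is not the authors' algorithm, which the truncated abstract does not spell out; it is a textbook construction under stated assumptions. Each signal is centered and scaled to unit length, so that the Pearson correlation equals a plain inner product, which Parseval's theorem lets us evaluate in the frequency domain. Keeping only the first few low-frequency coefficients then gives an approximation. The function names `dft_sketch`, `approx_correlation`, and `exact_correlation` are hypothetical and chosen only for this example.

```python
import cmath
import math


def dft_sketch(signal, num_coeffs):
    """Return the first `num_coeffs` non-DC unitary DFT coefficients of the
    signal after centering it and scaling it to unit length.
    (Hypothetical helper for illustration only.)"""
    n = len(signal)
    # Stay strictly below the Nyquist bin so every kept coefficient has a
    # distinct conjugate partner in the full spectrum.
    if not 1 <= num_coeffs <= (n - 1) // 2:
        raise ValueError("num_coeffs must be between 1 and (n - 1) // 2")
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        raise ValueError("correlation is undefined for a constant signal")
    unit = [v / norm for v in centered]
    scale = 1.0 / math.sqrt(n)  # unitary DFT scaling, so Parseval holds as-is
    return [
        scale * sum(u * cmath.exp(-2j * math.pi * k * t / n)
                    for t, u in enumerate(unit))
        for k in range(1, num_coeffs + 1)
    ]


def approx_correlation(sketch_a, sketch_b):
    """Approximate Pearson correlation from two sketches of equal length.
    The factor 2 accounts for the conjugate-symmetric half of the spectrum
    of a real signal; the DC term is zero after centering."""
    return 2.0 * sum((a * b.conjugate()).real
                     for a, b in zip(sketch_a, sketch_b))


def exact_correlation(xs, ys):
    """Plain Pearson correlation in the time domain, for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    n = 64
    x = [math.sin(2 * math.pi * t / n) for t in range(n)]
    y = [math.sin(2 * math.pi * t / n + 0.5) for t in range(n)]
    print(exact_correlation(x, y))
    print(approx_correlation(dft_sketch(x, 1), dft_sketch(y, 1)))
```

In this sketch the approximation is good when most of a signal's energy sits in the retained low-frequency coefficients, and it becomes exact for odd-length signals when all `(n - 1) // 2` coefficients are kept. Comparing two sketches costs time proportional to the number of retained coefficients rather than to the signal length, which is where a saving on the quadratic number of pairs would come from.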
Abdullah Mueen, Suman Nath, Jie Liu
Added 18 Jul 2010
Updated 18 Jul 2010
Type Conference
Year 2010
Where SIGMOD