Sciweavers

CLUSTER
2009
IEEE

Numerically stable, single-pass, parallel statistics algorithms

Statistical analysis is widely used in scientific applications to analyze and infer meaning from data. A key challenge for any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. In this paper we derive a series of formulas that allow for single-pass, yet numerically robust, pairwise parallel and incremental updates of both arbitrary-order centered statistical moments and comoments. Using these formulas, we have built an open source parallel statistics framework that performs principal component analysis (PCA) in addition to computing descriptive, correlative, and multi-correlative statistics. A scalability study demonstrates numerically stable, near-optimal scalability on up to 128 processes. Results are also presented in which the framework is used to process large-scale turbulent combustion simulation data with 1500 processes.
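To make the idea of incremental and pairwise updates concrete, here is a minimal Python sketch restricted to the second-order case (mean and variance). It uses the widely known Welford-style incremental update and the Chan-style pairwise merge, not the arbitrary-order formulas derived in the paper, and it is not code from the authors' framework. The names `Moments`, `update`, `merge`, and `summarize` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Running count, mean, and centered second moment (illustrative only)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Incremental (one value at a time) update; avoids forming sum(x**2),
        # which loses precision when the mean is large relative to the spread.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Moments") -> "Moments":
        # Pairwise update: combine two partial results computed independently,
        # for example on two different processes.
        n = self.n + other.n
        if n == 0:
            return Moments()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    @property
    def variance(self) -> float:
        # Unbiased sample variance; defined as 0.0 here for fewer than 2 values.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def summarize(values):
    """Single pass over an iterable of numbers."""
    acc = Moments()
    for v in values:
        acc.update(v)
    return acc


# Two "processes" each summarize half of the data, then the results are merged.
data = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]
left = summarize(data[:2])
right = summarize(data[2:])
merged = left.merge(right)
```

Because `merge` needs only the count, mean, and centered moment of each part, partial results can be combined in any pairing order, which is the property that makes this style of update suitable for parallel reductions.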
Janine Bennett, R. Grout, Philippe P. Pébay
Added 21 Jul 2010
Updated 21 Jul 2010
Type Conference
Year 2009
Where CLUSTER
Authors Janine Bennett, R. Grout, Philippe P. Pébay, Diana Roe, David Thompson