IJHPCA
2007

Scaling Properties of Common Statistical Operators for Gridded Datasets

An accurate cost model that accounts for dataset size and structure can help optimize geoscience data analysis. We develop and apply a computational model to estimate data analysis costs for arithmetic operations on gridded datasets typical of satellite or climate-model origin. For these dataset geometries, our model predicts data reduction scalings that agree with measurements of widely used geoscience data processing software, the netCDF Operators (NCO). I/O performance and library design dominate throughput for simple analysis (e.g., dataset differencing). Dataset structure can reduce analysis throughput ten-fold relative to same-sized unstructured datasets. We demonstrate algorithmic optimizations that substantially increase throughput for more complex, arithmetic-dominated analysis such as weighted averaging of multi-dimensional data. These scaling properties can help to estimate costs of distribution strategies for data reduction in cluster and grid environments.
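To make the "weighted averaging of multi-dimensional data" mentioned in the abstract concrete, here is a minimal Python sketch of a latitude-weighted mean over a small two-dimensional grid. It is an illustration only, not NCO's implementation and not the optimization described in the paper. The function name `weighted_mean`, the sample grid, the use of `None` for missing cells, and the cosine-of-latitude weights are all assumptions made for this example.

```python
import math


def weighted_mean(values, weights):
    """Weighted average of values[i][j] using one weight per row i.

    Cells equal to None are treated as missing and skipped, so their
    weight does not enter the normalization. (Hypothetical helper,
    not part of NCO or the paper.)
    """
    num = 0.0  # running sum of weight * value
    den = 0.0  # running sum of weights for valid cells
    for row, w in zip(values, weights):
        for v in row:
            if v is None:
                continue
            num += w * v
            den += w
    if den == 0.0:
        raise ValueError("no valid cells to average")
    return num / den


# Hypothetical 3x4 latitude-by-longitude grid (rows at 60S, equator, 60N).
lats_deg = [-60.0, 0.0, 60.0]
grid = [
    [1.0, 1.0, 1.0, 1.0],
    [4.0, 4.0, None, 4.0],  # one missing cell
    [1.0, 1.0, 1.0, 1.0],
]

# Assumed weighting: cosine of latitude, a common proxy for cell area.
weights = [math.cos(math.radians(lat)) for lat in lats_deg]
result = weighted_mean(grid, weights)
```

In this sketch each valid cell costs one multiplication and two additions, plus a missing-value check. That gives a rough sense of why such an operation does more arithmetic per element than a simple dataset difference.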
Charles S. Zender, Harry Mangalam
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2007
Where IJHPCA
Authors Charles S. Zender, Harry Mangalam