ICS
2010
Tsinghua U.

Clustering performance data efficiently at massive scales

Existing supercomputers have hundreds of thousands of processor cores, and future systems may have hundreds of millions. Developers need detailed performance measurements to tune their applications and to exploit these systems fully. However, extreme scales pose unique challenges for performance-tuning tools, which can generate significant volumes of I/O. Compute-to-I/O ratios have increased drastically as systems have grown, and the I/O systems of large machines can handle the peak load from only a small fraction of cores. Tool developers need efficient techniques to analyze and to reduce performance data from large numbers of cores. We introduce CAPEK, a novel parallel clustering algorithm that enables in-situ analysis of performance data at run time. Our algorithm scales sub-linearly to 131,072 processes, running in less than one second even at that scale, which is fast enough for on-line use in production runs. The CAPEK implementation is fully generic and can be used for many typ...
Todd Gamblin, Bronis R. de Supinski, Martin Schulz
Added 29 Sep 2010
Updated 29 Sep 2010
Type Conference
Year 2010
Where ICS
Authors Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Robert J. Fowler, Daniel A. Reed