PPOPP
2006
ACM

On-line automated performance diagnosis on thousands of processes

Performance analysis tools are critical for the effective use of large parallel computing resources, but existing tools have failed to address three problems that limit their scalability: (1) management and processing of the volume of performance data generated when monitoring a large number of application processes, (2) communication between a large number of tool components, and (3) presentation of performance data and analysis results for applications with a large number of processes. In this paper, we present a novel approach for finding performance problems in applications with a large number of processes that leverages our multicast and data aggregation infrastructure to address these three performance tool scalability barriers. First, we show how to design a scalable, distributed performance diagnosis facility. We demonstrate this design with an on-line, automated strategy for finding performance bottlenecks. Our strategy uses distributed, independent bottleneck search agents l...
Philip C. Roth, Barton P. Miller
Type Conference