
EUROPAR
2007
Springer

TAUoverSupermon: Low-Overhead Online Parallel Performance Monitoring

Online application performance monitoring allows tracking performance characteristics during execution as opposed to doing so post-mortem. This opens up several possibilities otherwise unavailable, such as real-time visualization and application performance steering, that can be useful in the context of long-running applications. As HPC systems grow in size and complexity, the key challenge is to keep the online performance monitor scalable and low-overhead while still providing a useful performance reporting capability. Two fundamental components that constitute such a performance monitor are the measurement and transport systems. We adapt and combine two existing, mature systems - TAU and Supermon - to address this problem. TAU performs the measurement while Supermon is used to collect the distributed measurement state. Our experiments show that this novel approach leads to very low-overhead application monitoring as well as other benefits unavailable from using a transport such as NFS...
Added 07 Jun 2010
Updated 07 Jun 2010
Type Conference
Year 2007
Where EUROPAR
Authors Aroon Nataraj, Matthew J. Sottile, Alan Morris, Allen D. Malony, Sameer Shende