Some of the most challenging applications to parallelize scalably are the ones that present a relatively small amount of computation per iteration. Multiple interacting performanc...
Tuning parallel code can be a time-consuming and difficult task. We present our approach to automate the performance analysis of OpenMP applications that is based on the notion of ...
Chip multiprocessors (CMP) are widely used for high performance computing. Further, these CMPs are being configured in a hierarchical manner to compose a node in a cluster system....
Xingfu Wu, Valerie E. Taylor, Charles W. Lively, S...
Performance analysis of real applications in clusters and GRID like environments is essential to fully exploit the performance of new architectures. The key problem is the deepenin...
Holger Brunst, Edgar Gabriel, Marc Lange, Matthias...
We present the Stack Trace Analysis Tool (STAT) to aid in debugging extreme-scale applications. STAT can reduce problem exploration spaces from thousands of processes to a few by ...
Dorian C. Arnold, Dong H. Ahn, Bronis R. de Supins...