Trace-level reuse is based on the observation that some traces (dynamic sequences of instructions) are frequently repeated during the execution of a program, and in many cases, th...
This paper addresses the problem of efficient and accurate performance analysis to drive the exploration and design of bus-based System-on-Chip (SOC) communication architectures. ...
Emergence of new parallel architectures presents new challenges for application developers. Supercomputers vary in processor speed, network topology, interconnect communication ch...
Abhinav Bhatele, Lukasz Wesolowski, Eric J. Bohm, ...
Abstract—The Charm++ parallel programming system provides a modular performance interface that can be used to extend its performance measurement and analysis capabilities. The in...
Scott Biersdorff, Chee Wai Lee, Allen D. Malony, L...
Grid jobs often consist of a large number of tasks. If the performance of a statically scheduled grid job is unsatisfactory, one must decide which code of which task should be imp...