Many large-scale production applications have very long execution times and require periodic data checkpoints in order to save the state of the computation for program rest...
Wei-keng Liao, Avery Ching, Kenin Coloma, Alok N. ...
Many parallel applications from scientific computing use MPI global communication operations to collect or distribute data. Since the execution times of these communication opera...
In recent years, several approaches have been proposed to use profile information in compiler optimization. This profile information can be used at the source level to guide loo...
Masayo Haneda, Peter M. W. Knijnenburg, Harry A. G...
This paper investigates helper threads that improve performance by prefetching data on behalf of an application’s main thread. The focus is data prefetch helper threads that lac...
Cache behavior modeling is an important part of modern optimizing compilers. In this paper we present a method to estimate the number of cache misses, at compile time, using a mac...