We present a cache locality optimization technique that can optimize a loop nest even if the arrays referenced have different layouts in memory. Such a capability is required for a...
Mahmut T. Kandemir, J. Ramanujam, Alok N. Choudhar...
In translating HPF programs, a compiler has to generate local iteration and communication sets. Apart from local enumeration, local storage compression is an issue, because in HPF ...
This paper proposes a simple and efficient implementation method for a hierarchical coarse grain task parallel processing scheme on an SMP machine. OSCAR multigrain parallelizing c...
Thread-level speculation (TLS) allows potentially dependent threads to speculatively execute in parallel, thus making it easier for the compiler to extract parallel threads. Howeve...
This paper improves our previous research effort [1] by providing an efficient method for kernel loop unrolling minimisation in the case of already scheduled loops, where circular...