Aggressive compiler optimizations such as software pipelining and loop invariant code motion can significantly improve application performance, but these transformations often re...
Chris Zimmer, Stephen Roderick Hines, Prasad Kulka...
We present a parallel code generation algorithm for complete applications and a new experimental methodology that tests the efficacy of our approach. The algorithm optimizes for d...
OpenMP has emerged as a widely accepted standard for writing shared memory programs. Hardware-specific extensions such as data placement are usually needed to improve the scalabi...
Limited bandwidth to off-chip main memory is a performance bottleneck in chip multiprocessors for streaming computations, such as Cell/B.E., and this will become even mor...
Speed improvements in today's processors have largely been delivered in the form of multiple cores, increasing the importance of abstractions that ease parallel programming. Software...