This paper presents reduction recognition and parallel code generationstrategies for distributed-memorymultiprocessors. We describe techniques to recognize a broad range of implic...
Abstract. Local CPS conversion is a compiler transformation for improving the code generated for nested loops by a direct-style compiler that uses recursive functions to represent ...
Global locality analysis is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout optimizations. Pure loop tran...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
On machines with high-performance processors, the memory system continues to be a performance bottleneck. Compilers insert prefetch operations and reorder data accesses to improve...
Nathaniel McIntosh, Sandya Mannarswamy, Robert Hun...
Compiler technology is becoming a key component in the design of embedded systems, mostly due to increasing participation of software in the design process. Meeting system-level ob...