On modern computers, the performance of programs is often limited by memory latency rather than by processor cycle time. To reduce the impact of memory latency, the restructuring ...
Induprakas Kodukula, Keshav Pingali, Robert Cox, D...
This paper describes a tiling technique that can be used by application programmers and optimizing compilers to obtain I/O-efficient versions of regular scientific loop nests. Du...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
This paper describes a tiling technique that can be used by application programmers and optimizing compilers to obtain I/O-efficient versions of regular scientific loop nests. Due ...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
We present an approach for synthesizing transformations to enhance locality in imperfectly-nested loops. The key idea is to embed the iteration space of every statement in a loop ...
Clustered L0 buffers are an interesting alternative to reduce energy consumption in the instruction memory hierarchy of embedded VLIW processors. Currently, the synthesis of L0 cl...
Murali Jayapala, Tom Vander Aa, Francisco Barat, G...