Portable or embedded systems allow more and more complex applications like multimedia today. These applications and submicronic technologies have made the power consumption criter...
When vectorizing for SIMD architectures that are commonly employed by today’s multimedia extensions, one of the new challenges that arise is the handling of memory alignment. Pr...
In this paper we present the internal representation and optimizations used by the CASH compiler for improving the memory parallelism of pointer-based programs. CASH uses an SSA-b...
This paper focuses on the Cyclops64 computer architecture and presents an analytical model and performance simulation results for the preloading and loop unrolling approaches to op...
Yanwei Niu, Ziang Hu, Kenneth E. Barner, Guang R. ...
Improving memory performance at software level is more effective in reducing the rapidly expanding gap between processor and memory performance. Loop transformations (e.g. loop un...
Surendra Byna, Xian-He Sun, William Gropp, Rajeev ...