An adaptation of the classic register allocation algorithm to the problem of array storage optimization in MATLAB is presented. The method involves the decomposition of an interfe...
Automated code generation and performance tuning techniques for concurrent architectures such as GPUs, Cell and FPGAs can provide integer factor speedups over multi-core processor...
This paper describes extensions to OpenMP that implement data placement features needed for NUMA architectures. OpenMP is a collection of compiler directives and library routines ...
John Bircsak, Peter Craig, RaeLyn Crowell, Zarka C...
The advances in the programmable hardware has lead to new architectures where the hardware can be dynamically adapted to the application to gain better performance. There are stil...
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...