: Data distribution is one of the key aspects that a parallelizing compiler for a distributed memory architecture should consider, in order to get efficiency from the system. The ...
Access/execute architectures have several advantages over more traditional architectures. Because address generation and memory access are decoupled from operand use, memory laten...
Blocking is a well-known optimization technique for improving the effectiveness of memory hierarchies. Instead of operating on entire rows or columns of an array, blocked algorith...
Monica S. Lam, Edward E. Rothberg, Michael E. Wolf
Concurrency control is one of the main sources of error and complexity in shared memory parallel programming. While there are several techniques to handle concurrency control such...
Luis Ceze, Christoph von Praun, Calin Cascaval, Pa...
We present Offload, a programming model for offloading parts of a C++ application to run on accelerator cores in a heterogeneous multicore system. Code to be offloaded is enclosed ...
Pete Cooper, Uwe Dolinsky, Alastair F. Donaldson, ...