Conventional cache models are not suited for real-time parallel processing because tasks may flush each other’s data out of the cache in an unpredictable manner. In this way th...
Anca Mariana Molnos, Marc J. M. Heijligers, Sorin ...
Efficient intra-node shared memory communication is important for High Performance Computing (HPC), especially with the emergence of multi-core architectures. As clusters continue ...
We assert that in order to perform well, a shared-memory multiprocessorinter-process communication (IPC)facility mustavoid a) accessing any shared data, and b) acquiring any locks...
One of the most fundamental problems automatic parallelization tools are confronted with is to find an optimal domain decomposition for a given application. For regular domain prob...
We present parallel and sequential dense QR factorization algorithms that are optimized to avoid communication. Some of these are novel, and some extend earlier work. Communicatio...
James Demmel, Laura Grigori, Mark Hoemmen, Julien ...