In message-passing parallel applications, messages are not delivered in a strict order. In most applications, the computation results and the set of messages produced during the e...
Basile Schaeli, Sebastian Gerlach, Roger D. Hersch
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast ...
Abstract--This paper explores the computation and communication overlap capabilities enabled by the new CORE-Direct hardware capabilities introduced in the InfiniBand (IB) Host Cha...
Richard L. Graham, Stephen W. Poole, Pavel Shamis,...
This paper proposes a robust estimation and validation framework for characterizing local structures in a positive multi-variate continuous function approximated by a Gaussian-base...
Systolic implementations of dynamic programming solutions that utilize a similarity matrix can achieve appreciable performance with both course- and fine-grain parallelization. A ...