The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited ...
James Dinan, Pavan Balaji, Ewing L. Lusk, P. Saday...
This paper presents a proto-type autonomous signal processing system on a chip. The system is architected such that high performance digital signal processing occurs in the FPGAâ€...
The I/O access patterns of many parallel applications consist of accesses to a large number of small, noncontiguous pieces of data. If an application's I/O needs are met by m...
Although event tracing of parallel applications offers highly detailed performance information, tracing on current leading edge systems may lead to unacceptable perturbation of the...
—Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architec...
Vasileios Karakasis, Georgios I. Goumas, Nectarios...