This paper describes the design and the implementation of parallel routines in the Heterogeneous ScaLAPACK library that solve a dense system of linear equations. This library is w...
Ravi Reddy Manumachu, Alexey L. Lastovetsky, Pedro...
Linear algebra algorithms are fundamental to many computing applications. Modern GPUs are suited for many general purpose processing tasks and have emerged as inexpensive high per...
In this paper, we empirically evaluate fundamental design trade-offs among the most recent multicore processors and accelerator technologies. Our primary aim is to aid application...
—User-level communication allows an application process to access the network interface directly. Bypassing the kernel requires that a user process accesses the network interface...
We introduce Scioto, Shared Collections of Task Objects, a lightweight framework for providing task management on distributed memory machines under one-sided and globalview parall...
James Dinan, Sriram Krishnamoorthy, D. Brian Larki...