The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile a...
Emmanuel Agullo, Henricus Bouwmeester, Jack Dongar...
Boundeddegreenetworks like deBruijn graphsor wrapped butterfly networks are very important from VLSI implementation point of view as well as for applications where the computing n...
This paper describes PARDIS, a system containing explicit support for interoperability of PARallel DIStributed applications. PARDIS is based on the Common Object Request Broker Ar...
While commodity computing and graphics hardware has increased in capacity and dropped in cost, it is still quite difficult to make effective use of such systems for general-purpos...
E. Wes Bethel, Greg Humphreys, Brian E. Paul, J. D...
This paper introduces a method for improving program run-time performance by gathering work in an application and executing it efficiently in an integrated thread. Our methods ext...