The number of multithreaded Message Passing Interface (MPI) implementations and applications is increasing rapidly. We discuss how multithreaded applications can receive messages o...
Torsten Hoefler, Greg Bronevetsky, Brian Barrett, ...
Abstract. Domain decomposition for regular meshes on parallel computers has traditionally been performed by attempting to exactly partition the work among the available processors ...
Commercial HPC applications are often run on clusters that use the Microsoft Windows operating system and need an MPI implementation that runs efficiently in the Windows environmen...
Jayesh Krishna, Pavan Balaji, Ewing L. Lusk, Rajee...
Abstract. Designing and tuning parallel applications with MPI, particularly at large scale, requires understanding the performance implications of different choices of algorithms ...
Torsten Hoefler, William Gropp, Rajeev Thakur, Jes...
Abstract. Today’s large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data struc...
Timo Heister, Martin Kronbichler, Wolfgang Bangert...
Abstract. In this work we present two algorithms of irregular scatter/gather operations based on the binomial tree and Tr¨aff algorithms. We use the prediction provided by hetero...
Kiril Dichev, Vladimir Rychkov, Alexey L. Lastovet...
A major trend in HPC is the escalation toward manycore, where systems are composed of shared memory nodes featuring numerous processing units. Unfortunately, with scale comes compl...
Teng Ma, George Bosilca, Aurelien Bouteiller, Jack...
Abstract. VolpexMPI is an MPI library designed for volunteer computing environments. In order to cope with the fundamental unreliability of these environments, VolpexMPI deploys tw...
Parallel programming models on large-scale systems require a scalable system for managing the processes that make up the execution of a parallel program. The process-management sys...
Pavan Balaji, Darius Buntinas, David Goodell, Will...