In earlier work, we showed that the one-sided communication model found in PGAS languages (such as UPC) offers significant advantages in communication efficiency by decoupling d...
Rajesh Nishtala, Paul Hargrove, Dan Bonachea, Kath...
A significant fraction of parallel scientific codes are iterative with barriers between iterations or even between phases of the same iteration. The sender of a message is assur...
Eric J. Bohm, Sayantan Chakravorty, Pritish Jetley...
Sorting is a commonly used process with a wide breadth of applications in the high performance computing field. Early research in parallel processing has provided us with comprehen...
Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with distinct computation and communication phases. Although the communication phase in such ...
Torsten Hoefler, Christian Siebert, Andrew Lumsdai...
Abstract. Designing and tuning parallel applications with MPI, particularly at large scale, requires understanding the performance implications of different choices of algorithms ...
Torsten Hoefler, William Gropp, Rajeev Thakur, Jes...