This paper emphasizes on load balancing issues associated with hybrid programming models for the parallelization of fully permutable nested loops onto SMP clusters. Hybrid paralle...
As we move to large manycores, the hardware-based global checkpointing schemes that have been proposed for small shared-memory machines do not scale. Scalability barriers include ...
Abstract--This paper explores the computation and communication overlap capabilities enabled by the new CORE-Direct hardware capabilities introduced in the InfiniBand (IB) Host Cha...
Richard L. Graham, Stephen W. Poole, Pavel Shamis,...
In this paper, we present three parallel flexible approximate string matching methods on a parallel architecture with heterogeneous workstations to gain supercomputer power at lo...
Panagiotis D. Michailidis, Konstantinos G. Margari...
Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with distinct computation and communication phases. Although the communication phase in such ...
Torsten Hoefler, Christian Siebert, Andrew Lumsdai...