A 5-year-profiling in production mode at the University of Stuttgart has shown that more than 40% of the execution time of Message Passing Interface (MPI) routines is spent in the...
The paper proposes a novel approach for optimizing performance of all-to-all collective communication by taking advantage of concurrency available in modern networks such as Infin...
Writing parallel applications for computational grids is a challenging task. To achieve good performance, algorithms designed for local area networks must be adapted to the differ...
Thilo Kielmann, Rutger F. H. Hofman, Henri E. Bal,...
We describe a generic programming model to design collective communications on SMP clusters. The programming model utilizes shared memory for collective communications and overlap...