Buffered CoScheduled MPI (BCS-MPI) introduces a new approach to design the communication layer for largescale parallel machines. The emphasis of BCS-MPI is on the global coordinat...
Process skew is an important factor in the performance of parallel applications, especially in large-scale clusters. Reduction is a common collective operation which, by its natur...
Adam Wagner, Darius Buntinas, Dhabaleswar K. Panda...
We present a model for the parallel performance of algorithms that consist of concurrent, two-dimensional wavefronts implemented in a message passing environment. The model combine...
Adolfy Hoisie, Olaf M. Lubeck, Harvey J. Wasserman
Sampling is a widely used technique to increase efficiency in database and data mining applications operating on large dataset. In this paper we present a scalable sampling imple...
In this paper we present an analytical-based framework for parallel program performance prediction. The main thrust of this work is to provide a means for treating realistic appli...