
CLUSTER
2003
IEEE

Application-Bypass Reduction for Large-Scale Clusters

Process skew is an important factor in the performance of parallel applications, especially in large-scale clusters. Reduction is a common collective operation that, by its nature, introduces implicit synchronization between the processes involved in the communication and is therefore highly susceptible to performance degradation due to process skew. A collective operation with application-bypass does not require the application to block in order for the operation to make progress, so application-bypass collective operations are highly tolerant of skew. In this paper we describe the design and implementation of an application-bypass version of the reduction operation in MPICH over GM. We evaluate our implementation on a 16-node cluster. Under conditions of process skew, we observe an improvement of up to a factor of 3.3 for our application-bypass reduction over the default MPICH implementation. In addition, we see that this improvement factor increases with system size, indicat...
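The central idea, that a reduction can make progress without the application blocking, can be illustrated with a simplified timing model. This is a sketch under stated assumptions, not the authors' MPICH/GM implementation: it assumes a binomial reduction tree rooted at rank 0, a fixed per-message latency, eager sends, and, for the bypass case, that non-root processes hand off their contribution and return immediately while combining and forwarding happen asynchronously as child messages arrive. The function names `binomial_children` and `reduce_timing` are hypothetical.

```python
from typing import List, Tuple


def binomial_children(rank: int, size: int) -> List[int]:
    """Children of `rank` in a binomial tree rooted at rank 0 (assumed topology)."""
    children = []
    mask = 1
    while mask < size:
        if rank & mask:
            break  # lowest set bit reached: rank's parent is rank - mask
        child = rank | mask  # equals rank + mask because that bit is clear
        if child < size:
            children.append(child)
        mask <<= 1
    return children


def reduce_timing(entry: List[float], latency: float) -> Tuple[List[float], List[float]]:
    """Per-process blocked time under a blocking vs. an application-bypass reduction.

    entry[r] is the (skewed) time at which rank r calls the reduction.
    Simplified model: rank r can forward its partial result once it has entered
    and all children's partial results have arrived.
    """
    size = len(entry)
    ready = [0.0] * size
    # Children always have higher ranks than their parent, so iterate downward.
    for rank in reversed(range(size)):
        arrivals = [ready[c] + latency for c in binomial_children(rank, size)]
        ready[rank] = max([entry[rank]] + arrivals)
    # Blocking: every process stays inside the call until its subtree is combined.
    blocking = [ready[r] - entry[r] for r in range(size)]
    # Bypass (assumed): only the root waits; other ranks return at once and the
    # combining is driven by message arrival instead of by the application.
    bypass = [blocking[0]] + [0.0] * (size - 1)
    return blocking, bypass


blocking, bypass = reduce_timing([0, 0, 0, 5], latency=1)  # rank 3 is late
print(blocking, bypass)
```

With rank 3 entering late, rank 2 sits idle in the blocking version waiting for its child, whereas in the bypass model only the root absorbs the skew. The root's completion time is the same in both cases; what the bypass design recovers is the time intermediate processes would otherwise spend blocked.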
Adam Wagner, Darius Buntinas, Dhabaleswar K. Panda, Ron Brightwell
Type Conference
Year 2003
Where CLUSTER
Authors Adam Wagner, Darius Buntinas, Dhabaleswar K. Panda, Ron Brightwell