High Performance Pipelined Process Migration with RDMA

Coordinated Checkpoint/Restart (C/R) is a widely deployed strategy for achieving fault tolerance. However, C/R alone cannot meet the demands of upcoming exascale systems because of its heavy I/O overhead. Process migration has been proposed in the literature as a proactive fault-tolerance mechanism to complement C/R. Several popular MPI implementations, including MVAPICH2 and OpenMPI, support process migration, but these existing solutions do not deliver satisfactory performance. In this paper we conduct extensive profiling of several process migration mechanisms and reveal that inefficient I/O and network transfer are the principal factors responsible for the high overhead. We then propose a new approach, Pipelined Process Migration with RDMA (PPMR), to overcome these overheads. Our new protocol fully pipelines data writing, data transfer, and data read operations during different phases of a migration cycle. PPMR aggregates data writes on t...
Added 18 Aug 2011
Updated 18 Aug 2011
Type Journal
Year 2011
Authors Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavier Besseron, Dhabaleswar K. Panda