Large-scale GPU clusters are gaining popularity in the scientific computing community. However, their deployment and production use are associated with a number of new challenge...
Volodymyr V. Kindratenko, Jeremy Enos, Guochun Shi...
Collective operations and non-blocking point-to-point operations are two important parts of MPI, each providing significant performance and programmability benefits. Although non...
Workflow is the key technology for business process automation, while distributed workflow is the solution to deal with the decentralized nature of workflow applications and the pe...
In this paper, we propose a modified parallel virtual file system that provides snapshot functionality. Because typical file systems are exposed to various failures, taking a s...
Predicting sequential execution blocks of a large-scale parallel application is an essential part of accurate prediction of the overall performance of the application. Whe...
Gengbin Zheng, Gagan Gupta, Eric J. Bohm, Isaac Do...