A scalable approach to performance analysis of MPI applications is presented that includes automated source code instrumentation, low overhead generation of profile and trace data...
Shirley Moore, Felix Wolf, Jack Dongarra, Sameer S...
Using historical information to predict future runs of parallel jobs has been shown to be valuable in job scheduling. Trends toward more flexible job scheduling techniques such as adapt...
Condition-based maintenance (CBM) seeks to generate a design for a new ship-wide CBM system that performs diagnoses and failure prediction on Navy shipboard machinery. Eventually, ...
Hypervisor-based fault tolerance (HBFT), a checkpoint-recovery mechanism, is an emerging approach to sustaining mission-critical applications. Based on virtualization technology, H...
Jun Zhu, Wei Dong, Zhefu Jiang, Xiaogang Shi, Zhen...
Current technologies have made it possible to execute parallel applications across heterogeneous platforms. However, the performance models available do not provide adequate m...
Jameela Al-Jaroodi, Nader Mohamed, Hong Jiang, Dav...