—Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to im...
Ramya Prabhakar, Sudharshan S. Vazhkudai, Youngjae...
As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers are becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...
Resource sharing computing is a project that realizes high performance computing by utilizing the resources of peers that are connected to the Internet. Resource sharing computing ...
In this paper, we focus our attention on the problem of automating the addition of failsafe fault-tolerance where fault-tolerance is added to an existing (fault-intolerant) progra...
Publish/subscribe (pub/sub) middleware platforms for eventbased distributed systems often provide many configurable policies that affect end-to-end quality of service (QoS). Altho...
Joe Hoffert, Douglas C. Schmidt, Aniruddha S. Gokh...