Sciweavers

» Automated Deployment Support for Parallel Distributed Comput...
ICDCS
2011
IEEE
14 years 3 months ago
Provisioning a Multi-tiered Data Staging Area for Extreme-Scale Machines
Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to im...
Ramya Prabhakar, Sudharshan S. Vazhkudai, Youngjae...
PPOPP
2005
ACM
15 years 9 months ago
Fault tolerant high performance computing by a coding approach
As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers is becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...
GPC
2007
Springer
15 years 7 months ago
A Design of Cooperation Management System to Improve Reliability in Resource Sharing Computing Environment
Resource sharing computing is a project that realizes high performance computing by utilizing the resources of peers that are connected to the Internet. Resource sharing computing ...
Ji Su Park, Kwang-Sik Chung, Jin Gon Shon
ICDCS
2002
IEEE
15 years 8 months ago
The Complexity of Adding Failsafe Fault-Tolerance
In this paper, we focus our attention on the problem of automating the addition of failsafe fault-tolerance where fault-tolerance is added to an existing (fault-intolerant) progra...
Sandeep S. Kulkarni, Ali Ebnenasir
DEBS
2007
ACM
15 years 7 months ago
A QoS policy configuration modeling language for publish/subscribe middleware platforms
Publish/subscribe (pub/sub) middleware platforms for event-based distributed systems often provide many configurable policies that affect end-to-end quality of service (QoS). Altho...
Joe Hoffert, Douglas C. Schmidt, Aniruddha S. Gokh...