Sciweavers

SRDS
2003
IEEE
15 years 3 months ago
Raptor: Integrating Checkpoints and Thread Migration for Cluster Management
Software distributed shared-memory (SDSM) provides the abstraction necessary to run shared-memory applications on cost-effective parallel platforms such as clusters of workstations. Howeve...
Hazim Shafi, Evan Speight, John K. Bennett
ICDCS
2011
IEEE
13 years 9 months ago
Provisioning a Multi-tiered Data Staging Area for Extreme-Scale Machines
Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to im...
Ramya Prabhakar, Sudharshan S. Vazhkudai, Youngjae...
PDPTA
2000
14 years 11 months ago
Dependable High Performance Computing on a Parallel Sysplex Cluster
In this paper we address the issue of dependable distributed high performance computing in the field of Symbolic Computation. We describe the extension of a middleware infrastructu...
Wolfgang Blochinger, Reinhard Bündgen, Andrea...
ICPADS
2007
IEEE
15 years 4 months ago
Federated Clusters Using the Transparent Remote Execution (TREx) Environment
Due to the increasing complexity of scientific models, large-scale simulation tools often require a critical amount of computational power to produce results in a reasonable amou...
Richert Wang, Enrique Cauich, Isaac D. Scherson
PODC
1994
ACM
15 years 2 months ago
A Checkpoint Protocol for an Entry Consistent Shared Memory System
Workstation clusters are becoming an interesting alternative to dedicated multiprocessors. In this environment, the probability of a failure, during an application's executio...
Nuno Neves, Miguel Castro, Paulo Guedes