Sciweavers

146 search results - page 28 / 30
» Transparent Checkpoint-Restart of Distributed Applications o...
IPPS
2008
IEEE
15 years 3 months ago
Overcoming scaling challenges in biomolecular simulations across multiple platforms
NAMD is a portable parallel application for biomolecular simulations. NAMD pioneered the use of hybrid spatial and force decomposition, a technique now used by most scalable pr...
Abhinav Bhatele, Sameer Kumar, Chao Mei, James C. ...
CLUSTER
2004
IEEE
15 years 1 month ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H...
ASPLOS
1996
ACM
15 years 1 month ago
Shasta: A Low Overhead, Software-Only Approach for Supporting Fine-Grain Shared Memory
This paper describes Shasta, a system that supports a shared address space in software on clusters of computers with physically distributed memory. A unique aspect of Shasta compa...
Daniel J. Scales, Kourosh Gharachorloo, Chandramoh...
CLUSTER
2006
IEEE
15 years 1 month ago
Improving Communication Performance on InfiniBand by Using Efficient Data Placement Strategies
Despite using high-speed network interconnection systems like InfiniBand, the communication overhead for parallel applications is still high. In this paper we show how such costs...
Robert Rex, Frank Mietke, Wolfgang Rehm, Christoph...
INFOCOM
2002
IEEE
15 years 2 months ago
KNITS: Switch-based Connection Hand-off
This paper describes a mechanism allowing nodes to hand-off active connections by utilizing connection splicing at an edge-switch serving as a gateway to a server cluster. The m...
Eric Van Hensbergen, Athanasios E. Papathanasiou