Search Sciweavers | Sciweavers

9 search results - page 2 / 2

» Coordinated Checkpoint versus Message Log for Fault Tolerant...

IJHPCA
2006

MPICH-V Project: A Multiprotocol Automatic Fault-Tolerant MPI

13 years 4 months ago

Download www.cs.utk.edu

High performance computing platforms like Clusters, Grid and Desktop Grids are becoming larger and subject to more frequent failures. MPI is one of the most used message...

Aurelien Bouteiller, Thomas Hérault, Gé...


IPPS
2007
IEEE

A Fault Tolerance Protocol with Fast Fault Recovery

13 years 11 months ago

Download www.cecs.uci.edu

Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all ...

Sayantan Chakravorty, Laxmikant V. Kalé


SC
2000
ACM

Scalable Fault-Tolerant Distributed Shared Memory

13 years 9 months ago

Download www.sc2000.org

This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a ...

Florin Sultan, Thu D. Nguyen, Liviu Iftode


HPDC
2009
IEEE

Interconnect agnostic checkpoint/restart in open MPI

13 years 11 months ago

Download www.osl.iu.edu

Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...

Joshua Hursey, Timothy Mattox, Andrew Lumsdaine
