Search Sciweavers | Sciweavers

37 search results - page 3 / 8

» High performance linpack benchmark: a fault tolerant impleme...


CLUSTER
2003
IEEE


Coordinated Checkpoint versus Message Log for Fault Tolerant MPI


Download www.cs.utk.edu

Large clusters, high-availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...

Aurelien Bouteiller, Pierre Lemarinier, Gér...


IPPS
2007
IEEE


Implementing and Evaluating Automatic Checkpointing


Download www.cecs.uci.edu

As computer clusters continue to grow in size and popularity, fault tolerance is becoming a crucial factor in ensuring high performance and reliability for applications. To provide...

Antonio S. Martins, Ronaldo Augusto Lara Gonç...


CCGRID
2006
IEEE


Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation


Download icl.cs.utk.edu

With the increasing number of processors in modern HPC (High Performance Computing) systems, two emerging problems must be solved: one is scalability, the other is fault tolera...

Yuan Tang, Graham E. Fagg, Jack Dongarra


FGCS
2008


Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols


Download www.public.iastate.edu

A long-term trend in high-performance computing is the increasing number of nodes in parallel computing platforms, which entails a higher failure probability. Fault tolerant progr...

Darius Buntinas, Camille Coti, Thomas Hérau...
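The scaling argument in the abstract above (more nodes entails a higher failure probability) can be made concrete with a simple model. Assuming, purely for illustration and not as a claim from the paper, that each of N nodes fails independently with probability p over the course of one run, the probability that at least one node fails is:

```latex
P_{\text{fail}}(N) = 1 - (1 - p)^{N}
```

For example, with the hypothetical values p = 0.001 and N = 1000, P_fail = 1 - 0.999^1000 ≈ 0.63, so under this model a failure during the run is more likely than not.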


IPPS
2007
IEEE


DejaVu: Transparent User-Level Checkpointing, Migration, and Recovery for Distributed Systems


Download www.cecs.uci.edu

In this paper, we present a new fault tolerance system called DejaVu for transparent and automatic checkpointing, migration, and recovery of parallel and distributed applications....

Joseph F. Ruscio, Michael A. Heffner, Srinidhi Var...
