Search Sciweavers | Sciweavers

86 search results - page 1 / 18

CCGRID
2004
IEEE

Hybrid checkpointing for parallel applications in cluster federations

13 years 8 months ago

Download hal.archives-ouvertes.fr

Sébastien Monnet, Christine Morin, Ramamurt...

ICPADS
2010
IEEE

Hybrid Checkpointing for MPI Jobs in HPC Environments

13 years 2 months ago

Download moss.csc.ncsu.edu

As the core count in high-performance computing systems keeps increasing, faults are becoming commonplace. Checkpointing addresses such faults but captures full process images ev...

Chao Wang, Frank Mueller, Christian Engelmann, Ste...

CLUSTER
2004
IEEE

MPI/FT: A Model-Based Approach to Low-Overhead Fault Tolerant Message-Passing Middleware

13 years 4 months ago

Download www.cse.msstate.edu

Fault tolerance in parallel systems has traditionally been achieved through a combination of redundancy and checkpointing methods. This notion has also been extended to message-pas...

Rajanikanth Batchu, Yoginder S. Dandass, Anthony S...

HIPC
2009
Springer

Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture

13 years 2 months ago

Download nowlab.cse.ohio-state.edu

Large scale compute clusters continue to grow to ever-increasing proportions. However, as clusters and applications continue to grow, the Mean Time Between Failures (MTBF) has redu...

Xiangyong Ouyang, Karthik Gopalakrishnan, Tejus Ga...

HIPC
2007
Springer

A Scalable Asynchronous Replication-Based Strategy for Fault Tolerant MPI Applications

13 years 11 months ago

Download www.cse.buffalo.edu

As computational clusters increase in size, their mean-time-to-failure reduces. Typically checkpointing is used to minimize the loss of computation. Most checkpointing techniques, ...

John Paul Walters, Vipin Chaudhary
