Search Sciweavers | Sciweavers

63 search results - page 2 / 13

» Adaptive incremental checkpointing for massively parallel sy...

LCPC
2007
Springer

139views System Software» more LCPC 2007»

Compiler-Enhanced Incremental Checkpointing

13 years 11 months ago

Download greg.bronevetsky.com

As modern supercomputing systems reach the peta-flop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety o...

Greg Bronevetsky, Daniel Marques, Keshav Pingali, ...

IPPS
2007
IEEE

102views Distributed And Parallel Com...» more IPPS 2007»

DejaVu: Transparent User-Level Checkpointing, Migration, and Recovery for Distributed Systems

13 years 11 months ago

Download www.cecs.uci.edu

In this paper, we present a new fault tolerance system called DejaVu for transparent and automatic checkpointing, migration, and recovery of parallel and distributed applications....

Joseph F. Ruscio, Michael A. Heffner, Srinidhi Var...

SC
2009
ACM

306views Applied Computing» more SC 2009»

Leveraging 3D PCRAM technologies to reduce checkpoint overhead for future exascale systems

14 years 3 days ago

Download www.cs.utah.edu

The scalability of future massively parallel processing (MPP) systems is being severely challenged by high failure rates. Current hard disk drive (HDD) checkpointing results in ov...

Xiangyu Dong, Naveen Muralimanohar, Norman P. Joup...

ICDCS
2008
IEEE

128views Distributed And Parallel Com...» more ICDCS 2008»

stdchk: A Checkpoint Storage System for Desktop Grid Computing

13 years 11 months ago

Download www.ece.ubc.ca

Checkpointing is an indispensable technique to provide fault tolerance for long-running high-throughput applications like those running on desktop grids. This paper argues that...

Samer Al-Kiswany, Matei Ripeanu, Sudharshan S. Vaz...

CLOUDCOM
2010
Springer

142views Distributed And Parallel Com...» more CLOUDCOM 2010»

REMEM: REmote MEMory as Checkpointing Storage

13 years 3 months ago

Download ft.ornl.gov

Checkpointing is a widely used mechanism for supporting fault tolerance, but notorious in its high-cost disk access. The idea of memory-based checkpointing has been extensively stu...

Hui Jin, Xian-He Sun, Yong Chen, Tao Ke
