Sciweavers

66 search results - page 2 / 14
» The Checkpoint Problem
Sort
View
ISCA
2011
IEEE
238views Hardware» more  ISCA 2011»
12 years 9 months ago
Rebound: scalable checkpointing for coherent shared memory
As we move to large manycores, the hardware-based global checkpointing schemes that have been proposed for small shared-memory machines do not scale. Scalability barriers include ...
Rishi Agarwal, Pranav Garg, Josep Torrellas
IPPS
1998
IEEE
13 years 9 months ago
A Generalized Forward Recovery Checkpointing Scheme
We propose a generalized forward recovery checkpointing scheme, with lookahead execution and rollback validation. This method takes advantage of voting and comparison on multiple v...
Ke Huang, Jie Wu, Eduardo B. Fernández
CLUSTER
2003
IEEE
13 years 10 months ago
Coordinated Checkpoint versus Message Log for Fault Tolerant MPI
— Large Clusters, high availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...
Aurelien Bouteiller, Pierre Lemarinier, Gér...
SIAMSC
2010
132views more  SIAMSC 2010»
13 years 3 months ago
New Algorithms for Optimal Online Checkpointing
Frequently, the computation of derivatives for optimizing time-dependent problems is based on the integration of the adjoint differential equation. For this purpose, the knowledge...
Philipp Stumm, Andrea Walther
HPDC
2007
IEEE
13 years 11 months ago
Failure-aware checkpointing in fine-grained cycle sharing systems
Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of idle computational resources available on the Internet. Such systems allow guest jobs to run on a ho...
Xiaojuan Ren, Rudolf Eigenmann, Saurabh Bagchi