Sciweavers

35 search results - page 3 / 7
» Transparent checkpoints of closed distributed systems in Emu...
ICDCS
2012
IEEE
13 years 15 hour ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
CLUSTER
2004
IEEE
15 years 1 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H...
SAC
2005
ACM
15 years 3 months ago
Adaptation point analysis for computation migration/checkpointing
Finding the appropriate location of adaptation points for computation migration/checkpointing is critical since the distance between two consecutive adaptation points determines t...
Yanqing Ji, Hai Jiang, Vipin Chaudhary
WORDS
2003
IEEE
15 years 2 months ago
Decentralized Resource Management and Fault-Tolerance for Distributed CORBA Applications
Assigning an application’s fault-tolerance properties (e.g., replication style, checkpointing frequency) statically, and in an arbitrary manner, can lead to the application not ...
Carlos F. Reverte, Priya Narasimhan
ECOOPW
1999
Springer
15 years 1 months ago
Providing Policy-Neutral and Transparent Access Control in Extensible Systems
Extensible systems, such as Java or the SPIN extensible operating system, allow for units of code, or extensions, to be added to a running system in almost arbitrary fashion. Exte...
Robert Grimm, Brian N. Bershad