Sciweavers

CCGRID
2006
IEEE
15 years 3 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
ICDE
2004
IEEE
119 views · Database
15 years 11 months ago
Improving Logging and Recovery Performance in Phoenix/App
Phoenix/App supports software components whose states are made persistent across a system crash via redo recovery, replaying logged interactions. Our initial prototype force logge...
Roger S. Barga, Shimin Chen, David B. Lomet
ISCA
2002
IEEE
115 views · Hardware
15 years 2 months ago
ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors
This paper presents ReVive, a novel general-purpose rollback recovery mechanism for shared-memory multiprocessors. ReVive carefully balances the conflicting requirements of avail...
Milos Prvulovic, Josep Torrellas, Zheng Zhang
DSN
2004
IEEE
15 years 1 months ago
Optimal Object State Transfer - Recovery Policies for Fault Tolerant Distributed Systems
Recent developments in the field of object-based fault tolerance and the advent of the first OMG FT-CORBA compliant middleware raise new requirements for the design process of dist...
Panagiotis Katsaros, Constantine Lazos
SIGMOD
2011
ACM
171 views · Database
14 years 16 days ago
BRRL: a recovery library for main-memory applications in the cloud
In this demonstration we present BRRL, a library for making distributed main-memory applications fault tolerant. BRRL is optimized for cloud applications with frequent points of c...
Tuan Cao, Benjamin Sowell, Marcos Antonio Vaz Sall...