Sciweavers

695 search results - page 65 / 139
» Cache based fault recovery for distributed systems
SRDS
2008
IEEE
15 years 4 months ago
Dynamically Quantifying and Improving the Reliability of Distributed Storage Systems
In this paper, we argue that the reliability of large-scale storage systems can be significantly improved by using better reliability metrics and more efficient policies for rec...
Rekha Bachwani, Leszek Gryz, Ricardo Bianchini, Ce...
IPPS
2006
IEEE
15 years 3 months ago
Coordinated checkpoint from message payload in pessimistic sender-based message logging
Execution of MPI applications on cluster and Grid deployments suffers from node and network failures, which motivates the use of fault-tolerant MPI implementations. Two category tec...
M. Aminian, Mohammad K. Akbari, Bahman Javadi
TPDS
1998
14 years 9 months ago
On Coordinated Checkpointing in Distributed Systems
Coordinated checkpointing simplifies failure recovery and eliminates domino effects in case of failures by preserving a consistent global checkpoint on stable storage. However, ...
Guohong Cao, Mukesh Singhal
COMPSAC
2002
IEEE
15 years 2 months ago
On Bootstrapping Replicated CORBA Applications
Critical components of a distributed system must be replicated to achieve high availability and fault tolerance. Current fault-tolerant CORBA infrastructures have concentrated on m...
Wenbing Zhao, Louise E. Moser, P. M. Melliar-Smith
SPAA
1998
ACM
15 years 2 months ago
Lamport Clocks: Verifying a Directory Cache-Coherence Protocol
Modern shared-memory multiprocessors use complex memory system implementations that include a variety of non-trivial and interacting optimizations. More time is spent in verifying...
Manoj Plakal, Daniel J. Sorin, Anne Condon, Mark D...