In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared nothing database. Rather than aborting and restarting queries, our s...
Christopher Yang, Christine Yen, Ceryen Tan, Samue...
Assigning an application’s fault-tolerance properties (e.g., replication style, checkpointing frequency) statically, and in an arbitrary manner, can lead to the application not ...
Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all n computers are up and running, we would like the load to be evenly distr...
We present a scalable garbage collection scheme for systems that store objects at multiple servers while clients run transactions on locally cached copies of objects. It is the fi...
Replication is a well-know fault-tolerance technique, and several replication strategies exist (e.g. active, passive, and semi-active replication). To be used in hard real-time sy...