This paper introduces dual-quorum replication, a novel data replication algorithm designed to support Internet edge services. Edge services allow clients to access Internet service...
Lei Gao, Michael Dahlin, Jiandan Zheng, Lorenzo Al...
As computational clusters increase in size, their mean-time-to-failure reduces. Typically checkpointing is used to minimize the loss of computation. Most checkpointing techniques, ...
In high energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So...
This paper describes the design of an object replication scheme for the Arjuna distributed system. The design supports K-resiliency, where, in the absence of network partitions, K out of a tota...
Most of today's HPC systems employ a single head node for control, which represents a single point of failure as it interrupts an entire HPC system upon failure. Furthermore, it...
Kai Uhlemann, Christian Engelmann, Stephen L. Scot...