We present SGuard, a new fault-tolerance technique for distributed stream processing engines (SPEs) running in clusters of commodity servers. SGuard is less disruptive to normal s...
YongChul Kwon, Magdalena Balazinska, Albert G. Gre...
Pangaea is a wide-area file system that supports data sharing among a community of widely distributed users. It is built on a symmetrically decentralized infrastructure that consi...
Yasushi Saito, Christos T. Karamanolis, Magnus Kar...
Replicated file-systems can experience degraded performance that might not be adequately handled by the underlying fault-tolerant protocols. We describe the design and implementa...
Supporting high availability by checkpointing and switching to a backup upon failure of a primary has a cost. Trade-off studies help system architects to decide whether higher ava...
In this paper we present results from a six-month empirical study of the high-availability aspects of the Coda File System. We report on the service failures experienced by Coda clien...