Parallel dataflow programs generate enormous amounts of distributed data that are short-lived, yet are critical for completion of the job and for good run-time performance. We ca...
Steven Y. Ko, Imranul Hoque, Brian Cho, Indranil G...
Distributed storage systems employ replicas or erasure code to ensure high reliability and availability of data. Such replicas create great amount of network traffic that negative...
Crash and omission failures are common in service providers: a disk can break down or a link can fail anytime. In addition, the probability of a node failure increases with the num...
The Cooperative File System (CFS) is a new peer-to-peer readonly storage system that provides provable guarantees for the efficiency, robustness, and load-balance of file storag...
Frank Dabek, M. Frans Kaashoek, David R. Karger, R...
This paper describes a new method for providingtransparent fault tolerance for parallel applications on a network of workstations. We have designed our method in the context of sh...