This paper presents scalable algorithms for recovery and page coherency in multicomputer object stores. Recovery and coherency are central to object store engineering and distribu...
Stephen M. Blackburn, Robin B. Stanton, Stephan J....
Abstract--Parallel programming is hard, because it is impractical to test all possible thread interleavings. One promising approach to improve a multi-threaded program's relia...
Code Compression has been shown to be efficient in minimizing the memory requirements for embedded systems as well as in power consumption reduction and performance improvement. I...
This paper presents a comprehensive statistical analysis of workloads collected on data-intensive clusters and Grids. The analysis is conducted at different levels, including Virt...
We develop an availability solution, called SafetyNet, that uses a unified, lightweight checkpoint/recovery mechanism to support multiple long-latency fault detection schemes. At...
Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, ...