The predicted shift to non-volatile, byte-addressable memory (e.g., Phase Change Memory and Memristor), the growth of “big data”, and the subsequent emergence of frameworks su...
Software failures in server applications are a significant problem for preserving system availability. We present ASSURE, a system that introduces rescue points that recover softw...
Stelios Sidiroglou, Oren Laadan, Carlos Perez, Nic...
Reliability and availability are increasingly important in large-scale storage systems built from thousands of individual storage devices. Large systems must survive the failure o...
Qin Xin, Ethan L. Miller, Thomas J. E. Schwarz, Da...
Modern scientific experiments can generate hundreds of gigabytes to terabytes or even petabytes of data that may furthermore be maintained in large numbers of relatively small fil...
Wantao Liu, Brian Tieman, Rajkumar Kettimuthu, Ian...
Supporting uninterrupted services for distributed soft real-time applications is hard in resource-constrained and dynamic environments, where processor or process failures and sys...