Exception handling is a powerful abstraction that can be used to help manage errors and support the construction of reliable operating systems. Using exceptions to notify system co...
Francis M. David, Jeffrey C. Carlyle, Ellick Chan,...
Soft errors (or transient faults) are temporary faults that arise in a circuit due to a variety of internal noise and external sources such as cosmic particle hits. Though soft ...
Avi Timor, Avi Mendelson, Yitzhak Birk, Neeraj Sur...
In this work, we demonstrate the power of providing a common set of operating system services to Grid Architectures, including high-performance I/O, communication, reso...
This work concerns metrics for evaluating microarchitectural enhancements to improve processor lifetime reliability. A commonly reported reliability metric is mean time...
Pradeep Ramachandran, Sarita V. Adve, Pradip Bose,...
Recent advances in commodity network interface technology enable scientists and engineers to build clusters of workstations or PCs to execute parallel applications. However, raw-h...