Active storage clouds are an attractive platform for executing large data intensive workloads found in many fields of science. However, active storage presents new system managem...
The productivity of HPC system is determined not only by their performance, but also by their reliability. The conventional method to limit the impact of failures is checkpointing...
To assure dependable onboard evolution, we have developed a methodology called guarded software upgrading (GSU). In this paper, we focus on a low-cost approach to error containmen...
Ann T. Tai, Kam S. Tso, Leon Alkalai, Savio N. Cha...
While NAND flash memories have rapidly increased in both capacity and performance and are increasingly used as a storage device in many embedded systems, their reliability has de...
Reliability is a major requirement for most safety-related systems. To meet this requirement, fault-tolerant techniques such as hardware replication and software re-execution are ...
Jia Huang, Jan Olaf Blech, Andreas Raabe, Christia...