We propose a novel system-level error tolerance approach specifically targeted for multimedia compression algorithms. In particular we focus on the motion estimation process perf...
In the research reported in this paper, transient faults were injected in the nodes and in the communication subsystem (by using software fault injection) of a commercial parallel...
Safety-critical embedded systems often operate in harsh environmental conditions that necessitate fault-tolerant computing techniques. Many safety-critical systems also execute re...
With technology scaling, manufacture-time and in-field permanent faults are becoming a fundamental problem. Multi-core architectures with spares can tolerate them by detecting an...
Shuou Nomura, Matthew D. Sinclair, Chen-Han Ho, Ve...
The productivity of HPC system is determined not only by their performance, but also by their reliability. The conventional method to limit the impact of failures is checkpointing...