: The transition from microelectronics to nanoelectronics reaches physical limits and results in a paradigm shift in the design and fabrication of electronic circuits. The conserva...
We propose a periodic diagnostic algorithm based on the testing model of computation for real-time systems. The diagnostic task runs on every processor of the system. When the task...
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, rollback all ...
The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For l...