Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all ...
The probability that a failure will occur before the end of the computation increases as the number of processors used in a high-performance computing application increases. For l...
Interference is an unavoidable property of the wireless communication medium and, in sensor networks, such interference is exacerbated by the energy-starved nature of the netw...
We propose a built-in self-test (BIST) procedure for nanofabrics implemented using chemically assembled electronic nanotechnology. Several fault detection configurations are prese...
Computing systems will grow significantly larger in the near future to satisfy the needs of computational scientists in areas like climate modeling, biophysics and cosmology. S...