No cache-based techniques for roll-forward fault recovery exist at present. A split-cache approach is proposed that provides efficient support for checkpointing and roll-forward fau...
The Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) is a low-latency, high-bandwidth interconnection network which directly links arbitrary pairs of processor nodes wit...
Recent developments in the field of object-based fault tolerance and the advent of the first OMG FTCORBA compliant middleware raise new requirements for the design process of dist...
As transistor dimensions continue to scale deep into the nanometer regime, silicon reliability is becoming a chief concern. At the same time, transistor counts are scaling up, ena...
Andrew DeOrio, Konstantinos Aisopos, Valeria Berta...
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all ...