We present a technique that masks failures in a cluster to provide high availability and fault-tolerance for long-running, parallelized dataflows. We can use these dataflows to im...
Mehul A. Shah, Joseph M. Hellerstein, Eric A. Brew...
Device and interconnect fabrics at the nanoscale will have a density of defects and susceptibility to transient faults far exceeding those of current silicon technologies. In this...
Andrey V. Zykov, Elias Mizan, Margarida F. Jacome,...
Virtualization provides the possibility of whole machine migration and thus enables a new form of fault tolerance that is completely transparent to applications and operating syst...
Evolutionary Algorithms, including Genetic Programming (GP), are frequently employed to solve difficult real-life problems, which can require up to days or months of computation. ...
The current multiprocessors such asCray T3D support interprocessor communication using partitioned dimension-order routers (PDRs). In a PDR implementation, the routing logic and sw...