This paper presents a new conceptual model, the XBWModel. Distributed computing is becoming a cost-effective way to implement safety critical control systems. To support the devel...
Until now, the analysis of fault tolerance of peer-to-peer systems usually only covers random faults of some kind. Contrary to traditional algorithmic research, faults as w...
Mega grids span several continents and may consist of millions of nodes and billions of tasks executing at any point in time. This setup calls for scalable and highly available re...
Soft-error-tolerant design is becoming more crucial due to the exponential increase in the vulnerability of computer systems to soft errors. Accurate estimation of the soft error rate (SER), ...
With the increasing number of processors in modern HPC (High Performance Computing) systems, two emerging problems must be solved. One is scalability; the other is fault tolera...