Transient faults are single-shot hardware errors caused by high energy particles from space, manufacturing defects, overheating, and other sources. Such faults can be devastating f...
Developing large-scale distributed real-time and embedded (DRE) systems is hard in part due to complex deployment and configuration issues involved in satisfying multiple quality f...
Jaiganesh Balasubramanian, Aniruddha S. Gokhale, A...
Fault tolerance is an important aspect in real-time computing. In real-time control systems, tasks could be faulty due to various reasons. Faulty tasks may compromise the performa...
Xue Liu, Hui Ding, Kihwal Lee, Qixin Wang, Lui Sha
To improve performance and reduce power, processor designers employ advances that shrink feature sizes, lower voltage levels, reduce noise margins, and increase clock rates. Howev...
George A. Reis, Jonathan Chang, Neil Vachharajani,...
We observe increasing interest in aggregating geographically distributed, heterogeneous resources to perform large scale computations. MPI remains the most popular programming par...