We propose an approach for analyzing software architectures with respect to reliability to improve fault tolerance. The approach defines a failure scenario model that is based on ...
This paper explores the challenges associated with distributed application management in large-scale computing environments. In particular, we investigate several techniques for e...
Nikolay Topilski, Jeannie R. Albrecht, Amin Vahdat
An increasing number of mission-critical, embedded, telecommunications, and financial distributed systems are being developed using distributed object computing middleware, such a...
Balachandran Natarajan, Aniruddha S. Gokhale, Shal...
This work addresses the issue of design optimization for faulttolerant hard real-time systems. In particular, our focus is on the handling of transient faults using both checkpoin...
Petru Eles, Viacheslav Izosimov, Paul Pop, Zebo Pe...
Abstract. Fault Tolerance is an increasing challenge for integrated circuits due to semiconductor technology scaling. This paper looks at how artificial evolution may be tuned to ...