In this paper we present an approach to the design optimization of faulttolerant embedded systems for safety-critical applications. Processes are statically scheduled and communic...
Viacheslav Izosimov, Paul Pop, Petru Eles, Zebo Pe...
This paper presents an experimental evaluation of the fault-tolerant communication (FTCOM) layer of the DECOS integrated architecture. The FTCOM layer implements different agreemen...
Jonny Vinter, Henrik Eriksson, Astrit Ademaj, Bern...
We present a technique that masks failures in a cluster to provide high availability and fault-tolerance for long-running, parallelized dataflows. We can use these dataflows to im...
Mehul A. Shah, Joseph M. Hellerstein, Eric A. Brew...
An increasing number of applications are being developed using distributed object computing (DOC) middleware, such as CORBA. Many of these applications require the underlying midd...
Aniruddha S. Gokhale, Balachandran Natarajan, Doug...
As high performance clusters continue to grow in size, the mean time between failure shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...