Group communications are commonly used in parallel and distributed environment. However, existing migration mechanisms do not support group communications. This weakness prevents ...
— We consider the problem of reliable broadcast in an infinite grid (or finite toroidal) radio network under Byzantine and crash-stop failures. We present bounds on the maximum...
This paper describes the use of fault tolerance in a multiagent system. Such an approach is based on the modeling of autonomous agents with planning capabilities. These capabiliti...
Application run-time information is a fundamental component in application and job scheduling. However, accurate predictions of run times are difficult to achieve for parallel app...
We describe and test a software approach to overcoming radiation-induced errors in spaceborne applications running on commercial off-the-shelf components. The approach uses checks...