Group communications are commonly used in parallel and distributed environment. However, existing migration mechanisms do not support group communications. This weakness prevents ...
Abstract. Current solutions for fault-tolerance in HPC systems focus on dealing with the result of a failure. However, most are unable to handle runtime system configuration change...
As technology scales and the energy of computation continually approaches thermal equilibrium [1,2], parameter variations and noise levels will lead to larger error rates at vario...
Abstract. Achieving higher levels of dependability is a goal in any software project, therefore strategies for software reliability improvement are very attractive. This work intro...
Daniel O. Bortolas, Avelino F. Zorzo, Eduardo A. B...
This paper presents an efficient technique for placement and routing of sensors/actuators and processing units in a grid network. Our system requires an extremely high level of ro...