Abstract. Grid reliability remains an order of magnitude below that of clusters on production infrastructures. This work aims to improve grid application performance by improving ...
Diane Lingrand, Johan Montagnat, Janusz Martyniak,...
Commodity computer clusters are often composed of hundreds of computing nodes. These systems, generally built from off-the-shelf components, are not designed for high reliability. Node failures therefore...
The Solaris 10 Operating System includes a number of new features for predictive self-healing. One such feature is the ability of the Fault Management software to diagnose memory ...
Dong Tang, Peter Carruthers, Zuheir Totari, Michae...
As most computer systems are expected to remain operational 24 hours a day, 7 days a week, they must complete maintenance work while in operation. This work is in addition to the ...
Qi Zhang, Ningfang Mi, Evgenia Smirni, Alma Riska,...
All-to-all broadcast is one of the common collective operations that involve dense communication between all processes in a parallel program. Previously, programmable Network Inte...