Sciweavers

285 search results - page 28 / 57
» Improving the reliability of commodity operating systems
JSSPP
2009
Springer
15 years 4 months ago
Analyzing the EGEE Production Grid Workload: Application to Jobs Submission Optimization
Abstract. Grids reliability remains an order of magnitude below clusters on production infrastructures. This work is aimed at improving grid application performances by improving ...
Diane Lingrand, Johan Montagnat, Janusz Martyniak,...
IPPS
2005
IEEE
15 years 3 months ago
Fault-Tolerant Parallel Applications with Dynamic Parallel Schedules
Commodity computer clusters are often composed of hundreds of computing nodes. These generally off-the-shelf systems are not designed for high reliability. Node failures therefore...
Sebastian Gerlach, Roger D. Hersch
DSN
2006
IEEE
15 years 3 months ago
Assessment of the Effect of Memory Page Retirement on System RAS Against Hardware Faults
The Solaris 10 Operating System includes a number of new features for predictive self-healing. One such feature is the ability of the Fault Management software to diagnose memory ...
Dong Tang, Peter Carruthers, Zuheir Totari, Michae...
DSN
2006
IEEE
15 years 3 months ago
Evaluating the Performability of Systems with Background Jobs
As most computer systems are expected to remain operational 24 hours a day, 7 days a week, they must complete maintenance work while in operation. This work is in addition to the ...
Qi Zhang, Ningfang Mi, Evgenia Smirni, Alma Riska,...
CLUSTER
2004
IEEE
15 years 1 months ago
Scalable, high-performance NIC-based all-to-all broadcast over Myrinet/GM
All-to-all broadcast is one of the common collective operations that involve dense communication between all processes in a parallel program. Previously, programmable Network Inte...
Weikuan Yu, Dhabaleswar K. Panda, Darius Buntinas