In this paper, we describe the design and implementation of two mechanisms for fault-tolerance and recovery for complex scientific workflows on computational grids. We present our ...
This paper presents the design and implementation of the Distributed Autonomous Replication Management (DARM) framework built on top of the Spread group communication system. The ...
The demand for an efficient fault tolerance system has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming task of data and even...
This work presents a flexible, CORBA compliant Middle-Tier Server architecture which is capable of adding dependability (namely, reliability, availability, and performability) to ...
Domenico Cotroneo, Luigi Romano, Stefano Russo, Ni...
Data center power infrastructure incurs massive capital costs, which typically exceed energy costs over the life of the facility. To squeeze maximum value from the infrastructure,...
Steven Pelley, David Meisner, Pooya Zandevakili, T...