Sciweavers

CCGRID 2008, IEEE
Fault Tolerance and Recovery of Scientific Workflows on Computational Grids
In this paper, we describe the design and implementation of two mechanisms for fault-tolerance and recovery for complex scientific workflows on computational grids. We present our ...
Gopi Kandaswamy, Anirban Mandal, Daniel A. Reed

EDCC 2008, Springer
A Distributed Approach to Autonomous Fault Treatment in Spread
This paper presents the design and implementation of the Distributed Autonomous Replication Management (DARM) framework built on top of the Spread group communication system. The ...
Hein Meling, Joakim L. Gilje

CCGRID 2006, IEEE
IPMI-based Efficient Notification Framework for Large Scale Cluster Computing
The demand for an efficient fault tolerance system has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming task of data and even...
Chokchai Leangsuksun, Tirumala Rao, Anand Tikoteka...

PDSE 2000
A CORBA-Based Architecture for Adding Dependability to Legacy Servers
This work presents a flexible, CORBA compliant Middle-Tier Server architecture which is capable of adding dependability (namely, reliability, availability, and performability) to ...
Domenico Cotroneo, Luigi Romano, Stefano Russo, Ni...

ASPLOS 2010, ACM
Power routing: dynamic power provisioning in the data center
Data center power infrastructure incurs massive capital costs, which typically exceed energy costs over the life of the facility. To squeeze maximum value from the infrastructure,...
Steven Pelley, David Meisner, Pooya Zandevakili, T...