Sciweavers

VEE
2012
ACM
SecondSite: disaster tolerance as a service
This paper describes the design and implementation of SecondSite, a cloud-based service for disaster tolerance. SecondSite extends the Remus virtualization-based high availability...
Shriram Rajagopalan, Brendan Cully, Ryan O'Connor,...
CLUSTER
1999
IEEE
Simulative performance analysis of gossip failure detection for scalable distributed systems
Three protocols for gossip-based failure detection services in large-scale heterogeneous clusters are analyzed and compared. The basic gossip protocol provides a means by which fai...
Mark W. Burns, Alan D. George, Bradley A. Wallace
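The excerpt does not spell out the three protocols compared in the paper. As a rough orientation only, the sketch below shows a generic gossip-style heartbeat failure detector; it is not the authors' protocol. The names `GossipNode`, `Entry`, `t_fail` and `simulate` are hypothetical. The sketch assumes synchronous rounds, full initial membership at every node, and a fanout of one random peer per round.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    heartbeat: int = 0    # highest heartbeat counter seen for this member
    last_update: int = 0  # local round at which that counter last increased


class GossipNode:
    """Hypothetical gossip-style heartbeat failure detector (illustrative sketch)."""

    def __init__(self, node_id, members, t_fail):
        self.node_id = node_id
        self.t_fail = t_fail  # rounds without progress before a member is suspected
        self.table = {m: Entry() for m in members}

    def tick(self, now):
        # Each round a live node increments its own heartbeat counter.
        me = self.table[self.node_id]
        me.heartbeat += 1
        me.last_update = now

    def snapshot(self):
        # The gossip message sent to a peer: member id -> heartbeat counter.
        return {m: e.heartbeat for m, e in self.table.items()}

    def merge(self, remote, now):
        # Adopt any counter that is newer than the one held locally.
        for m, hb in remote.items():
            entry = self.table.setdefault(m, Entry())
            if hb > entry.heartbeat:
                entry.heartbeat = hb
                entry.last_update = now

    def suspected(self, now):
        # Suspect members whose counter has not advanced for more than t_fail rounds.
        return {m for m, e in self.table.items()
                if m != self.node_id and now - e.last_update > self.t_fail}


def simulate(rounds=40, t_fail=20, crashed=("D",), seed=1):
    ids = ["A", "B", "C", "D"]
    rng = random.Random(seed)
    nodes = {i: GossipNode(i, ids, t_fail) for i in ids}
    for now in range(1, rounds + 1):
        for i in ids:
            if i in crashed:
                continue  # a crashed node neither ticks nor gossips
            node = nodes[i]
            node.tick(now)
            target = rng.choice([p for p in ids if p != i])
            if target not in crashed:  # messages sent to a crashed node are lost
                nodes[target].merge(node.snapshot(), now)
    # Suspicion sets of the surviving nodes at the end of the run.
    return {i: nodes[i].suspected(rounds) for i in ids if i not in crashed}


if __name__ == "__main__":
    print(simulate())
```

The choice of `t_fail` trades detection latency against false suspicions. Because peers are picked at random, a live node's heartbeat can go unheard for several rounds, so too small a threshold may cause a healthy member to be suspected.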
CACM
1999
Putting OO Distributed Programming to Work
...abstractions underlying distributed computing. We attempted to keep our previous claims at an abstract and general level. In this column, we make those claims more concrete. More precisely, ...
Pascal Felber, Rachid Guerraoui, Mohamed Fayad
SOQUA
2007
An approach to detecting failures automatically
Failure detection is a difficult and often expensive task. The principle of self-healing addresses this cost issue, but poses new research questions. This work focuses on detectin...
Jochen Wuttke
DSN
2004
IEEE
Cluster-Based Failure Detection Service for Large-Scale Ad Hoc Wireless Network Applications
The growing interest in ad hoc wireless network applications that are made of large and dense populations of lightweight system resources calls for scalable approaches to fault to...
Ann T. Tai, Kam S. Tso, William H. Sanders
WETICE
1999
IEEE
A Hierarchical Proxy Architecture for Internet-Scale Event Services
The rapid growth of the Web has made it possible to build collaborative applications on an unprecedented scale. However, the request-reply interaction model of HTTP limits the rang...
Haobo Yu, Deborah Estrin, Ramesh Govindan
DSN
2003
IEEE
Node Failure Detection and Membership in CANELy
Fault-tolerant distributed systems based on fieldbuses may benefit to a great extent from the availability of semantically rich communication services, such as those provided by g...
José Rufino, Paulo Veríssimo, Guilhe...
KDD
2005
ACM
Failure detection and localization in component based systems by online tracking
The increasing complexity of today’s systems makes fast and accurate failure detection essential for their use in mission-critical applications. Various monitoring methods provi...
Haifeng Chen, Guofei Jiang, Cristian Ungureanu, Ke...
SSS
2007
Springer
Secure Failure Detection in TrustedPals
We present a modular redesign of TrustedPals, a smartcard-based security framework for solving secure multiparty computation (SMC)[?]. TrustedPals makes it possible to reduce SMC to the probl...
Roberto Cortiñas, Felix C. Freiling, Marjan...
GPC
2007
Springer
Fault Management in P2P-MPI
In this paper we present recent developments in P2P-MPI, a grid middleware, concerning fault management, which covers fault tolerance for applications and fault detect...
Stéphane Genaud, Choopan Rattanapoka