Sciweavers

695 search results - page 103 / 139
» Cache based fault recovery for distributed systems
IPPS
1998
IEEE
15 years 1 month ago
Design and Implementation of a Parallel I/O Runtime System for Irregular Applications
In this paper we present the design, implementation and evaluation of a runtime system based on collective I/O techniques for irregular applications. We present two models, namely...
Jaechun No, Sung-Soon Park, Jesús Carretero...
NOMS
2010
IEEE
14 years 7 months ago
Checkpoint-based fault-tolerant infrastructure for virtualized service providers
Crash and omission failures are common in service providers: a disk can break down or a link can fail anytime. In addition, the probability of a node failure increases with the num...
Iñigo Goiri, Ferran Julià, Jordi Gui...
ICPP
2007
IEEE
15 years 4 months ago
A Meta-Learning Failure Predictor for Blue Gene/L Systems
The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components use...
Prashasta Gujrati, Yawei Li, Zhiling Lan, Rajeev T...
CIIT
2004
14 years 11 months ago
Semi-automatic compensation of the propagation delay in fault-tolerant systems
In control systems the jitter is a major problem since in a time-varying system the theoretical results for analysis and design of time-invariant systems cannot be used directly. ...
Thomas Losert, Wilfried Elmenreich, Martin Schlage...
HPDC
2006
IEEE
15 years 3 months ago
Resource Availability Prediction in Fine-Grained Cycle Sharing Systems
Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of computational resources available on the Internet. In FGCS, host computers allow guest jobs to utili...
Xiaojuan Ren, Seyong Lee, Rudolf Eigenmann, Saurab...