Sciweavers

19 search results - page 1 / 4
» Reliable Cluster Computing with a New Checkpointing RAID-x A...
Sort
View
HCW
2000
IEEE
13 years 9 months ago
Reliable Cluster Computing with a New Checkpointing RAID-x Architecture
In a serverless cluster of PCs or workstations, the cluster must allow remote file accesses or parallel I/O directly performed over disks distributed to all client nodes. We intro...
Kai Hwang, Hai Jin, Roy S. C. Ho, Wonwoo Ro
HPDC
2000
IEEE
13 years 9 months ago
RAID-x: A New Distributed Disk Array for I/O-Centric Cluster Computing
A new RAID-x (redundant array of inexpensive disks at level x) architecture is presented for distributed I/O processing on a serverless cluster of computers. The RAID-x architectu...
Kai Hwang, Hai Jin, Roy S. C. Ho
SC
2009
ACM
13 years 11 months ago
FALCON: a system for reliable checkpoint recovery in shared grid environments
In Fine-Grained Cycle Sharing (FGCS) systems, machine owners voluntarily share their unused CPU cycles with guest jobs, as long as the performance degradation is tolerable. For gu...
Tanzima Zerin Islam, Saurabh Bagchi, Rudolf Eigenm...
IPPS
2005
IEEE
13 years 10 months ago
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...
GRID
2004
Springer
13 years 10 months ago
Checkpoint and Restart for Distributed Components in XCAT3
With the advent of Grid computing, more and more highend computational resources become available for use to a scientist. While this opens up new avenues for scientific research,...
Sriram Krishnan, Dennis Gannon