Sciweavers

392 search results - page 3 / 79
» Fault Tolerance in a DSM Cluster Operating System
Sort
View
CCGRID
2003
IEEE
13 years 10 months ago
Improved Read Performance in a Cost-Effective, Fault-Tolerant Parallel Virtual File System (CEFT-PVFS)
Due to the ever-widening performance gap between processors and disks, I/O operations tend to become the major performance bottleneck of data-intensive applications on modern clus...
Yifeng Zhu, Hong Jiang, Xiao Qin, Dan Feng, David ...
CCGRID
2006
IEEE
13 years 11 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC(High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
CLUSTER
2003
IEEE
13 years 10 months ago
Coordinated Checkpoint versus Message Log for Fault Tolerant MPI
— Large Clusters, high availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...
Aurelien Bouteiller, Pierre Lemarinier, Gér...
SOSP
2009
ACM
14 years 2 months ago
Upright cluster services
The UpRight library seeks to make Byzantine fault tolerance (BFT) a simple and viable alternative to crash fault tolerance for a range of cluster services. We demonstrate UpRight ...
Allen Clement, Manos Kapritsos, Sangmin Lee, Yang ...
CLUSTER
2002
IEEE
13 years 10 months ago
Design and Validation of Portable Communication Infrastructure for Fault-Tolerant Cluster Middleware
We describe the communication infrastructure (CI) for our fault-tolerant cluster middleware, which is optimized for two classes of communication: for the applications and for the ...
Ming Li, Wenchao Tao, Daniel Goldberg, Israel Hsu,...