Sciweavers

535 search results - page 17 / 107
» Fault tolerant high performance computing by a coding approa...
ASPLOS
2006
ACM
15 years 6 months ago
Dependable != unaffordable
This paper presents a software architecture for hardware fault tolerance based on loosely-synchronized, redundant virtual machines (LSRVM). LSRVM will provide high levels of relia...
Alan L. Cox, Kartik Mohanram, Scott Rixner
LCPC
2000
Springer
15 years 4 months ago
SmartApps: An Application Centric Approach to High Performance Computing
State-of-the-art run-time systems are a poor match to diverse, dynamic distributed applications because they are designed to provide support to a wide variety of applications, with...
Lawrence Rauchwerger, Nancy M. Amato, Josep Torrel...
CLUSTER
2004
IEEE
15 years 4 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...
ISPA
2004
Springer
15 years 5 months ago
Highly Reliable Linux HPC Clusters: Self-Awareness Approach
Current solutions for fault-tolerance in HPC systems focus on dealing with the result of a failure. However, most are unable to handle runtime system configuration change...
Chokchai Leangsuksun, Tong Liu, Yudan Liu, Stephen...
SASO
2007
IEEE
15 years 6 months ago
e-SAFE: An Extensible, Secure and Fault Tolerant Storage System
With the rapidly falling price of hardware, and increasingly available bandwidth, the storage technology is seeing a paradigm shift from centralized and managed mode to distribute...
Sandip Agarwala, Arnab Paul, Umakishore Ramachandr...