Sciweavers

» Bounding the number of tolerable faults in majority-based sy...
ICDCS
2012
IEEE
12 years 12 months ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
SOSP
2005
ACM
15 years 6 months ago
Fault-scalable Byzantine fault-tolerant services
A fault-scalable service can be configured to tolerate increasing numbers of faults without significant decreases in performance. The Query/Update (Q/U) protocol is a new tool t...
Michael Abd-El-Malek, Gregory R. Ganger, Garth R. ...
WDAG
2007
Springer
15 years 3 months ago
Bounded Wait-Free Implementation of Optimally Resilient Byzantine Storage Without (Unproven) Cryptographic Assumptions
We present the first optimally resilient, bounded, wait-free implementation of a distributed atomic register, tolerating Byzantine readers and (up to one-third of) Byzantine serve...
Amitanand S. Aiyer, Lorenzo Alvisi, Rida A. Bazzi
DATE
2008
IEEE
15 years 3 months ago
Multi-Vector Tests: A Path to Perfect Error-Rate Testing
The importance of testing approaches that exploit error tolerance to improve yield has previously been established. Error rate, defined as the percentage of vectors for which the...
Shideh Shahidi, Sandeep Gupta
PVM
2010
Springer
14 years 7 months ago
Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols
With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault toleran...
George Bosilca, Aurelien Bouteiller, Thomas Hé...