Sciweavers

CCGRID
2006
IEEE
Exploit Failure Prediction for Adaptive Fault-Tolerance in Cluster Computing
As the scale of cluster computing grows, it is becoming hard for long-running applications to complete without facing failures on large-scale clusters. To address this issue, chec...
Yawei Li, Zhiling Lan
ASPLOS
2006
ACM
Understanding prediction-based partial redundant threading for low-overhead, high-coverage fault tolerance
Redundant threading architectures duplicate all instructions to detect and possibly recover from transient faults. Several lighter weight Partial Redundant Threading (PRT) archite...
Vimal K. Reddy, Eric Rotenberg, Sailashri Parthasa...
NSDI
2010
Prophecy: Using History for High-Throughput Fault Tolerance
Byzantine fault-tolerant (BFT) replication has enjoyed a series of performance improvements, but remains costly due to its replicated work. We eliminate this cost for read-mostly ...
Siddhartha Sen, Wyatt Lloyd, Michael J. Freedman
HPDC
2009
IEEE
Interconnect agnostic checkpoint/restart in Open MPI
Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
Joshua Hursey, Timothy Mattox, Andrew Lumsdaine
SOSP
2007
ACM
Tolerating Byzantine faults in transaction processing systems using commit barrier scheduling
This paper describes the design, implementation, and evaluation of a replication scheme to handle Byzantine faults in transaction processing database systems. The scheme compares ...
Ben Vandiver, Hari Balakrishnan, Barbara Liskov, S...