Sciweavers

1113 search results - page 46 / 223
» Performance under Failures of DAG-based Parallel Computing
PODC
1997
ACM
15 years 6 months ago
The Load and Availability of Byzantine Quorum Systems
Replicated services accessed via quorums enable each access to be performed at only a subset (quorum) of the servers and achieve consistency across accesses by requiring any two qu...
Dahlia Malkhi, Michael K. Reiter, Avishai Wool
ICDCS
2012
IEEE
13 years 4 months ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
CCGRID
2008
IEEE
15 years 2 months ago
Fault Tolerance and Recovery of Scientific Workflows on Computational Grids
In this paper, we describe the design and implementation of two mechanisms for fault-tolerance and recovery for complex scientific workflows on computational grids. We present our ...
Gopi Kandaswamy, Anirban Mandal, Daniel A. Reed
JPDC
2006
15 years 1 months ago
HeteroMPI: Towards a message-passing library for heterogeneous networks of computers
The paper presents Heterogeneous MPI (HeteroMPI), an extension of MPI for programming high-performance computations on heterogeneous networks of computers. It allows the applicati...
Alexey L. Lastovetsky, Ravi Reddy
NCA
2006
IEEE
15 years 7 months ago
A Primary-Backup Protocol for In-Memory Database Replication
The paper presents a primary-backup protocol to manage replicated in-memory database systems (IMDBs). The protocol exploits two features of IMDBs: coarse-grain concurrency control...
Lásaro J. Camargos, Fernando Pedone, Rodrig...