Sciweavers

13 search results - page 2 / 3
PVM
2010
Springer
13 years 3 months ago
Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols
Abstract. With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault toleran...
George Bosilca, Aurelien Bouteiller, Thomas Hé...
IPPS
1998
IEEE
13 years 9 months ago
Hyper Butterfly Network: A Scalable Optimally Fault Tolerant Architecture
Bounded degree networks like de Bruijn graphs or wrapped butterfly networks are very important from a VLSI implementation point of view as well as for applications where the computing n...
Wei Shi, Pradip K. Srimani
ICS
2011
Tsinghua U.
12 years 8 months ago
High performance linpack benchmark: a fault tolerant implementation without checkpointing
The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For l...
Teresa Davies, Christer Karlsson, Hui Liu, Chong D...
SC
2000
ACM
13 years 9 months ago
Scalable Fault-Tolerant Distributed Shared Memory
This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a ...
Florin Sultan, Thu D. Nguyen, Liviu Iftode
AAIM
2009
Springer
13 years 9 months ago
PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications
Abstract. This paper presents PLDA, our parallel implementation of Latent Dirichlet Allocation on MPI and MapReduce. PLDA smooths out storage and computation bottlenecks and provid...
Yi Wang, Hongjie Bai, Matt Stanton, Wen-Yen Chen, ...