Search Sciweavers | Sciweavers

31 search results - page 3 / 7

» Failure Recovery for Distributed Processes in Single System ...

ICPP
1999
IEEE

Coherence-Centric Logging and Recovery for Home-Based Software Distributed Shared Memory

The probability of failures in software distributed shared memory (SDSM) increases as the system size grows. This paper introduces a new, efficient message logging technique, call...

Angkul Kongmunvattana, Nian-Feng Tzeng

PVM
2005
Springer

Scalable Fault Tolerant MPI: Extending the Recovery Algorithm

Fault Tolerant MPI (FT-MPI)[6] was designed as a solution to allow applications different methods to handle process failures beyond simple check-point restart schemes. The init...

Graham E. Fagg, Thara Angskun, George Bosilca, Jel...

CCGRID
2006
IEEE

Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation

With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...

Yuan Tang, Graham E. Fagg, Jack Dongarra

NOMS
2002
IEEE

Highly available and efficient load cluster management system using SNMP and Web

To cope with the explosive increase in the number of requests to Internet server systems, one popular solution is a load-balancing technique that uses a dispatcher in the front-en...

Myung-Sup Kim, Mi-Jeong Choi, James W. Hong

FAST
2007

Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?

Component failure in large-scale IT installations is becoming an ever larger problem as the number of components in a single cluster approaches a million. In this paper, we presen...

Bianca Schroeder, Garth A. Gibson
