Search Sciweavers | Sciweavers

» Fault Tolerance in Decentralized Systems

MIDDLEWARE
2009
Springer

139 views | Distributed And Parallel Com...

Why Do Upgrades Fail and What Can We Do about It?

15 years 4 months ago

Download www.ece.cmu.edu

Abstract. Enterprise-system upgrades are unreliable and often produce downtime or data-loss. Errors in the upgrade procedure, such as broken dependencies, constitute the leading ca...

Tudor Dumitras, Priya Narasimhan

ISCA
2002
IEEE

115 views | Hardware

SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery

15 years 2 months ago

Download www.cs.wisc.edu

We develop an availability solution, called SafetyNet, that uses a unified, lightweight checkpoint/recovery mechanism to support multiple long-latency fault detection schemes. At...

Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, ...

HPDC
2009
IEEE

101 views | Distributed And Parallel Com...

Interconnect agnostic checkpoint/restart in Open MPI

15 years 4 months ago

Download www.osl.iu.edu

Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...

Joshua Hursey, Timothy Mattox, Andrew Lumsdaine

IPPS
2006
IEEE

106 views | Distributed And Parallel Com...

Coordinated checkpoint from message payload in pessimistic sender-based message logging

15 years 3 months ago

Download www.cecs.uci.edu

Execution of MPI applications on Clusters and Grid deployments suffers from node and network failure that motivates the use of fault tolerant MPI implementations. Two category tec...

M. Aminian, Mohammad K. Akbari, Bahman Javadi

ISORC
2003
IEEE

167 views | Distributed And Parallel Com...

A Dynamic Shadow Approach for Mobile Agents to Survive Crash Failures

15 years 3 months ago

Download www.comp.leeds.ac.uk

Fault tolerance schemes for mobile agents to survive agent server crash failures are complex since developers normally have no control over remote agent servers. Some solutions mo...

Simon Pears, Jie Xu, Cornelia Boldyreff
