Sciweavers

91 search results - page 3 / 19
» Whither Generic Recovery from Application Faults
IPPS
2007
IEEE
13 years 11 months ago
A Fault Tolerance Protocol with Fast Fault Recovery
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all ...
Sayantan Chakravorty, Laxmikant V. Kalé
ATAL
2009
Springer
13 years 12 months ago
Combining fault injection and model checking to verify fault tolerance in multi-agent systems
The ability to guarantee that a system will continue to operate correctly under degraded conditions is key to the success of adopting multi-agent systems (MAS) as a paradigm for d...
Jonathan Ezekiel, Alessio Lomuscio
CLUSTER
2006
IEEE
13 years 5 months ago
Autonomous recovery in componentized Internet applications
In this paper we show how to reduce downtime of J2EE applications by rapidly and automatically recovering from transient and intermittent software failures, without requiring appl...
George Candea, Emre Kiciman, Shinichi Kawamoto, Ar...
PVM
2010
Springer
13 years 3 months ago
Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols
With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault toleran...
George Bosilca, Aurelien Bouteiller, Thomas Hé...
ENTCS
2002
13 years 5 months ago
Component-Based Applications: A Dynamic Reconfiguration Approach with Fault Tolerance Support
This paper presents a mechanism for dynamic reconfiguration of component-based applications and its fault tolerance strategy. The mechanism, named generic connector, allows compos...
Thaís Vasconcelos Batista, Milano Gadelha C...