Sciweavers

76 search results - page 9 / 16
» Fault Tolerant Multi-Agent Systems: its communication and co...
EDCC
2008
Springer
14 years 11 months ago
A Distributed Approach to Autonomous Fault Treatment in Spread
This paper presents the design and implementation of the Distributed Autonomous Replication Management (DARM) framework built on top of the Spread group communication system. The ...
Hein Meling, Joakim L. Gilje
CSJM
2006
14 years 9 months ago
Graph Coloring using Peer-to-Peer Networks
The popularity of distributed file systems has continued to grow in recent years. The reasons they are preferred over traditional centralized systems include fault tolerance, availabili...
Adrian Iftene, Cornelius Croitoru
VEE
2012
ACM
13 years 5 months ago
Facilitating inter-application interactions for OS-level virtualization
OS-level virtualization generates a minimal start-up and run-time overhead on the host OS and thus suits applications that require both good isolation and high efficiency. However...
Zhiyong Shan, Xin Wang 0001, Tzi-cker Chiueh, Xiao...
ICDCS
2012
IEEE
13 years 17 hour ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
PVM
2010
Springer
14 years 8 months ago
Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols
With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault toleran...
George Bosilca, Aurelien Bouteiller, Thomas Hé...