Service-orientation has been proposed as a way of facilitating the development and integration of increasingly complex and heterogeneous system components. However, there are many...
Abstract. A fault tolerant parallel virtual file system is designed and implemented to provide high I/O performance and high reliability. A queuing model is used to analyze in deta...
Fault tolerance in parallel systems has traditionally been achieved through a combination of redundancy and checkpointing methods. This notion has also been extended to message-pas...
Rajanikanth Batchu, Yoginder S. Dandass, Anthony S...
Checkpoint/restart is a general idea for which particular implementations enable various functionalities in computer systems, including process migration, gang scheduling, hiberna...
Dynamic resource management is a crucial part of the infrastructure for emerging mission-critical distributed real-time embedded system. Because of this, the resource manager must ...
Paul Rubel, Joseph P. Loyall, Richard E. Schantz, ...