Distributed information systems are critical to the functioning of many businesses; designing them to be dependable is a challenging but important task. We report our experience i...
Jeremy Bryans, John S. Fitzgerald, Alexander Roman...
Today organizations and business enterprises of all sizes need to deal with unprecedented amounts of digital information, creating challenging demands for mass storage and on-dema...
Sangeetha Seshadri, Ling Liu, Brian F. Cooper, Law...
Abstract— We present on-line tunable diagnostic and membership protocols for generic time-triggered (TT) systems to detect crashes, send/receive omission faults and network parti...
Fault tolerance in parallel systems has traditionally been achieved through a combination of redundancy and checkpointing methods. This notion has also been extended to message-pas...
Rajanikanth Batchu, Yoginder S. Dandass, Anthony S...
In distributed IT systems, replication of information is commonly used to strengthen the fault tolerance on a technical level or the autonomy of an organization on a business level...