Numerous mathematical approaches have been proposed to determine the optimal checkpoint interval for minimizing total execution time of an application in the presence of failures....
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all ...
Software distributed shared memory (DSM) improves the programmability of message-passing machines and workstation clusters by providing a shared memory abstraction (i.e., a coherent global a...
The emerging mobile wireless environment poses exciting challenges for distributed fault-tolerant (FT) computing. This paper proposes a message logging and recovery protocol on the...
Mobile computing allows ubiquitous and continuous access to computing resources while the users travel or work at a client's site. The flexibility introduced by mobile computi...