Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
— Checkpointing is an indispensable technique to provide fault tolerance for long-running high-throughput applications like those running on desktop grids. This paper argues that...
Samer Al-Kiswany, Matei Ripeanu, Sudharshan S. Vaz...
We propose a distributed algorithm to construct a balanced communication tree that serves in gathering data from the network nodes to a sink. Our algorithm constructs a near-optim...
Uncorrupted log files are the critical system component for computer forensics in case of intrusion and for real time system monitoring and auditing. Protection from tampering wit...
An important factor in the successful deployment of federated web-services-based business activities will be the ability to guarantee reliable distributed operation and execution....