—Coordinated Checkpoint/Restart (C/R) is a widely deployed strategy to achieve fault-tolerance. However, C/R by itself is not capable enough to meet the demands of upcoming exasc...
DMTCP (Distributed MultiThreaded CheckPointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart is demonstrated for a wid...
Software vulnerabilities have been the main contributing factor to the Internet security problems such as fast spreading worms. Among these software vulnerabilities, memory corrup...
We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow comput...
We present S, the first system to provide transparent, lowoverhead application record-replay and the ability to go live from replayed execution. S i...