As more applications rely on underlying peer-to-peer topologies, the need for efficient and resilient infrastructure has become more pressing. A number of important classes of top...
In this paper we show how to reduce downtime of J2EE applications by rapidly and automatically recovering from transient and intermittent software failures, without requiring appl...
George Candea, Emre Kiciman, Shinichi Kawamoto, Ar...
Assigning an application’s fault-tolerance properties (e.g., replication style, checkpointing frequency) statically and arbitrarily can lead to the application not ...
A major challenge facing grid applications is the appropriate handling of failures. In this paper we address the problem of making parallel Java applications based on Remote Method...
We suggest that a combination of randomization and gossip communication can be used to overcome scalability barriers that limit the utility of many technologies for distributed sys...