This paper discusses extensions to the Rover toolkit for constructing reliable mobile-aware applications. The extensions improve upon the existing failure model, which only addres...
Clustering is a well known technique that allows scalability and fault tolerance in distributed systems. In the J2EE framework, clustering can be used to improve the performance a...
P2P systems inherently have high scalability, robustness and fault tolerance because there is no centralized server and the network self-organizes itself. This is achieved at the ...
Checkpointing is a widely used mechanism for supporting fault tolerance, but notorious in its high-cost disk access. The idea of memory-based checkpointing has been extensively stu...
Large web or e-commerce sites are frequently hosted on clusters. Successful open-source tools exist for clustering the front tiers of such sites (web servers and application serve...