PC grids represent massive computation capacity at a low cost, but are challenging to employ for parallel computing because of variable and unpredictable performance and availabili...
Nagarajan Kanna, Jaspal Subhlok, Edgar Gabriel, Es...
— Checkpointing is an indispensable technique to provide fault tolerance for long-running high-throughput applications like those running on desktop grids. This paper argues that...
Samer Al-Kiswany, Matei Ripeanu, Sudharshan S. Vaz...
Reliability has become a serious concern as systems embrace nanometer technologies. In this paper, we propose a novel approach for organizing redundancy that provides high degree ...
Abstract. Dependable distributed applications require flexible infrastructure support for controlled redundancy, replication, and recovery of components and services. However, mos...
—Ultra large scale (ULS) systems are future software intensive systems that have billions of lines of code, composed of heterogeneous, changing, inconsistent and independent elem...