To improve the whole dependability of large-scale cluster systems, an online fault detection mechanism is proposed in this paper. This mechanism can detect the fault in time befor...
In most large-scale peer-to-peer (P2P) applications, it is necessary to collect vital statistics data — sometimes referred to as logs — from up to millions of peers. Tradition...
As we continue to evolve into large-scale parallel systems, many of them employing hundreds of computing engines to take on mission-critical roles, it is crucial to design those s...
Yanyong Zhang, Mark S. Squillante, Anand Sivasubra...
— We propose a new scheme for content distribution of large files that is based on network coding. With network coding, each node of the distribution network is able to generate...
Cluster systems have been gradually more popular and are being broadly used in a variety of applications. On the other hand, many of those systems are not tolerant to system failu...