Self-stabilizing token circulation algorithms are not always suited to dynamic networks. Random walks are well known to play a crucial role in the design of randomized algorithm...
The productivity of HPC systems is determined not only by their performance, but also by their reliability. The conventional method to limit the impact of failures is checkpointing...
There has recently been increasing interest in using system virtualization to improve the dependability of HPC cluster systems. However, it is not cost-free and may come with som...
Haibo Chen, Rong Chen, Fengzhe Zhang, Binyu Zang, ...
Failure detectors are a very important building block in distributed applications. The speed and accuracy of failure detectors are critical to the performance of the ...
NEBLO is a library and runtime system based on a structured overlay network. The API presented by NEBLO offers simple primitives and powerful mechanisms for programming generic p...