Failure detectors are fundamental building blocks in distributed systems. Multi-node failure detectors, where the detector is tasked with monitoring N other nodes, play a critical...
Automatic management of large-scale production systems requires a continuous monitoring service to keep track of the states of the managed system. However, it is challenging to ac...
Gossip-based information dissemination protocols are considered easy to deploy, scalable and resilient to network dynamics. Load balancing is inherent in these protocols a...
Name services are critical for mapping logical resource names to physical resources in large-scale distributed systems. The Domain Name System (DNS) used on the Internet, however,...
The very nature of implementing and evaluating fully distributed algorithms or protocols in application-layer overlay networks involves certain programming tasks that are at best m...