Failures of any type are common in current datacenters, partly due to the growing scale of the data they store. As data scales up, keeping it available becomes more complex, while differe...
Nicolas Bonvin, Thanasis G. Papaioannou, Karl Aber...
A self-stabilizing distributed protocol can recover from any state-corrupting fault. A self-stabilizing protocol is called adaptive if its recovery time is proportional to the numb...
Process checkpointing is a basic mechanism required for providing High Throughput Computing service on distributively owned resources. We present a new process checkpoint and migr...
Service-Oriented Architecture (SOA) is a popular design paradigm for distributed systems today. Its dynamics and loose coupling make it well suited to self-adaptive systems. This a...
Resource sharing on the Internet is becoming increasingly pervasive. Recently, there has been growing interest in distributed systems such as peer-to-peer and grid, with effor...