Designing a distributed fault tolerance algorithm requires careful analysis of both fault models and diagnosis strategies. A system will fail if there are too many active faults, ...
Managing usage service level agreements (USLAs) within environments that integrate participants and resources spanning multiple physical institutions is a challenging problem. Mai...
Network applications require a certain level of network performance for their proper operation. These individual guarantees can be provided if sufficient amounts of network resource...
Processor scheduling in distributed-memory systems has received considerable attention in recent years. Several commercial distributed-memory systems use space-sharing processor s...
Fault-tolerance techniques based on checkpointing and message logging have been increasingly used in real-world applications to reduce service down-time. Most industrial applicati...