In today’s high-performance computing practice, fail-stop failures are often tolerated by checkpointing. While checkpointing is a very general technique and can often be applied...
In this paper, we introduce the concept of hierarchy-based fault-local stabilization and a novel self-healing/fault-containment technique and apply them in Stalk. Stalk is an algo...
Murat Demirbas, Anish Arora, Tina Nolte, Nancy A. ...
We propose a distributed data structure for maintaining spatial data sets on message-passing, distributed memory machines. The data structure is based on orthogonal bisection tree...
Real-time applications typically operate under strict timing and dependability constraints. Although traditional data replication protocols provide fault tolerance, real-time gu...
Ashish Mehra, Jennifer Rexford, Hock-Siong Ang, Fa...
Data center power infrastructure incurs massive capital costs, which typically exceed energy costs over the life of the facility. To squeeze maximum value from the infrastructure,...
Steven Pelley, David Meisner, Pooya Zandevakili, T...