Performance prediction is set to play a significant role in supportive middleware that is designed to manage workload on parallel and distributed computing systems. This middlewa...
Stephen A. Jarvis, Daniel P. Spooner, Helene N. Li...
Designers of distributed systems often rely on replicas for increased robustness, scalability, and performance. Replicated server architectures require some technique to select a ...
Most application-level fault tolerance schemes in the literature are non-adaptive in the sense that the fault tolerance schemes incorporated in applications are usually designed witho...
Zizhong Chen, Ming Yang, Guillermo A. Francia III,...
In this paper, we provide an overview of the Logistical Runtime System (LoRS). LoRS is an integrated ensemble of tools and services that aggregate primitive (best effort, faulty) stor...
James S. Plank, Micah Beck, Jack Dongarra, Richard...
We have designed and built a set of miniature robots called Scouts and have developed a distributed software system to control them. This paper addresses the fundamental choices we...
Paul E. Rybski, Sascha Stoeter, Maria L. Gini, Dea...