This paper explores the feasibility of a cost-efficient storage architecture that offers the reliability and access performance characteristics of a high-end system. This architec...
Most of today‘s HPC systems employ a single head node for control, which represents a single point of failure as it interrupts an entire HPC system upon failure. Furthermore, it...
Kai Uhlemann, Christian Engelmann, Stephen L. Scot...
A fault-scalable service can be configured to tolerate increasing numbers of faults without significant decreases in performance. The Query/Update (Q/U) protocol is a new tool t...
Michael Abd-El-Malek, Gregory R. Ganger, Garth R. ...
—Coordinated Checkpoint/Restart (C/R) is a widely deployed strategy to achieve fault-tolerance. However, C/R by itself is not capable enough to meet the demands of upcoming exasc...
Regular Expressions (RegExes) are widely used in various applications to identify strings of text. Their flexibility, however, increases the complexity of the detection system and ...
Masanori Bando, N. Sertac Artan, Nishit Mehta, Yi ...