The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components use...
Abstract. Our main objective is to combine partial-order methods with verification techniques for infinite-state systems in order to obtain efficient verification algorithms fo...
Large-scale hosting infrastructures require automatic system anomaly management to achieve continuous system operation. In this paper, we present a novel adaptive runtime anomaly ...
To rapidly evolve new designs of peer-to-peer (P2P) multimedia streaming systems, it is highly desirable to test and troubleshoot them in a controlled and repeatable experimental ...
Checkpoint and Recovery (CPR) systems have many uses in high-performance computing. Because of this, many developers have implemented it, by hand, into their applications. One of ...
Greg Bronevetsky, Rohit Fernandes, Daniel Marques,...