A powerful and widely-used method for analyzing the performance behavior of parallel programs is event tracing. When an application is traced, performance-relevant events, such as...
Felix Wolf, Felix Freitag, Bernd Mohr, Shirley Moo...
Good network hardware performance is often squandered by overheads for accessing the network interface (NI) within a host. NIs that support user-level messaging avoid frequent ope...
Communications co-processors (CCPs) have become commonplace in modern MPPs and networks of workstations. These co-processors provide dedicated hardware support for fast communicat...
Klaus E. Schauser, Chris J. Scheiman, J. Mitchell ...
Recently, decentralized publish-subscribe (pub-sub) systems have gained popularity as a scalable asynchronous messaging paradigm over wide-area networks. Most existing pub-sub sys...
Jianxia Chen, Lakshmish Ramaswamy, David Lowenthal
We propose a new algorithm for recovering asynchronously from failures in a distributed computation. Our algorithm is based on two novel concepts - a fault-tolerant vector clock t...