We present a generic module called Fast Collect. Fast Collect is an implementation of single-writer multi-reader (SWMR) shared memory in an asynchronous system in which a process...
Data prefetching has been considered an effective way to mask data access latency caused by cache misses and to bridge the performance gap between processor and memory. With hardw...
The Distributed Virtual Communication Machine (DVCM) is a software communication architecture for clusters of workstations equipped with programmable network interfaces (NIs) for ...
Several recent processor designs have proposed to enhance performance by increasing the clock frequency to the point where timing faults occur, and by adding error-correcting supp...
Brian Greskamp, Lu Wan, Ulya R. Karpuzcu, Jeffrey ...
Modern out-of-order processors tolerate long-latency memory operations by supporting a large number of in-flight instructions. This is particularly useful in numerical applications...