Recent work in low-latency, high-bandwidth communication systems has resulted in building user-level Network Interface Controllers (NICs) and communication abstractions that support dir...
On a distributed memory machine, hand-coded message passing leads to the most efficient execution, but it is difficult to use. Parallelizing compilers can approach the performance...
There has been considerable recent interest in the support of transactional memory (TM) in both hardware and software. We present an intermediate approach, in which hardware is us...
Arrvindh Shriraman, Michael F. Spear, Hemayet Hoss...
In distributed shared memory multiprocessors, remote memory references generate processor-to-memory traffic, which may result in a bottleneck. It is therefore important to design ...
Recent research advocates memory streaming techniques to alleviate the performance bottleneck caused by the high latencies of off-chip memory accesses. Temporal memory streaming r...
Stephen Somogyi, Thomas F. Wenisch, Anastasia Aila...