Overlapping communication with computation is an important optimization on current cluster architectures; its importance is likely to increase as the doubling of processing power ...
Wei-Yu Chen, Dan Bonachea, Costin Iancu, Katherine...
For maximum performance, an out-of-order processor must issue load instructions as early as possible, while avoiding memory-order violations with prior store instructions that wri...
In deep submicron technology, wire delay is no longer negligible and is gradually dominating the system latency. Some state-of-the-art architectural synthesis flows adopt the distr...
This paper introduces a new highly optimized architecture for remote memory access (RMA). RMA, using put and get operations, is a one-sided communication function which amongst ot...
Networks-on-Chip (NoC) is emerging as a practical development platform for future systems-on-chip products. We propose an energyefficient static algorithm which optimizes the ener...