Recently, platform FPGAs that integrate sequential processors with a spatial fabric have become prevalent. While these hybrid architectures ease the burden of integrating sequenti...
On machines with high-performance processors, the memory system continues to be a performance bottleneck. Compilers insert prefetch operations and reorder data accesses to improve...
Nathaniel McIntosh, Sandya Mannarswamy, Robert Hun...
Prefetching in shared-memory multiprocessor systems is an increasingly difficult problem. As system designs grow to incorporate larger numbers of faster processors, memory latency...
Cycles per Instruction (CPI) stacks break down processor execution time into a baseline CPI plus a number of miss event CPI components. CPI breakdowns can be very helpful in gaini...
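As a rough illustration of the breakdown described in this abstract, the sketch below builds a CPI stack by dividing the cycles attributed to each component by the number of retired instructions, so that the components sum to the total CPI. The component names, the cycle counts, and the helper `cpi_stack` are hypothetical and chosen only for illustration; how cycles are attributed to each miss event is the hard part and is not modeled here.

```python
def cpi_stack(cycles_by_component, instructions):
    """Return (per-component CPI dict, total CPI) for one program run.

    cycles_by_component maps a component name (e.g. "base") to the number
    of cycles attributed to it; instructions is the retired instruction count.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    stack = {
        name: cycles / instructions
        for name, cycles in cycles_by_component.items()
    }
    return stack, sum(stack.values())


# Hypothetical cycle attribution for a run of 1,000,000 instructions.
cycles = {
    "base": 500_000,               # cycles with no miss event outstanding
    "l2_miss": 250_000,            # cycles attributed to L2 cache misses
    "branch_mispredict": 125_000,  # cycles attributed to branch mispredictions
    "tlb_miss": 125_000,           # cycles attributed to TLB misses
}

stack, total_cpi = cpi_stack(cycles, 1_000_000)
for name, cpi in stack.items():
    print(f"{name:>18}: {cpi:.3f} CPI ({cpi / total_cpi:.1%} of total)")
print(f"{'total':>18}: {total_cpi:.3f} CPI")
```

Read this way, the stack shows at a glance which miss event contributes most to the total CPI, and therefore where optimization effort is likely to pay off.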
In this paper, we propose a fully automatic dynamic scratchpad memory (SPM) management technique for instructions. Our technique loads required code segments into the SPM on deman...
Bernhard Egger, Chihun Kim, Choonki Jang, Yoonsung...