In order to extract high levels of performance from modern parallel architectures, the effective management of deep memory hierarchies is very important. While architectural advan...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
We demonstrate that data reordering can substantially improve the performance of fine-grained irregular sharedmemory benchmarks, on both hardware and software shared-memory syste...
Abstract—This paper describes an algorithm for deriving data and computation partitions on scalable shared memory multiprocessors. The algorithm establishes affinity relationshi...
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Contr...
In this paper, we present a shared instruction and data memory controller for the On-Chip Memory (OCM) bus of the PowerPC embedded in the Virtex-4 chip. The traditional design of ...