Current on-chip block-centric memory hierarchies exploit access patterns at the fine-grain scale of small blocks. Several recently proposed techniques for coherence traffic reduct...
We present a novel algorithm to compute cache-efficient layouts of bounding volume hierarchies (BVHs) of polygonal models. Our approach does not make any assumptions about the cac...
HPL is a parallel Linpack benchmark package widely adopted in massive cluster system performance test. On HPL data layout among processors, a law to determine block size NB theoret...
Although it is increasingly difficult for large scientific programs to attain a significant fraction of peak performance on systems based on microprocessors with substantial instr...
John M. Mellor-Crummey, Robert J. Fowler, Gabriel ...
In order to extract high levels of performance from modern parallel architectures, the effective management of deep memory hierarchies is very important. While architectural advan...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...