Dynamic data distributions offer a number of performance benefits, but require more sophisticated compiler support and incur run-time overhead. We investigate attainable benefits ...
Management of program data to improve data locality and reduce false sharing is critical for scaling performanceon NUMA shared memorymultiprocessors. We use HPF-like data decomposi...
In this paper we describe the design, implementation and experimental evaluation of a technique for operating system schedulers called processor pool-based scheduling [51]. Our tec...
Our recent work on uniprocessor and single-node multiprocessor (SMP) active memory systems uses address remapping techniques in conjunction with extended cache coherence protocols...
Over the last decade, Message Passing Interface (MPI) has become a very successful parallel programming environment for distributed memory architectures such as clusters. However, ...