Management of program data to improve data locality and reduce false sharing is critical for scaling performance on NUMA shared memory multiprocessors. We use HPF-like data decomposition directives to partition and place arrays in data-parallel applications on Hector, a shared-memory NUMA multiprocessor. We describe a compiler system for automating the partitioning and placement of arrays. The compiler exploits Hector's shared memory architecture to efficiently implement distributed arrays. Experimental results from a prototype implementation demonstrate the effectiveness of these techniques. They also demonstrate the magnitude of the performance improvement attainable when our compiler-based data management schemes are used instead of operating system data management policies: performance improves by up to a factor of 5.
Tarek S. Abdelrahman, Thomas N. Wong