We present compiler techniques for translating OpenMP shared-memory parallel applications into MPI messagepassing programs for execution on distributed memory systems. This transl...
Previous studies have shown that array regrouping and structure splitting significantly improve data locality. The most effective technique relies on profiling every access to eve...
Excessive power consumption is becoming a major barrier to extracting the maximum performance from high-performance parallel systems. Therefore, techniques oriented towards reduci...
In an out-of-order issue processor, instructions are dynamically reordered and issued to function units in their dataready order rather than their original program order to achiev...
Thread-level speculation (TLS) allows potentially dependent threads to speculatively execute in parallel, thus making it easier for the compiler to extract parallel threads. Howeve...