VLIW and EDGE (Explicit Data Graph Execution) architectures rely on compilers to form high-quality hyperblocks for good performance. These compilers typically perform hyperblock f...
Bertrand A. Maher, Aaron Smith, Doug Burger, Kathr...
Effective use of communication networks is critical to the performance and scalability of parallel applications. Partitioned Global Address Space languages like UPC bring the pro...
Portable low-level C programs must often support multiple equivalent in-memory layouts of data, due to the byte or bit order of the compiler, architecture, or external data formats...
Thread-Level Speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting parallel execution of threads that might not actually be independent. ...
J. Gregory Steffan, Christopher B. Colohan, Antoni...
—The implementation and optimization of collective communication operations is an important field of active research. Such operations directly influence application performance...
Torsten Hoefler, Christian Siebert, Andrew Lumsdai...