This paper presents a technique to visualize the communication pattern of a parallel application at different points during its execution. Unlike many existing tools that show the...
Alex K. Jones, Raymond R. Hoare, Joseph St. Onge, ...
Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. For example, when targeting shared-mem...
Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay ...
The increasing complexity of hardware features for recent processors makes high performance code generation very challenging. In particular, several optimization targets have to b...
This paper describes a generalisation of modulo scheduling to parallelise loops for SpMT processors that exploits simultaneously both instruction-level parallelism and thread-leve...
Lin Gao 0002, Quan Hoang Nguyen, Lian Li 0002, Jin...
Tiling exploits temporal reuse carried by an outer loop of a loop nest to enhance cache locality. Loop skewing is typically required to make tiling legal. This restricts parallelis...