Loop nest optimization is a combinatorial problem. Due to the growing complexity of modern architectures, it involves two increasingly difficult tasks: (1) analyzing the profita...
—Thread-level parallelism (TLP) has been extensively studied in order to overcome the limitations of exploiting instruction-level parallelism (ILP) on high-performance superscala...
We want to perform compile-time analysis of an SPMD program and place barriers in it to synchronize it correctly, minimizing the runtime cost of the synchronization. This is the b...
Data locality is critical to achievinghigh performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus ...
This paper emphasizes on load balancing issues associated with hybrid programming models for the parallelization of fully permutable nested loops onto SMP clusters. Hybrid paralle...