This paper presents an automated performance tuning solution, which partitions a program into a number of tuning sections and finds the best combination of compiler options for e...
The integration of microprocessors and field-programmable gate array (FPGA) fabric on a single chip increases both the utility and necessity of tools that automatically move softw...
Scott Sirowy, Yonghui Wu, Stefano Lonardi, Frank V...
The data distribution problem is very complex, because it involves trade-off decisions between minimizing communication and maximizing parallelism. A common approach towards solving...
The recent shift in the industry towards chip multiprocessor (CMP) designs has brought the need for multi-threaded applications to mainstream computing. As observed in several lim...
Effective use of communication networks is critical to the performance and scalability of parallel applications. Partitioned Global Address Space languages like UPC bring the pro...