Sciweavers

Optimal automatic multi-pass shader partitioning by dynamic ...
IEEEPACT 2006 (IEEE)
Fast, automatic, procedure-level performance tuning
This paper presents an automated performance tuning solution, which partitions a program into a number of tuning sections and finds the best combination of compiler options for e...
Zhelong Pan, Rudolf Eigenmann
DATE 2007 (IEEE)
Two-level microprocessor-accelerator partitioning
The integration of microprocessors and field-programmable gate array (FPGA) fabric on a single chip increases both the utility and necessity of tools that automatically move softw...
Scott Sirowy, Yonghui Wu, Stefano Lonardi, Frank V...
LCPC 1997 (Springer)
Automatic Data Decomposition for Message-Passing Machines
The data distribution problem is very complex, because it involves trade-off decisions between minimizing communication and maximizing parallelism. A common approach towards solving...
Mirela Damian-Iordache, Sriram V. Pemmaraju
ASPLOS 2008 (ACM)
Communication optimizations for global multi-threaded instruction scheduling
The recent shift in the industry towards chip multiprocessor (CMP) designs has brought the need for multi-threaded applications to mainstream computing. As observed in several lim...
Guilherme Ottoni, David I. August
IEEEPACT 2007 (IEEE)
Performance Portable Optimizations for Loops Containing Communication Operations
Effective use of communication networks is critical to the performance and scalability of parallel applications. Partitioned Global Address Space languages like UPC bring the pro...
Costin Iancu, Wei Chen, Katherine A. Yelick