Sciweavers

36 search results - page 1 / 8
IEEEPACT
2007
IEEE
13 years 11 months ago
Performance Portable Optimizations for Loops Containing Communication Operations
Effective use of communication networks is critical to the performance and scalability of parallel applications. Partitioned Global Address Space languages like UPC bring the pro...
Costin Iancu, Wei Chen, Katherine A. Yelick
IEEEPACT
2002
IEEE
13 years 10 months ago
Optimizing Loop Performance for Clustered VLIW Architectures
Modern embedded systems often require high degrees of instruction-level parallelism (ILP) within strict constraints on power consumption and chip cost. Unfortunately, a high-perfo...
Yi Qian, Steve Carr, Philip H. Sweany
ICS
2009
Tsinghua U.
13 years 11 months ago
Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture...
Jiayuan Meng, Kevin Skadron
PLDI
2010
ACM
13 years 9 months ago
Detecting Inefficiently-Used Containers to Avoid Bloat
Runtime bloat significantly degrades the performance and scalability of software systems. An important source of bloat is the inefficient use of containers. It is expensive to cre...
Guoqing Xu, Atanas Rountev
LCN
2003
IEEE
13 years 10 months ago
Pipelining and Overlapping for MPI Collective Operations
Collective operations are an important aspect of MPI (Message Passing Interface), currently the most important message-passing programming model. Many MPI applications make heavy u...
Joachim Worringen