Sciweavers

16 search results - page 3 / 4
» Automatic code generation for executing tiled nested loops o...
Sort
View
ICS
2009
Tsinghua U.
14 years 2 days ago
Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs
Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture...
Jiayuan Meng, Kevin Skadron
IEEEPACT
2009
IEEE
13 years 12 months ago
Automatic Tuning of Discrete Fourier Transforms Driven by Analytical Modeling
—Analytical models have been used to estimate optimal values for parameters such as tile sizes in the context of loop nests. However, important algorithms such as fast Fourier tr...
Basilio B. Fraguela, Yevgen Voronenko, Markus P&uu...
WCRE
2003
IEEE
13 years 10 months ago
Extracting an Explicitly Data-Parallel Representation of Image-Processing Programs
Our research goal is to retarget image processing programs written in sequential languages (e.g., C) to architectures with data-parallel processing capabilities. Image processing ...
Lewis B. Baumstark Jr., Murat Guler, Linda M. Will...
CASES
2009
ACM
13 years 11 months ago
CGRA express: accelerating execution using dynamic operation fusion
Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing programmability with the potential for high computation throughput, scalab...
Yongjun Park, Hyunchul Park, Scott A. Mahlke
CGO
2008
IEEE
13 years 11 months ago
Parallel-stage decoupled software pipelining
In recent years, the microprocessor industry has embraced chip multiprocessors (CMPs), also known as multi-core architectures, as the dominant design paradigm. For existing and ne...
Easwaran Raman, Guilherme Ottoni, Arun Raman, Matt...