Sciweavers

65 search results - page 8 / 13
» Automatic parallel code generation for tiled nested loops
Sort
View
ISHPC
2003
Springer
15 years 4 months ago
Code and Data Transformations for Improving Shared Cache Performance on SMT Processors
Simultaneous multithreaded processors use shared on-chip caches, which yield better cost-performance ratios. Sharing a cache between simultaneously executing threads causes excessi...
Dimitrios S. Nikolopoulos
SC
1990
ACM
15 years 3 months ago
Loop distribution with arbitrary control flow
Loop distribution is an integral part of transforming a sequential program into a parallel one. It is used extensively in parallelization,vectorization, and memory management. For...
Ken Kennedy, Kathryn S. McKinley
ICCAD
2009
IEEE
179views Hardware» more  ICCAD 2009»
14 years 9 months ago
Automatic memory partitioning and scheduling for throughput and power optimization
Hardware acceleration is crucial in modern embedded system design to meet the explosive demands on performance and cost. Selected computation kernels for acceleration are usually ...
Jason Cong, Wei Jiang, Bin Liu, Yi Zou
WCRE
2003
IEEE
15 years 4 months ago
Extracting an Explicitly Data-Parallel Representation of Image-Processing Programs
Our research goal is to retarget image processing programs written in sequential languages (e.g., C) to architectures with data-parallel processing capabilities. Image processing ...
Lewis B. Baumstark Jr., Murat Guler, Linda M. Will...
FCCM
2011
IEEE
331views VLSI» more  FCCM 2011»
14 years 3 months ago
Synthesis of Platform Architectures from OpenCL Programs
—The problem of automatically generating hardware modules from a high level representation of an application has been at the research forefront in the last few years. In this pap...
Muhsen Owaida, Nikolaos Bellas, Konstantis Dalouka...