Sciweavers

115 search results - page 7 / 23
» Fusion of Loops for Parallelism and Locality
Sort
View
IPPS
2010
IEEE
14 years 9 months ago
Restructuring parallel loops to curb false sharing on multicore architectures
The memory hierarchy of most multicore systems contains one or more levels of cache that is shared among multiple cores. The shared-cache architecture presents many opportunities f...
Santosh Sarangkar, Apan Qasem
CASES
2009
ACM
15 years 6 months ago
CGRA express: accelerating execution using dynamic operation fusion
Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing programmability with the potential for high computation throughput, scalab...
Yongjun Park, Hyunchul Park, Scott A. Mahlke
ASPLOS
1994
ACM
15 years 3 months ago
Compiler Optimizations for Improving Data Locality
In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effe...
Steve Carr, Kathryn S. McKinley, Chau-Wen Tseng
99
Voted
ICPP
1996
IEEE
15 years 3 months ago
Scheduling of Wavefront Parallelism on Scalable Shared-memory Multiprocessors
Tiling exploits temporal reuse carried by an outer loop of a loop nest to enhance cache locality. Loop skewing is typically required to make tiling legal. This restricts parallelis...
Naraig Manjikian, Tarek S. Abdelrahman
SAC
2002
ACM
14 years 11 months ago
Automatic code generation for executing tiled nested loops onto parallel architectures
This paper presents a novel approach for the problem of generating tiled code for nested for-loops using a tiling transformation. Tiling or supernode transformation has been widel...
Georgios I. Goumas, Maria Athanasaki, Nectarios Ko...