Sciweavers

PCI
2005
Springer

Tuning Blocked Array Layouts to Exploit Memory Hierarchy in SMT Architectures

Cache misses form a major bottleneck for memory-intensive applications, due to the significant latency of main memory accesses. Loop tiling, in conjunction with other program transformations, has been shown to be an effective approach to improving locality and cache exploitation, especially for dense matrix scientific computations. Beyond loop nest optimizations, data transformation techniques, and in particular blocked data layouts, have been used to boost cache performance. The stability of the performance improvements achieved is heavily dependent on the appropriate selection of tile sizes. In this paper, we investigate the memory performance of blocked data layouts, and provide a theoretical analysis for the multiple levels of the memory hierarchy, when they are organized in a set-associative fashion. According to this analysis, the optimal tile size that maximizes L1 cache utilization should completely fit in the L1 cache, even for loop bodies that access more than just one arr...
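To make the idea of a blocked data layout concrete, here is a minimal sketch in Python. It is an illustration only and not the authors' implementation or analysis: the function names `blocked_offset` and `largest_tile_fitting` are hypothetical, the matrix is assumed square with a dimension that is a multiple of the tile size, and the tile-size helper simply picks the largest power-of-two tile whose footprint fits in a given cache capacity, ignoring associativity and any additional arrays.

```python
def blocked_offset(i, j, n, t):
    """Offset of element (i, j) in an n x n matrix stored tile by tile.

    Tiles are t x t and are laid out in row-major order; elements inside
    each tile are also row-major. Assumes n is a multiple of t, which is a
    simplification made for this sketch.
    """
    if n % t != 0:
        raise ValueError("n must be a multiple of t")
    tiles_per_row = n // t
    tile_index = (i // t) * tiles_per_row + (j // t)
    # Skip all earlier tiles, then index row-major inside the current tile.
    return tile_index * t * t + (i % t) * t + (j % t)


def largest_tile_fitting(cache_bytes, elem_bytes):
    """Largest power-of-two t such that one t x t tile fits in cache_bytes.

    This only checks raw capacity; it does not model set associativity or
    the other arrays a loop body may touch.
    """
    t = 1
    while (2 * t) * (2 * t) * elem_bytes <= cache_bytes:
        t *= 2
    return t
```

For example, with a hypothetical 32 KiB L1 data cache and 8-byte elements, `largest_tile_fitting(32 * 1024, 8)` returns 64, since a 64 x 64 tile occupies exactly 32 KiB. With the blocked layout, all elements of one tile occupy a contiguous range of `t * t` offsets, which is what lets a tile map cleanly onto the cache.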
Added 28 Jun 2010
Updated 28 Jun 2010
Type Conference
Year 2005
Where PCI
Authors Evangelia Athanasaki, Kornilios Kourtis, Nikos Anastopoulos, Nectarios Koziris