The memory hierarchy of most multicore systems contains one or more levels of cache that is shared among multiple cores. The shared-cache architecture presents many opportunities f...
Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing programmability with the potential for high computation throughput, scalab...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effe...
Tiling exploits temporal reuse carried by an outer loop of a loop nest to enhance cache locality. Loop skewing is typically required to make tiling legal. This restricts parallelis...
This paper presents a novel approach for the problem of generating tiled code for nested for-loops using a tiling transformation. Tiling or supernode transformation has been widel...
Georgios I. Goumas, Maria Athanasaki, Nectarios Ko...