This paper proposes and studies a distributed L2 cache management approach through page-level data to cache slice mapping in a future processor chip comprising many cores. L2 cach...
Increasing the core-count on current and future processors is posing critical challenges to the memory subsystem to efficiently handle concurrent memory requests. The current tren...
Abstract—Power modeling based on performance monitoring counters (PMCs) has attracted the interest of many researchers since it become a quick approach to understand and analyse ...
Multicore processors have become ubiquitous in today’s systems, but exploiting the parallelism they offer remains difficult, especially for legacy application and applications ...
Abstract--Loop tiling is an important compiler transformation used for enhancing data locality and exploiting coarsegrained parallelism. Tiled codes in which tile sizes are runtime...
Albert Hartono, Muthu Manikandan Baskaran, J. Rama...