In many computer systems with large data computations, the delay of memory access is one of the major performance bottlenecks. In this paper, we propose an enhanced field remappi...
Keoncheol Shin, Jungeun Kim, Seonggun Kim, Hwansoo...
This paper presents a new and simple prefetching mechanism to improve the memory performance of multimedia applications. This method adapts the memory access mechanism to the acce...
As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been...
— Traditional level-one instruction caches and data caches for embedded systems typically have the same capacities. Configurable caches either shut down a part of the cache to su...
We present a cross-layer customization methodology for latency and bandwidth efficient inter-core communication in embedded multiprocessors. The methodology integrates compiler, o...