A cache oblivious matrix transposition algorithm is implemented and analyzed using simulation and hardware performance counters. Contrary to its name, the cache oblivious matrix tr...
D. Tsifakis, Alistair P. Rendell, Peter E. Strazdi...
Cache locality optimization is an efficient way for reducing the idle time of modern processors in waiting for needed data. This kind of optimization can be achieved either on the...
Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...
We present and evaluate the idea of adaptive processor cache management. Specifically, we describe a novel and general scheme by which we can combine any two cache management alg...
Ranjith Subramanian, Yannis Smaragdakis, Gabriel H...
In embedded system design, the designer has to choose an onchip memory configuration that is suitable for a specific application. To aid in this design choice, we present a memory...