A cache oblivious matrix transposition algorithm is implemented and analyzed using simulation and hardware performance counters. Contrary to its name, the cache oblivious matrix tr...
D. Tsifakis, Alistair P. Rendell, Peter E. Strazdi...
We address the design of algorithms for multicores that are oblivious to machine parameters. We propose HM, a multicore model consisting of a parallel shared-memory machine with hi...
Rezaul Alam Chowdhury, Francesco Silvestri, Brando...
While much work has been devoted to the study of cache behavior during the execution of codes with regular access patterns, little attention has been paid to irregular codes. An i...
Basilio B. Fraguela, Ramon Doallo, Emilio L. Zapat...
Recently, several experimental studies have been conducted on block data layout as a data transformation technique used in conjunction with tiling to improve cache performance. In...