Trace cache, an instruction fetch technique that reduces taken branch penalties by storing and fetching program instructions in dynamic execution order, dramatically improves inst...
Instance based locality optimization 6 is a semi automatic program restructuring method that reduces the number of cache misses. The method imitates the human approach of consideri...
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Boundeddegreenetworks like deBruijn graphsor wrapped butterfly networks are very important from VLSI implementation point of view as well as for applications where the computing n...
Control intensive scalar programs pose a very different challenge to highly pipelined supercomputers than vectorizable numeric applications. Function call/return and branch instru...