Matrix multiplication is an important kernel in linear algebra algorithms, and the performance of both serial and parallel implementations is highly dependent on the memory system...
Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K....
There has been much work recently on improving the locality performance of loop nests in scientific programs through the use of loop as well as data layout optimizations. However,...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
On-line transaction processing exhibits poor memory behavior in high-end multiprocessor servers because of complex sharing patterns and substantial interaction between the databas...
Instruction fetch bandwidth is feared to be a major limiting factor to the performance of future wide-issue aggressive superscalars. In this paper, we focus on Database applicatio...
Instance based locality optimization 6 is a semi automatic program restructuring method that reduces the number of cache misses. The method imitates the human approach of consideri...