The speed of many computations is limited not by the number of arithmetic operations but by the time it takes to move and rearrange data in the increasingly complicated memory hie...
The efficient design of multiplierless implementations of constant matrix multipliers is challenged by the huge p... The goal is to find the optimal sub-expressions across all N dot ...
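To make the idea of a multiplierless constant matrix multiplier concrete, the following is a minimal sketch, not the method of the abstract above. It assumes a hypothetical 2x2 constant matrix [[5, 3], [10, 7]] chosen only for illustration, and the function name `shift_add_matvec` is likewise an assumption. Each product is built from shifts, additions, and subtractions, and the term 5*x0 is computed once and reused by both dot products. No claim is made that this decomposition is optimal.

```python
def shift_add_matvec(x0, x1):
    """Compute [[5, 3], [10, 7]] @ [x0, x1] for integers without multiplications.

    The matrix is a hypothetical example, not taken from the source text.
    """
    # Shared sub-expression: 5*x0 = 4*x0 + x0, used by both rows.
    t = (x0 << 2) + x0

    # Row 0: 5*x0 + 3*x1, with 3*x1 = 2*x1 + x1.
    y0 = t + (x1 << 1) + x1

    # Row 1: 10*x0 + 7*x1, with 10*x0 = 2*(5*x0) and 7*x1 = 8*x1 - x1.
    y1 = (t << 1) + (x1 << 3) - x1

    return y0, y1


if __name__ == "__main__":
    print(shift_add_matvec(3, 4))
```

Without sharing, the two rows would need separate adders for 5*x0 and 10*x0. Reusing `t` saves one addition, and savings of this kind are what a sub-expression search across several dot products tries to find.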
We present a model that enables us to analyze the running time of an algorithm on a computer with a memory hierarchy with limited associativity, in terms of various cache parameter...
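The effect of limited associativity can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not the model of the abstract above. It assumes LRU replacement within each set and set selection by line number modulo the number of sets. The function name `count_misses` and all parameter values are hypothetical.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets, ways, line_size):
    """Count misses of a set-associative cache on a sequence of addresses.

    Assumptions (not from the source text): LRU replacement per set and
    set index = (address // line_size) % num_sets.
    """
    # One OrderedDict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_size
        cache_set = sets[line % num_sets]
        if line in cache_set:
            # Hit: mark the line as most recently used.
            cache_set.move_to_end(line)
        else:
            misses += 1
            if len(cache_set) >= ways:
                # Evict the least recently used line in this set.
                cache_set.popitem(last=False)
            cache_set[line] = True
    return misses


if __name__ == "__main__":
    # Four addresses with stride 64, swept ten times (40 accesses in total).
    pattern = [0, 64, 128, 192] * 10
    # Both caches hold 64 one-word lines; only the associativity differs.
    print("direct-mapped:", count_misses(pattern, num_sets=64, ways=1, line_size=1))
    print("fully associative:", count_misses(pattern, num_sets=1, ways=64, line_size=1))
```

In this example both caches have the same capacity, yet the direct-mapped one misses on every access because all four addresses fall into the same set, while the fully associative one misses only on the first sweep.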