We consider a variety of dynamic, hardware-based methods for exploiting load/store parallelism, including mechanisms that use memory dependence speculation. While previous work ha...
In a 64-bit processor, many of the data values actually used in computations require much narrower data-widths. In this study, we demonstrate that instruction data-widths exhibit ...
The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, ...
Subbarao Palacharla, Norman P. Jouppi, James E. Sm...
We investigate instruction distribution methods for quadcluster, dynamically-scheduled superscalar processors. We study a variety of methods with different cost, performance and c...
Analytical modeling is applied to the automated design of application-specific superscalar processors. Using an analytical method bridges the gap between the size of the design sp...