Exploitation of large amounts of instruction level parallelism requires a large amount of connectivity between the shared register file and the function units; this connectivity i...
The high arithmetic rates of media processing applications require architectures with tens to hundreds of functional units, multiple register files, and explicit interconnect betw...
Peter R. Mattson, William J. Dally, Scott Rixner, ...
Current-generation microprocessors are designed to process instructions with one and two source operands at equal cost. Handling two source operands requires multiple ports for ea...
Large numbers of logical registers can improve performance by allowing fast access to multiple subroutine contexts (register windows) and multiple thread contexts (multithreading)...
David W. Oehmke, Nathan L. Binkert, Trevor N. Mudg...
Clustering is an effective method to increase the available parallelism in VLIW datapaths without incurring severe penalties associated with large number of register file ports. E...
Viktor S. Lapinskii, Margarida F. Jacome, Gustavo ...