Future high-end computers will offer great performance improvements over today’s machines, enabling applications of far greater complexity. However, designers must solve the cha...
Guang R. Gao, Kevin B. Theobald, Ziang Hu, Haiping...
The high arithmetic rates of media processing applications require architectures with tens to hundreds of functional units, multiple register files, and explicit interconnect betw...
Peter R. Mattson, William J. Dally, Scott Rixner, ...
In this paper we present a processor microarchitecture that can simultaneously execute multiple threads and has a clustered design for scalability purposes. A main feature of the ...
Control intensive scalar programs pose a very different challenge to highly pipelined supercomputers than vectorizable numeric applications. Function call/return and branch instru...
OpenMP is an architecture-independent language for programming in the shared memory model. OpenMP is designed to be simple and in terms of programming abstractions. Unfortunately,...