We systematically evaluate the performance of five implementations of a single, user-level communication interface. Each implementation makes different architectural assumptions ...
The high arithmetic rates of media processing applications require architectures with tens to hundreds of functional units, multiple register files, and explicit interconnect betw...
Peter R. Mattson, William J. Dally, Scott Rixner, ...
Multi-core processors, with low communication costs and high availability of execution cores, will increase the use of execution and compilation models that use short threads to e...
By studying the behavior of programs in the SPECint95 suite we observed that six out of eight programs exhibit a new kind of value locality, the frequent value locality, according...
This paper presents the first analysis of operating system execution on a simultaneous multithreaded (SMT) processor. While SMT has been studied extensively over the past 6 years,...