A key challenge in architecting a CMP with many cores is maintaining cache coherence in an efficient manner. Directory-based protocols avoid the bandwidth overhead of snoop-based ...
Jason Zebchuk, Vijayalakshmi Srinivasan, Moinuddin...
Packet processing systems maintain high throughput despite relatively high memory latencies by exploiting the coarse-grained parallelism available between packets. In particular, ...
This paper examines the performance of simultaneous multithreading (SMT) for network servers using actual hardware, multiple network server applications, and several workloads. Us...
Yaoping Ruan, Vivek S. Pai, Erich M. Nahum, John M...
With faster CPU clocks and wider pipelines, all relevant microarchitecture components should scale accordingly. There have been many proposals for scaling the issue queue, registe...
Silicon technology will continue to provide an exponential increase in the availability of raw transistors. Effectively translating this resource into application performance, how...
Steven Swanson, Ken Michelson, Andrew Schwerin, Ma...