The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past dec...
The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past de...
—Switched networks have been adopted in on-chip communication for their scalability and efficient resource sharing. However, using a general network for a specific domain may res...
Partitioned Global Address Space (PGAS) languages provide a unique programming model that can span shared-memory multiprocessor (SMP) architectures, distributed memory machines, o...
This paper presents the study of running several core multimedia applications on a simultaneous multithreading (SMT) architecture and derives design principles for multimedia soft...
Yen-Kuang Chen, Rainer Lienhart, Eric Debes, Matth...