The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past de...
The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past dec...
This paper presents the study of running several core multimedia applications on a simultaneous multithreading (SMT) architecture and derives design principles for multimedia soft...
Yen-Kuang Chen, Rainer Lienhart, Eric Debes, Matth...
This paper presents a task-centric memory model for 1000-core compute accelerators. Visual computing applications are emerging as an important class of workloads that can exploit ...
John H. Kelm, Daniel R. Johnson, Steven S. Lumetta...
In this paper a parametrizable architecture of a motion estimator (ME) is presented. The ME is designed as a generic full pixel calculation module which can be adopted for dieren...