In order to extract high levels of performance from modern parallel architectures, the effective management of deep memory hierarchies is very important. While architectural advan...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
Chip multiprocessors (CMPs) substantially increase capacity pressure on the on-chip memory hierarchy while requiring fast access. Neither private nor shared caches can provide bot...
Zeshan Chishti, Michael D. Powell, T. N. Vijaykuma...
This paper presents a new version of the OMPi OpenMP C compiler, enhanced by lightweight runtime support based on user-level multithreading. A large number of threads can be spawne...
Panagiotis E. Hadjidoukas, Vassilios V. Dimakopoul...
With technology scaling down to 90nm and below, many yield-driven design and optimization methodologies have been proposed to cope with the prominent process variation and to incr...
Fang Gong, Hao Yu, Yiyu Shi, Daesoo Kim, Junyan Re...
- We propose a near optimal hardware architecture for deblocking filter in H.264/MPEG-4 AVC. We propose a novel filtering order and a data reuse strategy that result in significant...