This paper presents an architectural study of a scalable system-level interconnection template. We explain why the shared bus, which is today's dominant template, will not me...
This paper presents a performance analysis of an accelerated 2-D rigid image registration implementation that employs the Compute Unified Device Architecture (CUDA) programming e...
Advanced accelerator simulations have played a prominent role in the design and analysis of modern accelerators. Given that accelerator simulations are computational intensive and...
Jungmin Lee, Zhiling Lan, J. Amundson, P. Spentzou...
Using off-the-shelf commodity workstations to build a cluster for parallel computing has become a common practice. In studying or designing a cluster of workstations one should ha...
The foremost goal of superscalar processor design is to increase performance through the exploitation of instruction-level parallelism (ILP). Previous studies have shown that spec...