GPGPUs have recently emerged as powerful vehicles for generalpurpose high-performance computing. Although a new Compute Unified Device Architecture (CUDA) programming model from N...
Although caches for decades have been the backbone of the memory system, the speed gap between CPU and main memory suggests their augmentation with prefetching mechanisms. Recentl...
Cyclops is a new architecture for high performance parallel computers being developed at the IBM T. J. Watson Research Center. The basic cell of this architecture is a single-chip...
Chip multi-processors (CMPs) have become ubiquitous, while tools that ease concurrent programming have not. The promise of increased performance for all applications through ever ...
Christopher J. Rossbach, Owen S. Hofmann, Emmett W...
Most modern Chip Multiprocessors (CMP) feature shared cache on chip. For multithreaded applications, the sharing reduces communication latency among co-running threads, but also r...