Through the algorthmic design patterns of data parallelism and task parallelism, the graphics processing unit (GPU) offers the potential to vastly accelerate discovery and innovat...
Jeremy S. Archuleta, Yong Cao, Thomas Scogland, Wu...
Transactional Coherence and Consistency (TCC) is a novel coherence scheme for shared memory multiprocessors that uses programmer-defined transactions as the fundamental unit of p...
The effect of the operating system on application performance is an increasingly important consideration in high performance computing. OS kernel measurement is key to understandi...
Aroon Nataraj, Allen D. Malony, Sameer Shende, Ala...
We present a simulation-based performance model to analyze a parallel sparse LU factorization algorithm on modern cached-based, high-end parallel architectures. We consider supern...
The Convex SPP-1000 is the first commercial implementation of a new generation of scalable shared memory parallel computers with full cache coherence. It employs a hierarchical s...
Thomas L. Sterling, Daniel Savarese, Peter MacNeic...