—We show that deadlocks due to dependencies on consumption channels are a fundamental problem in wormhole multicast routing. This type of resource deadlocks has not been addresse...
Rajendra V. Boppana, Suresh Chalasani, C. S. Ragha...
Effective overlap of computation and communication is a well understood technique for latency hiding and can yield significant performance gains for applications on high-end compu...
Aniruddha G. Shet, P. Sadayappan, David E. Bernhol...
The Convex SPP-1000 is the first commercial implementation of a new generation of scalable shared memory parallel computers with full cache coherence. It employs a hierarchical s...
Thomas L. Sterling, Daniel Savarese, Peter MacNeic...
Building processors with large instruction windows has been proposed as a mechanism for overcoming the memory wall, but finding a feasible and implementable design has been an elu...
We present the design of high-performance and energy-efficient dynamic instruction schedulers in a 3-Dimensional integration technology. Based on a previous observation that the c...