Clustering is a common technique to overcome the wire delay problem incurred by the evolution of technology. Fully-distributed architectures, where the register file, the functio...
Pre-execution attacks cache misses for which conventional address-prediction driven prefetching is ineffective. In pre-execution, copies of cache miss computations are isolated fr...
This paper presents a mesh with virtual buses as the bandwidth-efficient implementation of the mesh with multiple broadcasting on which many computational problems can be solved w...
Jong Hyuk Choi, Bong Wan Kim, Kyu Ho Park, Kwang-I...
Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to addres...
Mark Horowitz, Margaret Martonosi, Todd C. Mowry, ...
We evaluate the e ect of processor speed, network bandwidth, and software overhead on the performance of release-consistent software distributed shared memory. We examine ve di er...
Sandhya Dwarkadas, Peter J. Keleher, Alan L. Cox, ...