Traditional directory-based cache coherence protocols suffer from long-latency cache misses as a consequence of the indirection introduced by the home node, which must be accessed...
Abstract -- The development of cost-effective highperformance parallel computing on multi-processor supercomputers makes it attractive to port excessively time consuming simulation...
Ning Lu, Z. Todd Taylor, David P. Chassin, Ross T....
Parallel simulation is a technique to accelerate microarchitecture simulation of CMPs by exploiting the inherent parallelism of CMPs. In this paper, we explore the simulation para...
Stand-alone threading libraries lack sophisticated memory management techniques. In this paper, we present a methodology that allows threading libraries that implement non-preempti...
Load balancing and data locality are the two most important factors in the performance of parallel programs on distributed-memory multiprocessors. A good balancing scheme should e...