Run-time parallelization is often the only way to execute the code in parallel when data dependence information is incomplete at compile time. This situation is common in many imp...
Ever-increasing memory footprint of applications and increasing mainstream popularity of shared memory parallel computing motivate us to explore memory compression potential in di...
Both hardware and software prefetching have been shown to be e ective in tolerating the large memory latencies inherent in shared-memory multiprocessors however, both types of pre...
This paper examines how the performance of a shared-memory multiprocessor can be improved by including hardware support for block transfers. A system similar to the Hector multipr...
In this paper we consider several hardware implementations of the general-purpose atomic primitives fetch and Φ, compare and swap, load linked, and store conditionalon large-scal...