Abstract—Sharing patterns in shared-memory multiprocessors are the key to performance: uniprocessor latencytolerating techniques such as out-of-order execution and non-blocking c...
Efficient inter-thread value communication is essential for improving performance in Thread-Level Speculation (TLS). Although several mechanisms for improving value communication ...
Antonia Zhai, Christopher B. Colohan, J. Gregory S...
We show how the traditional protocol stack, such as TCP/IP, can be eliminated for socket based high speed communication within a cluster. The SCI shared memory interconnect is used...
HPL is a parallel Linpack benchmark package widely adopted in massive cluster system performance test. On HPL data layout among processors, a law to determine block size NB theoret...
Data communications between producer instructions and consumer instructions through memory incur extra delays that degrade processor performance. In this paper, we introduce a new...