Matrix transpose in parallel systems typically involves costly all-to-all communications. In this paper, we provide a comparative characterization of various efficient algorithms f...
This paper evaluates performance characteristics of the Compaq ES40 shared memory multiprocessor. The ES40 system contains up to four Alpha 21264 CPU’s together with a high-perf...
Stand-alone threading libraries lack sophisticated memory management techniques. In this paper, we present a methodology that allows threading libraries that implement non-preempti...
Passing messages through shared memory plays an important role on symmetric multiprocessors and on Clumps. The management of concurrent access to message queues is an important as...
Embedded DRAM (eDRAM) power-energy estimation is presented for system-on-a-chip (SOC) applications. The main feature is the signal swing based analytic (SSBA) model, which improve...