Matrix multiplication is an important kernel in linear algebra algorithms, and the performance of both serial and parallel implementations is highly dependent on the memory system...
Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K....
Abstract. The nano-threads programming model was proposed to effectively integrate multiprogramming on shared-memory multiprocessors, with the exploitation of fine-grain parallelis...
Dimitrios S. Nikolopoulos, Eleftherios D. Polychro...
We have investigated the register file requirements of dynamically scheduled processors using register renaming and dispatch queues running the SPEC92 benchmarks. We looked at pro...
Traditional DMA requires the operating system to perform many tasks to initiate a transfer, with overhead on the order of hundreds or thousands of CPU instructions. This paper des...
Matthias A. Blumrich, Cezary Dubnicki, Edward W. F...
We present an algorithm for partial order reduction in the context of a countable universe of deterministic actions, of which finitely many are enabled at any given state. This mea...