Speculative multithreading (SpMT) architecture can exploit thread-level parallelism that cannot be identified statically. Speedup can be obtained by speculatively executing threa...
We describe the Java runtime parallelizing machine (Jrpm), a complete system for parallelizing sequential programs automatically. Jrpm is based on a chip multiprocessor (CMP) with...
The Tensor Contraction Engine (TCE) is a domain-specific compiler for implementing complex tensor contraction expressions arising in quantum chemistry applications modeling elect...
Efficient task scheduling is essential for obtaining high performance in heterogeneous distributed computing systems (or HeDCSs). Because of its key importance, several scheduling...
In a parallel system with multiple CPUs, one of the key problems is to assign loop iterations to processors. This problem, known as the loop scheduling problem, has been studied i...
Mahmut T. Kandemir, Taylan Yemliha, Seung Woo Son,...