In this paper, we present two new run-time algorithms for the parallelization of loops that have indirect access patterns. The algorithms can handle any type of loop-carried depen...
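The abstract does not describe the two algorithms themselves. As background only, one common way to parallelize loops with indirect accesses at run time is the inspector/executor scheme. An inspector pass records the index arrays and places each iteration in a "wavefront" so that iterations in the same wavefront have no flow, anti, or output dependence on one another. An executor then runs the wavefronts in order. The sketch below assumes a loop of the hypothetical form `a[writes[i]] = body(a[reads[i]])`, with one read and one write per iteration. The names `inspector`, `executor`, `reads`, `writes`, and `body` are illustrative and are not taken from the paper.

```python
from collections import defaultdict


def inspector(reads, writes):
    """Assign each iteration a wavefront level (sketch; levels start at 1).

    Iteration i reads a[reads[i]] and writes a[writes[i]]. Iterations that
    share a level have no conflicting accesses, so they may run concurrently.
    """
    last_write = defaultdict(int)  # element -> level of its latest writer (0 = none)
    max_read = defaultdict(int)    # element -> highest level that has read it
    levels = []
    for r, w in zip(reads, writes):
        level = max(
            last_write[r] + 1,  # flow dependence: read after the previous write of r
            last_write[w] + 1,  # output dependence: write after the previous write of w
            max_read[w] + 1,    # anti dependence: write after earlier reads of w
        )
        levels.append(level)
        max_read[r] = max(max_read[r], level)
        last_write[w] = level
    return levels


def executor(a, reads, writes, levels, body):
    """Run iterations wavefront by wavefront; mutates and returns a."""
    schedule = defaultdict(list)
    for i, lvl in enumerate(levels):
        schedule[lvl].append(i)
    for lvl in sorted(schedule):
        # Iterations within one wavefront are independent. They run
        # sequentially here for clarity; a real runtime would dispatch
        # them to parallel workers.
        for i in schedule[lvl]:
            a[writes[i]] = body(a[reads[i]])
    return a
```

For `reads = [0, 1, 2, 0, 3]` and `writes = [1, 2, 3, 4, 0]`, the inspector yields levels `[1, 2, 3, 1, 4]`. Iterations 0 and 3 therefore form the first wavefront, and the executor produces the same array as the original sequential loop.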
Multi-core processors, with low communication costs and high availability of execution cores, will increase the use of execution and compilation models that use short threads to e...
In this paper, we follow a new path to arrive at the idea of a COMA — a Cache Only Memory Architecture. We show how the evolution of another architecture (ADARC) leads quite nat...
This paper describes the design and implementation of MPI-SIM, a library for the execution-driven parallel simulation of MPI programs. MPI-LITE, a portable library that supports m...
The performance of the on-chip cache is critical to processor performance. The multithreaded programming model usually employed by on-chip many-core architectures may affect cache ac...