This paper presents a new methodology for implementing fast synchronization on scalable cache-coherent multiprocessors, through the use of hybrid primitives. Hybrid primitives lev...
Dimitrios S. Nikolopoulos, Theodore S. Papatheodor...
Advances in IC processing allow for more microprocessor design options. The increasing gate density and cost of wires in advanced integrated circuit technologies require that we l...
Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ke...
We consider iterative algorithms of the form z := f(z), executed by a parallel or distributed computing system. We focus on asynchronous implementations whereby each processor ite...
MPICH2 provides a layered architecture for implementing MPI-2. In this paper, we provide a new design for implementing MPI-2 over InfiniBand by extending the MPICH2 ADI3 layer. Ou...
Scalability is a crucial factor in performance evaluation and analysis of parallel and distributed systems. Much effort has been devoted to scalability research and several metric...