Abstract. Performance modeling is important for implementing efficient parallel applications and runtime systems. The LogP model captures the relevant aspects of message passing i...
This paper presents a new methodology for implementing fast synchronization on scalable cache-coherent multiprocessors, through the use of hybrid primitives. Hybrid primitives lev...
Dimitrios S. Nikolopoulos, Theodore S. Papatheodor...
Advances in IC processing allow for more microprocessor design options. The increasing gate density and cost of wires in advanced integrated circuit technologies require that we l...
Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ke...
We consider iterative algorithms of the form z := f(z), executed by a parallel or distributed computing system. We focus on asynchronous implementations whereby each processor ite...
MPICH2 provides a layered architecture for implementing MPI-2. In this paper, we provide a new design for implementing MPI-2 over InfiniBand by extending the MPICH2 ADI3 layer. Ou...