We study the problem of exploiting parallelism from search-based AI systems on distributed machines. We propose stack-splitting, a technique for implementing orparallelism, which ...
Several methods have been proposed in the literature for the distribution of data on distributed memory machines, either oriented to dense or sparse structures. Many of the real a...
MPICH2 provides a layered architecture for implementing MPI-2. In this paper, we provide a new design for implementing MPI-2 over InfiniBand by extending the MPICH2 ADI3 layer. Ou...
Modulo scheduling is an e cient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirement...
Alexandre E. Eichenberger, Edward S. Davidson, San...