The bulk-synchronous parallel (BSP) model of computation helps to implement portable general-purpose algorithms while keeping predictable performance on different parallel compute...
We present a novel hardware mechanism for dynamic program phase detection in distributed shared-memory (DSM) multiprocessors. We show that successful hardware mechanisms for pha...
Parallel architectures are the way of the future, but are notoriously difficult to program. In addition to the low-level constructs they often present (e.g., locks, DMA, and non-...
Over the last decade, the Message Passing Interface (MPI) has become a very successful parallel programming environment for distributed-memory architectures such as clusters. However, ...
This paper presents a Data-Distributed Execution approach that exploits iteration-level parallelism in loops operating over arrays. It performs data-dependency analysis, based o...