hyperobjects (reducers) provide a linguistic abstraction for dynamic multithreading that allows different branches of a parallel program to maintain coordinated local views of the...
I.-Ting Angelina Lee, Aamir Shafi, Charles E. Leis...
Abstract. The MPI Standard does not make any performance guarantees, but users expect (and like) MPI implementations to deliver good performance. A common-sense expectation of perf...
This paper presents a new GPU-based tensor voting implementation which achieves significant performance improvement over the conventional CPU-based implementation. Although the t...
Abstract. The ability to offload functionality to a programmable network interface is appealing, both for increasing message passing performance and for reducing the overhead on t...
This paper investigates the utilization of the master-slave (MS) paradigm as an alternative to domain decomposition (DD) methods for parallelizing lattice gauge theory (LGT) model...