Abstract. In this paper we make the case for adding standard nonblocking collective operations to the MPI standard. The non-blocking point-to-point and blocking collective operatio...
Torsten Hoefler, Prabhanjan Kambadur, Richard L. G...
Abstract. We study the problem of dynamic load-balancing on hierarchical platforms. In particular, we consider applications involving heavy communications on a distributed platform...
This paper reviews the characteristics of overlay networks and defines effective relay nodes that can improve the performance of interactive real-time applications. A heuristic ...
We present a system for allocating resources in shared data and compute clusters that improves MapReduce job scheduling in three ways. First, the system uses regulated and user-as...
Loop fusion improves data locality and reduces synchronization in data-parallel applications. However, loop fusion is not always legal. Even when legal, fusion may introduce loop-...