Abstract. Loops are an important source of optimization. In this paper, we address such optimizations for those cases when loops contain kernels mapped on reconfigurable fabric. We...
Ozana Silvia Dragomir, Elena Moscu Panainte, Koen ...
Distributed learning is a problem of fundamental interest in machine learning and cognitive science. In this paper, we present asynchronous distributed learning algorithms for two...
Abstract. Distributing process-oriented programs across a cluster of machines requires careful attention to the effects of network latency. The MPI standard, widely used for cluste...
We demonstrate the use of highly parallel graphics processing units (GPUs) to accelerate the Superposition/Convolution (S/C) algorithm to interactive rates while reducing the numbe...
Robert Jacques, Russell Taylor, John Wong, Todd Mc...
Two parallel programming models represented by OpenMP and MPI are compared for PDE solvers based on regular sparse numerical operators. As a typical representative of such an app...