Aim of the paper is to develop a concise but comprehensive analytical model for the well-known Grace Hash Join algorithm on cost effective cluster architectures. This approach is ...
We present the algorithm to multiply univariate polynomials with integer coefficients efficiently using the Number Theoretic transform (NTT) on Graphics Processing Units (GPU). The...
We present a generic module, called Fast Collect. Fast Collect is an implementation of Single-Writer Multi-Reader (SWMR) Shared-Memory in an asynchronous system in which a process...
A quantitative comparison of the BSP and LogP models of parallel computation is developed. We concentrate on a variant of LogP that disallows the so-called stalling behavior, alth...
Gianfranco Bilardi, Kieran T. Herley, Andrea Pietr...
The huge number of cores existing in current Graphics Processor Units (GPUs) provides these devices with computing capabilities that can be exploited by distributed applications. I...