We present a high performance algorithm for multiplying sparse distributed polynomials using a multicore processor. Each core uses a heap of pointers to multiply parts of the poly...
We present the preliminary design for a C++ template library to enable the compositional construction of matrix classes suitable for high performance numerical linear algebra comp...
Exploiting the emerging reality of affordable multi-core architeces through providing programmers with simple abstractions that would enable them to easily turn their sequential p...
Abstract. Themain contributionof thiswork isto propose a numberof broadcastefficient VLSI architectures for computing the sum and the prefix sums of a w k-bit, k 2, binary sequenc...
Rong Lin, Koji Nakano, Stephan Olariu, Maria Crist...
Irregular programs are programs organized around pointer-based data structures such as trees and graphs. Recent investigations by the Galois project have shown that many irregular...
Milind Kulkarni, Martin Burtscher, Rajeshkar Inkul...