Abstract. We give a high-level description of some fundamental randomized and deterministic techniques for routing and sorting on xedconnection networks such as meshes, hypercubes ...
Loop distribution is an integral part of transforming a sequential program into a parallel one. It is used extensively in parallelization,vectorization, and memory management. For...
We present the algorithm to multiply univariate polynomials with integer coefficients efficiently using the Number Theoretic transform (NTT) on Graphics Processing Units (GPU). The...
This paper examines the scalable parallel implementation of QR factorization of a general matrix, targeting SMP and multi-core architectures. Two implementations of algorithms-by-...
Embedded computing architectures can be designed to meet a variety of application specific requirements. However, optimized hardware can require compiler support to realize the po...