Until recently, parallel programming has largely focused on the exploitation of data-parallelism in dense matrix programs. However, many important application domains, including m...
Milind Kulkarni, Martin Burtscher, Calin Cascaval,...
Abstract. We show a two-phase approach for minimizing various communication-cost metrics in fine-grain partitioning of sparse matrices for parallel processing. In the first phase...
The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile a...
Emmanuel Agullo, Henricus Bouwmeester, Jack Dongar...
We present a novel dedicated hardware system for the extraction of second-order statistical features from high-resolution images. The selected features are based on gray level co-o...
Dimitris G. Bariamis, Dimitrios K. Iakovidis, Dimi...
This paper describes an implementation of parallel LU factorization. The focus is to achieve high performance on non-dedicated clusters, where the number of available computing re...