We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing...
Subword parallelism has succeeded in accelerating many multimedia applications. Subword permutation instructions have been proposed to efficiently rearrange subwords in or among r...
Efficient high level design tools that can map behavioral descriptions to FPGA architectures are one of the key requirements to fully leverage FPGA for high throughput computatio...
Malay Haldar, Anshuman Nayak, Alok N. Choudhary, P...
Abstract—The evolution towards cross-organizational collaboration and interaction patterns has led to the emergence of scalable, Web services-based composition infrastructures. T...
Abstract—The increasing number of cores per node in highperformance computing requires an efficient intra-node MPI communication subsystem. Most existing MPI implementations rel...