When integrating software threads to boost performance on a processor that supports instruction-level parallelism, it is rarely clear which code regions should be ...
Wide Single Instruction, Multiple Thread (SIMT) architectures often require a static allocation of thread groups that are executed in lockstep throughout the entire application ker...
We report on the development of a new computational framework for efficiently carrying out parallel data redistribution in a limited memory environment. This new library, MADRE (T...
Lossy compression of hyperspectral and ultraspectral images is traditionally performed using 3D transform coding. This approach yields good performance, but its complexity and mem...
The problem of programmability on modern heterogeneous multicore and future manycore embedded platforms has not yet been solved satisfactorily: although many existing but incompatible ...