In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high perf...
In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state of the art high-level synthesis tool AutoPilot from AutoESL, to...
To sustain emerging data-intensive scientific applications, High Performance Computing (HPC) centers invest a notable fraction of their operating budget on a specialized fast sto...
Henry M. Monti, Ali Raza Butt, Sudharshan S. Vazhk...
In many scientific applications, significant time is spent tuning codes for a particular highperformance architecture. Tuning approaches range from the relatively nonintrusive (...
Albert Hartono, Boyana Norris, Ponnuswamy Sadayapp...
JPEG2000 is the latest still image coding standard from the JPEG committee, which adopts new algorithms such as Embedded Block Coding with Optimized Truncation (EBCOT) and Discret...