In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high perf...
The definition of a commodity component is quite obvious when it comes to the PC as a basic compute engine and building block for clusters of PCs. Looking at the options for a mo...
The parallelization of two applications in symmetric cryptography is considered: block ciphering and a new method based on random sampling for the selection of basic substitution ...
Vincent Danjean, Roland Gillard, Serge Guelton, Je...
Processor architectures with tens to hundreds of arithmetic units are emerging to handle media processing applications. These applications, such as image coding, image synthesis, ...
Scott Rixner, William J. Dally, Brucek Khailany, P...
This paper focuses on SIMD implementations of the 2D discrete wavelet transform (DWT). The transforms considered are Daubechies’ real-to-real method of four coefficients (Daub-...
Asadollah Shahbahrami, Ben H. H. Juurlink, Stamati...