Sciweavers

» Performance of parallel computations with dynamic processor ...
PPOPP
2010
ACM
16 years 2 months ago
Scaling LAPACK panel operations using parallel cache assignment
In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high perf...
Anthony M. Castaldo, R. Clint Whaley
IPPS
2003
IEEE
15 years 10 months ago
Cost/Performance Tradeoffs in Network Interconnects for Clusters of Commodity PCs
The definition of a commodity component is quite obvious when it comes to the PC as a basic compute engine and building block for clusters of PCs. Looking at the options for a mo...
Christian Kurmann, Felix Rauch, Thomas Stricker
ISSAC
2007
Springer
15 years 11 months ago
Adaptive loops with kaapi on multicore and grid: applications in symmetric cryptography
The parallelization of two applications in symmetric cryptography is considered: block ciphering and a new method based on random sampling for the selection of basic substitution ...
Vincent Danjean, Roland Gillard, Serge Guelton, Je...
HPCA
2000
IEEE
15 years 9 months ago
Register Organization for Media Processing
Processor architectures with tens to hundreds of arithmetic units are emerging to handle media processing applications. These applications, such as image coding, image synthesis, ...
Scott Rixner, William J. Dally, Brucek Khailany, P...
ASAP
2005
IEEE
15 years 10 months ago
Performance Comparison of SIMD Implementations of the Discrete Wavelet Transform
This paper focuses on SIMD implementations of the 2D discrete wavelet transform (DWT). The transforms considered are Daubechies’ real-to-real method of four coefficients (Daub-...
Asadollah Shahbahrami, Ben H. H. Juurlink, Stamati...