This paper presents a new superscalar architecture for fast discrete cosine transform (DCT). Comparing with the general SIMD architecture, it speeds up the DCT computation by a fac...
General purpose programming on the graphics processing units (GPGPU) has received a lot of attention in the parallel computing community as it promises to offer the highest perfo...
M. Suhail Rehman, Kishore Kothapalli, P. J. Naraya...
Evaluating sums of multivariate Gaussians is a common computational task in computer vision and pattern recognition, including in the general and powerful kernel density estimatio...
Changjiang Yang, Ramani Duraiswami, Nail A. Gumero...
The explosive growth in integration technology and the parallel nature of rasterization-based graphics APIs changed the panorama of consumer-level graphics: today, GPUs are cheap,...
In this paper, we use the tensor product notation as the framework of a programming methodology for designing block recursive algorithms. We first express a computational problem ...