Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memo...
Byunghyun Jang, Perhaad Mistry, Dana Schaa, Rodrig...
The ability of worms to spread at rates that effectively preclude human-directed reaction has elevated them to a first-class security threat to distributed systems. We present th...
When vectorizing for SIMD architectures that are commonly employed by today’s multimedia extensions, one of the new challenges that arise is the handling of memory alignment. Pr...
Abstract. We present a hardware mechanism which dynamically detects uniform and affine vectors used in Graphics Processing Units, to minimize pressure on the register file and redu...
In this paper we use a mathematical approach to automatically generate high performance short vector code for the discrete Fourier transform (DFT). We represent the well-known Coo...