Short vector (SIMD) instructions are useful in signal processing, multimedia, and scientific applications. They offer higher performance, lower energy consumption, and better res...
Abstract. This paper introduces a method to generate efficient vectorized implementations of small stride permutations using only vector load and vector shuffle instructions. These...
The widespread presence of SIMD devices in today’s microprocessors has made compiler techniques for these devices tremendously important. One of the most important and difficul...
We introduce another view of group theory in the field of interconnection networks. With this approach it is possible to specify application specific network topologies for permut...
SIMD extension is one of the most common and effective technique to exploit data-level parallelism in today’s processor designs. However, the performance of SIMD architectures i...