ICASSP
2009
IEEE

Generating high performance pruned FFT implementations

We derive a recursive, general-radix pruned Cooley-Tukey fast Fourier transform (FFT) algorithm in Kronecker product notation. The algorithm is compatible with the vectorization and parallelization required on state-of-the-art multicore CPUs. We integrate the pruned FFT algorithm into the program generation system Spiral and automatically generate optimized implementations of the pruned FFT for the Intel Core2Duo multicore processor. Experimental results show that the pruned FFT can speed up the fastest available FFT implementations by up to 30% when the problem size and the pattern of unused inputs and outputs are known in advance.
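To illustrate what pruning means here, the following is a minimal sketch of input pruning only. It is not the paper's general-radix algorithm, and it is not code generated by Spiral. It assumes the simplest pattern of unused inputs: only the first k samples of a length-n input are nonzero, k is a power of two, and k divides n. Under those assumptions a single Cooley-Tukey step replaces one size-n transform with n/k transforms of size k. The function names `fft` and `input_pruned_fft` are hypothetical and chosen for this example.

```python
import cmath


def fft(x):
    """Plain recursive radix-2 Cooley-Tukey FFT (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for i in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * i / n) * odd[i]
        out[i] = even[i] + t
        out[i + n // 2] = even[i] - t
    return out


def input_pruned_fft(x_head, n):
    """DFT of length n for the input x_head followed by n - len(x_head) zeros.

    Assumption for this sketch: k = len(x_head) is a power of two and divides n.
    Writing the output index as l = l1 * m + l2 with m = n // k gives
        y[l1 * m + l2] = sum_j (x[j] * w_n^(j * l2)) * w_k^(j * l1),
    so each l2 needs one size-k FFT of the twiddled head; the zeros are never touched.
    """
    k = len(x_head)
    if k == 0 or n % k != 0:
        raise ValueError("len(x_head) must be nonzero and divide n")
    m = n // k
    y = [0j] * n
    for l2 in range(m):
        # Scale the nonzero inputs by the twiddle factors w_n^(j * l2).
        twiddled = [x_head[j] * cmath.exp(-2j * cmath.pi * j * l2 / n)
                    for j in range(k)]
        sub = fft(twiddled)
        # Scatter the size-k result to the strided output positions.
        for l1 in range(k):
            y[l1 * m + l2] = sub[l1]
    return y
```

For example, `input_pruned_fft([1, 2 - 1j, 0.5, -3], 16)` should agree, up to rounding error, with `fft` applied to the same four samples padded with twelve zeros. The pruned version performs 4 transforms of size 4 instead of one of size 16, roughly n log k rather than n log n operations. The speedup reported in the abstract comes from the authors' generated, vectorized, and parallelized code, which this sketch does not attempt to reproduce.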
Franz Franchetti, Markus Püschel
Added 17 Aug 2010
Updated 17 Aug 2010
Type Conference
Year 2009
Where ICASSP
Authors Franz Franchetti, Markus Püschel