VECPAR
2004
Springer

Automatically Tuned FFTs for BlueGene/L's Double FPU

Abstract. IBM is currently developing the new line of BlueGene/L supercomputers. The top-of-the-line installation is planned to be a 65,536-processor system with a peak performance of 360 Tflop/s. This system is expected to lead the Top 500 list when it is installed in 2005 at the Lawrence Livermore National Laboratory. This paper presents one of the first numerical kernels run on a prototype BlueGene/L machine. We tuned our formal vectorization approach as well as the Vienna MAP vectorizer to support BlueGene/L’s custom two-way short vector SIMD “double” floating-point unit and connected the resulting methods to the automatic performance tuning systems Spiral and Fftw. Our approach produces automatically tuned high-performance FFT kernels for BlueGene/L that are up to 45% faster than the best scalar Spiral-generated code and up to 75% faster than Fftw when run on a single BlueGene/L processor.
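To make the idea of two-way short vector SIMD concrete, the following is a minimal sketch in portable C of a radix-2 FFT butterfly written in terms of two-element vector operations. It is an illustration only: it is not code produced by Spiral, Fftw, or the Vienna MAP vectorizer, and it does not use BlueGene/L instructions or compiler intrinsics. The type `vec2` and the functions `vec2_add`, `vec2_sub`, `vec2_cmul`, and `butterfly2` are hypothetical names, and the sketch assumes that one complex number is stored per two-way vector as (real, imaginary).

```c
/* Hypothetical two-way vector: one complex number held as (re, im).
   A real two-way SIMD unit would keep both lanes in one register pair;
   here the lanes are emulated with a plain struct. */
typedef struct { double re, im; } vec2;

/* Lane-wise addition: both lanes are processed by one vector operation. */
static vec2 vec2_add(vec2 x, vec2 y) {
    return (vec2){ x.re + y.re, x.im + y.im };
}

/* Lane-wise subtraction. */
static vec2 vec2_sub(vec2 x, vec2 y) {
    return (vec2){ x.re - y.re, x.im - y.im };
}

/* Complex product x * w, with x = a + bi and w = c + di, expressed as
   two lane-wise products followed by a combine with alternating signs:
     t1 = (a, b) * (c, c) = (a*c, b*c)
     t2 = (b, a) * (d, d) = (b*d, a*d)   (lanes of x swapped)
     result = (t1.re - t2.re, t1.im + t2.im) = (a*c - b*d, b*c + a*d) */
static vec2 vec2_cmul(vec2 x, vec2 w) {
    vec2 t1 = { x.re * w.re, x.im * w.re };
    vec2 t2 = { x.im * w.im, x.re * w.im };
    return (vec2){ t1.re - t2.re, t1.im + t2.im };
}

/* In-place radix-2 butterfly with twiddle factor w:
     x0' = x0 + w * x1
     x1' = x0 - w * x1 */
static void butterfly2(vec2 *x0, vec2 *x1, vec2 w) {
    vec2 t = vec2_cmul(*x1, w);
    vec2 a = *x0;
    *x0 = vec2_add(a, t);
    *x1 = vec2_sub(a, t);
}
```

For example, calling `butterfly2` on x0 = 1 + 2i and x1 = 3 + 4i with twiddle w = -i first forms w * x1 = 4 - 3i and then leaves x0 = 5 - i and x1 = -3 + 5i. The swapped, sign-alternating combine in `vec2_cmul` is the part that does not map onto purely lane-wise arithmetic, which is why hardware support for such cross-lane operations matters for complex-valued kernels like the FFT.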
Added 02 Jul 2010
Updated 02 Jul 2010
Type Conference
Year 2004
Where VECPAR
Authors Franz Franchetti, Stefan Kral, Juergen Lorenz, Markus Püschel, Christoph W. Ueberhuber