How Far Can We Go on the x64 Processors?

10 years 5 months ago
How Far Can We Go on the x64 Processors?
This paper studies the state-of-the-art software optimization methodology for symmetric cryptographic primitives on the new 64-bit x64 processors, AMD Athlon64 (AMD64) and Intel Pentium 4 (EM64T). We fully utilize newly introduced 64-bit registers and instructions for extracting maximal performance of target primitives. Our program of AES with 128-bit key runs in 170 cycles/block on Athlon 64, which is, as far as we know, the fastest implementation of AES on a PC processor. Also we implemented a "bitsliced" AES and Camellia for the first time, both of which achieved very good performance. A bitslice implementation is important from the viewpoint of a countermeasure against cache timing attacks because it does not require lookup tables with a keydependent address. We also analyze performance of SHA256/512 and Whirlpool hash functions and show that SHA512 can run faster than SHA256 on Athlon 64. This paper exhibits an undocumented fact that 64-bit right shifts and 64-bit rotati...
Mitsuru Matsui
Added 23 Aug 2010
Updated 23 Aug 2010
Type Conference
Year 2006
Where FSE
Authors Mitsuru Matsui
Comments (0)