crypto: serpent - add 4-way parallel i586/SSE2 assembler implementation
author Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Wed, 9 Nov 2011 14:26:31 +0000 (16:26 +0200)
committer Herbert Xu <herbert@gondor.apana.org.au>
Mon, 21 Nov 2011 08:13:23 +0000 (16:13 +0800)
commit 251496dbfc1be38bc43b49651f3d33c02faccc47
tree e17a6704b90b94d0da126eba603fe20cb7ca822c
parent 937c30d7f560210b0163035edd42b2aef78fed9e

This patch adds an i586/SSE2 assembler implementation of the Serpent cipher.
The assembler functions encrypt and decrypt data in four-block chunks.
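
The four-block chunking can be sketched in C as follows. This is a minimal
illustration, not the code from this patch: toy_crypt_4way(), toy_crypt_1way()
and toy_ecb_crypt() are hypothetical names, and the XOR transform is only a
placeholder for the real cipher. It assumes the caller falls back to a
one-block routine for whatever is left after the last full four-block chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE      16  /* Serpent block size in bytes */
#define PARALLEL_BLOCKS 4   /* blocks consumed by one wide call */

/* Call counters, only here to make the dispatch visible. */
static unsigned int wide_calls, single_calls;

/* Hypothetical one-block routine: placeholder XOR, NOT Serpent. */
static void toy_crypt_1way(uint8_t *dst, const uint8_t *src)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        dst[i] = src[i] ^ 0x5a;
    single_calls++;
}

/* Hypothetical four-block routine standing in for the assembler function. */
static void toy_crypt_4way(uint8_t *dst, const uint8_t *src)
{
    for (size_t i = 0; i < PARALLEL_BLOCKS * BLOCK_SIZE; i++)
        dst[i] = src[i] ^ 0x5a;
    wide_calls++;
}

/* ECB-style driver: full four-block chunks first, then single blocks. */
static void toy_ecb_crypt(uint8_t *dst, const uint8_t *src, size_t nbytes)
{
    const size_t wide = PARALLEL_BLOCKS * BLOCK_SIZE;

    assert(nbytes % BLOCK_SIZE == 0);

    while (nbytes >= wide) {
        toy_crypt_4way(dst, src);
        dst += wide;
        src += wide;
        nbytes -= wide;
    }
    while (nbytes >= BLOCK_SIZE) {
        toy_crypt_1way(dst, src);
        dst += BLOCK_SIZE;
        src += BLOCK_SIZE;
        nbytes -= BLOCK_SIZE;
    }
}
```

In a scheme like this, a six-block buffer takes one wide call plus two
single-block calls, and a lone 16-byte request never reaches the wide path,
which may be why the 16-byte rows in the benchmark table sit near 1.0x.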

The patch has been tested with tcrypt and automated filesystem tests.

Tcrypt benchmark results (serpent-sse2/serpent_generic speed ratios):

Intel Atom N270:

size    ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16      0.95x   1.12x   1.02x   1.07x   0.97x   0.98x
64      1.73x   1.82x   1.08x   1.82x   1.72x   1.73x
256     2.08x   2.00x   1.04x   2.07x   1.99x   2.01x
1024    2.28x   2.18x   1.05x   2.23x   2.17x   2.20x
8192    2.28x   2.13x   1.05x   2.23x   2.18x   2.20x
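
The cbc-enc column stays close to 1.0x while cbc-dec scales like ECB. That
follows from the CBC mode itself: each plaintext block is XORed with the
previous ciphertext block before encryption, so encryption is serial, whereas
on decryption every ciphertext block is already available and the block-cipher
calls are independent. The sketch below illustrates the dependency only; the
toy_* functions are hypothetical and the additive transform is a placeholder,
not Serpent and not the code from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Placeholder invertible block transform, NOT Serpent. */
static void toy_encrypt_block(uint8_t *dst, const uint8_t *src)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        dst[i] = (uint8_t)(src[i] + 0x3d);
}

static void toy_decrypt_block(uint8_t *dst, const uint8_t *src)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        dst[i] = (uint8_t)(src[i] - 0x3d);
}

static void xor_block(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        dst[i] = a[i] ^ b[i];
}

/* CBC encryption: block n needs ciphertext n-1, so the loop is serial. */
static void toy_cbc_encrypt(uint8_t *dst, const uint8_t *src,
                            size_t nblocks, const uint8_t *iv)
{
    const uint8_t *prev = iv;
    uint8_t tmp[BLOCK_SIZE];

    for (size_t n = 0; n < nblocks; n++) {
        xor_block(tmp, src + n * BLOCK_SIZE, prev);
        toy_encrypt_block(dst + n * BLOCK_SIZE, tmp);
        prev = dst + n * BLOCK_SIZE;
    }
}

/* CBC decryption: pass 1 has no cross-block dependency and could be
 * batched four blocks at a time; pass 2 applies the chaining XOR.
 * dst and src must not overlap in this sketch. */
static void toy_cbc_decrypt(uint8_t *dst, const uint8_t *src,
                            size_t nblocks, const uint8_t *iv)
{
    assert(dst != src);

    for (size_t n = 0; n < nblocks; n++)
        toy_decrypt_block(dst + n * BLOCK_SIZE, src + n * BLOCK_SIZE);

    for (size_t n = 0; n < nblocks; n++)
        xor_block(dst + n * BLOCK_SIZE, dst + n * BLOCK_SIZE,
                  n ? src + (n - 1) * BLOCK_SIZE : iv);
}
```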

Full output:
 http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-generic.txt
 http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-sse2.txt

Userspace test results:

Encryption/decryption of sse2-i586 vs generic on Intel Atom N270:
 encrypt: 2.35x
 decrypt: 2.54x

Encryption/decryption of sse2-i586 vs generic on AMD Phenom II:
 encrypt: 1.82x
 decrypt: 2.51x

Encryption/decryption of sse2-i586 vs generic on Intel Xeon E7330:
 encrypt: 2.99x
 decrypt: 3.48x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
arch/x86/crypto/Makefile
arch/x86/crypto/serpent-sse2-i586-asm_32.S [new file with mode: 0644]
arch/x86/include/asm/serpent.h
crypto/Kconfig