android_kernel_samsung_msm8976/crypto
Arnd Bergmann ef1b2640d9 crypto: improve gcc optimization flags for serpent and wp512
commit 7d6e9105026788c497f0ab32fa16c82f4ab5ff61 upstream.

An ancient gcc bug (first reported in 2003) has apparently resurfaced
on MIPS, where kernelci.org reports an overly large stack frame in the
whirlpool hash algorithm:

crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]

With some testing in different configurations, I'm seeing large
variations in stack frames size up to 1500 bytes for what should have
around 300 bytes at most. I also checked the reference implementation,
which is essentially the same code but also comes with some test and
benchmarking infrastructure.

It seems that recent compiler versions on at least arm, arm64 and powerpc
have a partial fix for this problem, but enabling "-fsched-pressure", but
even with that fix they suffer from the issue to a certain degree. Some
testing on arm64 shows that the time needed to hash a given amount of
data is roughly proportional to the stack frame size here, which makes
sense given that the wp512 implementation is doing lots of loads for
table lookups, and the problem with the overly large stack is a result
of doing a lot more loads and stores for spilled registers (as seen from
inspecting the object code).

Disabling -fschedule-insns consistently fixes the problem for wp512,
in my collection of cross-compilers, the results are consistently better
or identical when comparing the stack sizes in this function, though
some architectures (notable x86) have schedule-insns disabled by
default.

The four columns are:
default: -O2
press:	 -O2 -fsched-pressure
nopress: -O2 -fschedule-insns -fno-sched-pressure
nosched: -O2 -no-schedule-insns (disables sched-pressure)

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1136	848	1136	176
am33_2.0-linux-gcc-4.9.3	2100	2076	2100	2104
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
cris-linux-gcc-4.9.3		272	272	272	272
frv-linux-gcc-4.9.3		1128	1000	1128	280
hppa64-linux-gcc-4.9.3		1128	336	1128	184
hppa-linux-gcc-4.9.3		644	308	644	276
i386-linux-gcc-4.9.3		352	352	352	352
m32r-linux-gcc-4.9.3		720	656	720	268
microblaze-linux-gcc-4.9.3	1108	604	1108	256
mips64-linux-gcc-4.9.3		1328	592	1328	208
mips-linux-gcc-4.9.3		1096	624	1096	240
powerpc64-linux-gcc-4.9.3	1088	432	1088	160
powerpc-linux-gcc-4.9.3		1080	584	1080	224
s390-linux-gcc-4.9.3		456	456	624	360
sh3-linux-gcc-4.9.3		292	292	292	292
sparc64-linux-gcc-4.9.3		992	240	992	208
sparc-linux-gcc-4.9.3		680	592	680	312
x86_64-linux-gcc-4.9.3		224	240	272	224
xtensa-linux-gcc-4.9.3		1152	704	1152	304

aarch64-linux-gcc-7.0.0		224	224	1104	208
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352
mips-linux-gcc-7.0.0		1120	648	1120	272
x86_64-linux-gcc-7.0.1		240	240	304	240

arm-linux-gnueabi-gcc-4.4.7	840			392
arm-linux-gnueabi-gcc-4.5.4	784	728	784	320
arm-linux-gnueabi-gcc-4.6.4	736	728	736	304
arm-linux-gnueabi-gcc-4.7.4	944	784	944	352
arm-linux-gnueabi-gcc-4.8.5	464	464	760	352
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
arm-linux-gnueabi-gcc-5.3.1	824	824	1064	336
arm-linux-gnueabi-gcc-6.1.1	808	808	1056	344
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352

Trying the same test for serpent-generic, the picture is a bit different,
and while -fno-schedule-insns is generally better here than the default,
-fsched-pressure wins overall, so I picked that instead.

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1392	864	1392	960
am33_2.0-linux-gcc-4.9.3	536	524	536	528
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
cris-linux-gcc-4.9.3		528	528	528	528
frv-linux-gcc-4.9.3		536	400	536	504
hppa64-linux-gcc-4.9.3		524	208	524	480
hppa-linux-gcc-4.9.3		768	472	768	508
i386-linux-gcc-4.9.3		564	564	564	564
m32r-linux-gcc-4.9.3		712	576	712	532
microblaze-linux-gcc-4.9.3	724	392	724	512
mips64-linux-gcc-4.9.3		720	384	720	496
mips-linux-gcc-4.9.3		728	384	728	496
powerpc64-linux-gcc-4.9.3	704	304	704	480
powerpc-linux-gcc-4.9.3		704	296	704	480
s390-linux-gcc-4.9.3		560	560	592	536
sh3-linux-gcc-4.9.3		540	540	540	540
sparc64-linux-gcc-4.9.3		544	352	544	496
sparc-linux-gcc-4.9.3		544	344	544	496
x86_64-linux-gcc-4.9.3		528	536	576	528
xtensa-linux-gcc-4.9.3		752	544	752	544

aarch64-linux-gcc-7.0.0		432	432	656	480
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536
mips-linux-gcc-7.0.0		720	464	720	488
x86_64-linux-gcc-7.0.1		536	528	600	536

arm-linux-gnueabi-gcc-4.4.7	592			440
arm-linux-gnueabi-gcc-4.5.4	776	448	776	544
arm-linux-gnueabi-gcc-4.6.4	776	448	776	544
arm-linux-gnueabi-gcc-4.7.4	768	448	768	544
arm-linux-gnueabi-gcc-4.8.5	488	488	776	544
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
arm-linux-gnueabi-gcc-5.3.1	552	552	776	536
arm-linux-gnueabi-gcc-6.1.1	560	560	776	536
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536

I did not do any runtime tests with serpent, so it is possible that stack
frame size does not directly correlate with runtime performance here and
it actually makes things worse, but it's more likely to help here, and
the reduced stack frame size is probably enough reason to apply the patch,
especially given that the crypto code is often used in deep call chains.

Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/
Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2019-07-27 21:43:58 +02:00
..
asymmetric_keys crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks 2019-07-27 21:42:52 +02:00
async_tx raid6test: use prandom_bytes() 2013-04-29 18:28:42 -07:00
842.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
ablk_helper.c crypto: ablk_helper - Replace memcpy with struct assignment 2015-03-19 14:52:28 -07:00
ablkcipher.c crypto: skcipher - Add crypto_skcipher_has_setkey 2019-07-27 21:42:06 +02:00
aead.c crypto: user - fix info leaks in report API 2013-02-19 20:27:03 +08:00
aes_generic.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
af_alg.c crypto: af_alg - Forbid bind(2) when nokey child sockets are present 2019-07-27 21:42:08 +02:00
ahash.c crypto: hash - Add crypto_ahash_has_setkey 2019-07-27 21:42:05 +02:00
algapi.c crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg 2019-07-27 21:43:50 +02:00
algboss.c crypto: algboss - Hold ref count on larval 2013-06-25 19:15:17 +08:00
algif_hash.c crypto: algif_hash - Fix race condition in hash_check_key 2019-07-27 21:42:08 +02:00
algif_skcipher.c crypto: algif_skcipher - Fix race condition in skcipher_check_key 2019-07-27 21:42:08 +02:00
ansi_cprng.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
anubis.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
api.c crypto: api - Only abort operations on fatal signal 2015-11-09 10:12:59 -08:00
arc4.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
authenc.c crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks 2019-07-27 21:42:52 +02:00
authencesn.c crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks 2019-07-27 21:42:52 +02:00
blkcipher.c crypto: skcipher - Fix blkcipher walk OOM crash 2019-07-27 21:42:09 +02:00
blowfish_common.c crypto: blowfish - split generic and common c code 2011-09-22 21:25:25 +10:00
blowfish_generic.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
camellia_generic.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
cast5_generic.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
cast6_generic.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
cast_common.c crypto: cast5/cast6 - move lookup tables to shared module 2012-12-06 17:16:26 +08:00
cbc.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
ccm.c crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks 2019-07-27 21:42:52 +02:00
chainiv.c This is the 3.10.67 stable release 2015-04-24 18:04:40 -07:00
cipher.c crypto: cipher - Fix checkpatch errors 2010-02-16 20:31:37 +08:00
cmac.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
compress.c
crc32.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
crc32c.c crypto: crc32c - add missing crypto module alias 2015-02-11 14:48:18 +08:00
cryptd.c crypto: cryptd - initialize child shash_desc on import 2019-07-27 21:42:09 +02:00
crypto_null.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
crypto_user.c crypto: user - lock crypto_alg_list on alg dump 2016-02-19 14:22:41 -08:00
crypto_wq.c crypto: crypto_wq - Fix late crypto work queue initialization 2014-06-07 13:25:35 -07:00
ctr.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
cts.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
deflate.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
des_generic.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
ecb.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
eseqiv.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
fcrypt.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
fips.c crypto: api - Add fips_enable flag 2008-08-29 15:50:02 +10:00
gcm.c crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks 2019-07-27 21:42:52 +02:00
gf128mul.c crypto: gf128mul - fix call to memset() 2011-07-08 17:21:21 +08:00
ghash-generic.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
hmac.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
internal.h crypto: algboss - Hold ref count on larval 2013-06-25 19:15:17 +08:00
Kconfig arm: crypto: Add optimized SHA-256/224 2015-09-16 18:20:15 +05:30
khazad.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
krng.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
lrw.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
lzo.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
Makefile crypto: improve gcc optimization flags for serpent and wp512 2019-07-27 21:43:58 +02:00
md4.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
md5.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
memneq.c crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks 2019-07-27 21:42:52 +02:00
michael_mic.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
pcbc.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
pcompress.c crypto: user - fix info leaks in report API 2013-02-19 20:27:03 +08:00
pcrypt.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
proc.c crypto: add module.h to those files that are explicitly using it 2011-10-31 19:31:11 -04:00
ripemd.h [CRYPTO] ripemd: Put all common RIPEMD values in header file 2008-07-10 20:35:12 +08:00
rmd128.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
rmd160.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
rmd256.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
rmd320.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
rng.c crypto: user - fix info leaks in report API 2013-02-19 20:27:03 +08:00
salsa20_generic.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
scatterwalk.c crypto: scatterwalk - Fix test in scatterwalk_done 2019-07-27 21:41:53 +02:00
seed.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00
seqiv.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
serpent_generic.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
sha1_generic.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
sha256_generic.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
sha512_generic.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
shash.c crypto: shash - Fix has_key setting 2019-07-27 21:42:05 +02:00
tcrypt.c crypto: tcrypt - add async cipher speed tests for blowfish 2013-04-25 21:09:03 +08:00
tcrypt.h crypto: ctr - make rfc3686 asynchronous block cipher 2013-01-08 07:03:04 +01:00
tea.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
testmgr.c crypto: camellia - add AVX2/AES-NI/x86_64 assembler implementation of camellia cipher 2013-04-25 21:09:07 +08:00
testmgr.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2013-05-02 14:53:12 -07:00
tgr192.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
twofish_common.c crypto: twofish-x86_64-3way - add lrw support 2011-11-09 11:53:32 +08:00
twofish_generic.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
vmac.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
wp512.c crypto: add missing crypto module aliases 2015-01-29 17:40:57 -08:00
xcbc.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
xor.c add further __init annotations to crypto/xor.c 2012-10-11 13:42:32 +11:00
xts.c crypto: include crypto- module prefix in template 2015-01-29 17:40:57 -08:00
zlib.c crypto: prefix module autoloading with "crypto-" 2015-01-29 17:40:57 -08:00