android_kernel_samsung_msm8976/lib
Peter Zijlstra 5881a5ab50 lib/int_sqrt: optimize small argument
commit 3f3295709edea6268ff1609855f498035286af73 upstream.

The current int_sqrt() computation is sub-optimal for small @x, which is
the interesting case when we compute cumulative distribution functions
on idle times. We assume idle time to be a random variable whose upper
bound is the target residency of the deepest idle state (5e6ns on
recent Intel chips).

In the case of small @x, the compute loop:

	while (m != 0) {
		b = y + m;
		y >>= 1;

		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}

can be reduced to:

	while (m > x)
		m >>= 2;

Because y starts at 0, b == m on every iteration, and y stays 0 until
x >= m.

And while this is computationally equivalent, it runs much faster
because there is less code and, in particular, fewer branches.
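
A minimal user-space sketch of the equivalence described above. The
function names (isqrt_full_loop, isqrt_skip_small, isqrt_check_range)
and the SKETCH_BITS_PER_LONG macro are made up for illustration and are
not the kernel's identifiers; the loop bodies follow the snippets quoted
in this message. The first variant always walks m down from the top bit
pair, the second skips ahead with the reduced loop first.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for the kernel's BITS_PER_LONG (assumption for user space). */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Full loop only: m starts at the highest even bit and is shifted down. */
static unsigned long isqrt_full_loop(unsigned long x)
{
	unsigned long b, m, y = 0;

	if (x <= 1)
		return x;

	m = 1UL << (SKETCH_BITS_PER_LONG - 2);
	while (m != 0) {
		b = y + m;
		y >>= 1;

		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}
	return y;
}

/* Same computation, but the iterations where y stays 0 are skipped. */
static unsigned long isqrt_skip_small(unsigned long x)
{
	unsigned long b, m, y = 0;

	if (x <= 1)
		return x;

	m = 1UL << (SKETCH_BITS_PER_LONG - 2);
	while (m > x)		/* reduced loop: no compare on y or b */
		m >>= 2;

	while (m != 0) {
		b = y + m;
		y >>= 1;

		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}
	return y;
}

/* Asserts that both variants agree for every x below limit. */
static unsigned long isqrt_check_range(unsigned long limit)
{
	unsigned long x;

	for (x = 0; x < limit; x++)
		assert(isqrt_full_loop(x) == isqrt_skip_small(x));
	return x;	/* number of values checked */
}
```

Calling isqrt_check_range(1UL << 17) compares the two variants over the
same range of values (<128k) that the measurements in this message use.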

      cycles:                 branches:              branch-misses:

OLD:

hot:   45.109444 +- 0.044117  44.333392 +- 0.002254  0.018723 +- 0.000593
cold: 187.737379 +- 0.156678  44.333407 +- 0.002254  6.272844 +- 0.004305

PRE:

hot:   67.937492 +- 0.064124  66.999535 +- 0.000488  0.066720 +- 0.001113
cold: 232.004379 +- 0.332811  66.999527 +- 0.000488  6.914634 +- 0.006568

POST:

hot:   43.633557 +- 0.034373  45.333132 +- 0.002277  0.023529 +- 0.000681
cold: 207.438411 +- 0.125840  45.333132 +- 0.002277  6.976486 +- 0.004219

Averages computed over all values <128k, using an LFSR to generate the
order. For the cold numbers, an LFSR-based branch trace buffer 'confuser'
was run between each int_sqrt() invocation.
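
One way such an ordering could be generated is sketched below. This is
not the benchmark code behind the numbers above, which is not shown
here. It assumes a 17-bit Fibonacci LFSR with feedback taken from bits 0
and 3 (polynomial x^17 + x^3 + 1, assumed to be maximal-length); the
names lfsr17_next and lfsr17_period are hypothetical. The period helper
is there so the maximal-length assumption can be checked rather than
trusted.

```c
#include <stdint.h>

/*
 * Advance a 17-bit Fibonacci LFSR by one step.
 * The state must be non-zero and fit in 17 bits; zero is a fixed point.
 */
static uint32_t lfsr17_next(uint32_t s)
{
	uint32_t bit = (s ^ (s >> 3)) & 1u;	/* feedback from bits 0 and 3 */

	return (s >> 1) | (bit << 16);
}

/* Count the steps needed to return to the seed (seed must be non-zero). */
static uint32_t lfsr17_period(uint32_t seed)
{
	uint32_t s = seed;
	uint32_t n = 0;

	do {
		s = lfsr17_next(s);
		n++;
	} while (s != seed);

	return n;
}
```

A maximal-length 17-bit register visits every non-zero value below 128k
exactly once per period, so feeding each state to int_sqrt() covers the
range in a scrambled order; x == 0 would have to be handled separately.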

Link: http://lkml.kernel.org/r/20171020164644.876503355@infradead.org
Fixes: 30493cc9dd ("lib/int_sqrt.c: optimize square root algorithm")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Anshul Garg <aksgarg1989@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Joe Perches <joe@perches.com>
Cc: David Miller <davem@davemloft.net>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Davidson <md@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:46:05 +02:00
lz4 lz4: fix another possible overrun 2016-05-18 14:34:38 +05:30
lzo lzo: check for length overrun in variable length encoding. 2014-10-30 09:35:11 -07:00
mpi Import latest Samsung release 2017-04-18 03:43:52 +02:00
raid6
reed_solomon
xz
zlib_deflate
zlib_inflate
.gitignore
Kconfig lib: add lz4 compressor module 2015-09-16 18:20:12 +05:30
Kconfig.debug time: Remove CONFIG_TIMER_STATS 2017-04-22 23:02:59 +02:00
Kconfig.kasan kasan: enable instrumentation of global variables 2015-05-04 14:03:57 -07:00
Kconfig.kgdb
Kconfig.kmemcheck
Makefile lib: add lz4 compressor module 2015-09-16 18:20:12 +05:30
argv_split.c
asn1_decoder.c KEYS: fix NULL pointer dereference during ASN.1 parsing [ver #2] 2019-07-27 21:45:51 +02:00
atomic64.c
atomic64_test.c
audit.c
average.c
bcd.c
bch.c
bitmap.c Merge remote-tracking branch 'f2fs/linux-3.10.y' into HEAD 2017-04-18 17:02:28 +02:00
bitrev.c
bsearch.c
btree.c lib/btree.c: fix leak of whole btree nodes 2014-08-07 14:30:27 -07:00
bug.c Import latest Samsung release 2017-04-18 03:43:52 +02:00
build_OID_registry
bust_spinlocks.c
check_signature.c
checksum.c lib/checksum.c: fix build for generic csum_tcpudp_nofold 2015-02-11 14:48:17 +08:00
clz_tab.c
cmdline.c lib/cmdline.c: fix get_options() overflow while parsing ranges 2019-07-27 21:44:24 +02:00
cordic.c
cpu-notifier-error-inject.c
cpu_rmap.c irq: Allow multiple clients to register for irq affinity notification 2014-11-09 15:17:27 -08:00
cpumask.c sched/fair, cpumask: Export for_each_cpu_wrap() 2019-07-27 21:44:52 +02:00
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c
crc7.c
crc8.c
crc16.c
crc32.c
crc32defs.h
ctype.c
debug_locks.c
debugobjects.c debugobjects: use kmemleak_not_leak for obj_cache 2015-05-29 19:35:14 +05:30
dec_and_lock.c
decompress.c
decompress_bunzip2.c decompress_bunzip2: off by one in get_next_block() 2015-01-27 07:52:33 -08:00
decompress_inflate.c lib/decompressors: fix "no limit" output buffer length 2014-02-06 11:08:12 -08:00
decompress_unlzma.c
decompress_unlzo.c
decompress_unxz.c
devres.c This is the 3.10.99 stable release 2017-04-18 17:17:46 +02:00
digsig.c lib/digsig: fix dereference of NULL user_key_payload 2019-07-27 21:44:22 +02:00
div64.c UPSTREAM: math64: New separate div64_u64_rem helper 2016-05-18 14:36:10 +05:30
dma-debug.c dma-debug: switch check from _text to _stext 2016-02-25 11:57:49 -08:00
dump_stack.c
dynamic_debug.c dynamic_debug: Handle kstrdup failure in dynamic_debug_init 2015-06-20 18:25:48 -07:00
dynamic_queue_limits.c
earlycpio.c
extable.c
fault-inject.c debugfs: add get/set for atomic types 2013-10-18 18:13:21 -07:00
fdt.c
fdt_ro.c
fdt_rw.c
fdt_strerror.c
fdt_sw.c
fdt_wip.c
find_last_bit.c
find_next_bit.c
flex_array.c
flex_proportions.c
gcd.c
gen_crc32table.c
genalloc.c Merge upstream linux-stable v3.10.28 into msm-3.10 2014-03-24 14:28:34 -07:00
halfmd4.c
hexdump.c
hweight.c
idr.c idr: fix overflow bug during maximum ID calculation at maximum height 2014-06-30 20:09:42 -07:00
inflate.c
int_sqrt.c lib/int_sqrt: optimize small argument 2019-07-27 21:46:05 +02:00
interval_tree.c
interval_tree_test_main.c
iomap.c lib: iomap: Add MSM RTB support 2014-09-04 19:40:43 -07:00
iomap_copy.c
iommu-helper.c
ioremap.c
iovec.c
irq_regs.c
is_single_threaded.c
jedec_ddr_data.c
kasprintf.c
kfifo.c
klist.c klist: fix starting point removed bug in klist iterators 2016-02-25 11:57:47 -08:00
kobject.c
kobject_uevent.c
kstrtox.c
kstrtox.h
lcm.c
libcrc32c.c
list_debug.c
list_sort.c
llist.c
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c
lru_cache.c
md5.c
memory-notifier-error-inject.c
memweight.c
nlattr.c netlink: rate-limit leftover bytes warning and print process name 2014-06-26 15:12:37 -04:00
notifier-error-inject.c
notifier-error-inject.h
of-reconfig-notifier-error-inject.c
oid_registry.c
parser.c
pci_iomap.c
percpu_counter.c
plist.c
pm-notifier-error-inject.c
prio_heap.c
proportions.c
qmi_encdec.c This is the 3.10.84 stable release 2015-09-30 13:25:40 +05:30
qmi_encdec_priv.h
radix-tree.c radix-tree: fix race in gang lookup 2016-02-25 11:57:49 -08:00
random32.c random32: include missing header file 2017-09-08 18:50:21 +00:00
ratelimit.c Import latest Samsung release 2017-04-18 03:43:52 +02:00
rational.c
rbtree.c rbtree: add postorder iteration functions 2015-09-16 18:20:19 +05:30
rbtree_test.c
reciprocal_div.c
scatterlist.c Merge upstream linux-stable v3.10.28 into msm-3.10 2014-03-24 14:28:34 -07:00
sha1.c
show_mem.c
smp_processor_id.c
sort.c
stmp_device.c
string.c UPSTREAM: lib/string.c: introduce strreplace() 2016-05-18 14:36:10 +05:30
string_helpers.c
strncpy_from_user.c
strnlen_user.c lib: Fix strnlen_user() to not touch memory after specified maximum 2015-06-05 23:19:54 -07:00
swiotlb.c swiotlb: Setting default IO TBL value to 1MB 2014-06-02 08:46:43 -07:00
syscall.c
test-kstrtox.c
test-string_helpers.c
textsearch.c
timerqueue.c
ts_bm.c
ts_fsm.c
ts_kmp.c
ucs2_string.c lib/ucs2_string: Correct ucs2 -> utf8 conversion 2016-03-16 08:41:37 -07:00
usercopy.c
uuid.c
vsprintf.c vsprintf: ignore %n again 2014-05-30 10:23:23 -07:00