android_kernel_google_msm/lib
Davidlohr Bueso da5edc8231 lib/int_sqrt.c: optimize square root algorithm
Optimize the current version of the shift-and-subtract (hardware)
algorithm, described by John von Newmann[1] and Guy L Steele.

Iterating 1,000,000 times, perf shows for the current version:

 Performance counter stats for './sqrt-curr' (10 runs):

         27.170996 task-clock                #    0.979 CPUs utilized            ( +-  3.19% )
                 3 context-switches          #    0.103 K/sec                    ( +-  4.76% )
                 0 cpu-migrations            #    0.004 K/sec                    ( +-100.00% )
               104 page-faults               #    0.004 M/sec                    ( +-  0.16% )
        64,921,199 cycles                    #    2.389 GHz                      ( +-  0.03% )
        28,967,789 stalled-cycles-frontend   #   44.62% frontend cycles idle     ( +-  0.18% )
   <not supported> stalled-cycles-backend
       104,502,623 instructions              #    1.61  insns per cycle
                                             #    0.28  stalled cycles per insn  ( +-  0.00% )
        34,088,368 branches                  # 1254.587 M/sec                    ( +-  0.00% )
             4,901 branch-misses             #    0.01% of all branches          ( +-  1.32% )

       0.027763015 seconds time elapsed                                          ( +-  3.22% )

And for the new version:

Performance counter stats for './sqrt-new' (10 runs):

          0.496869 task-clock                #    0.519 CPUs utilized            ( +-  2.38% )
                 0 context-switches          #    0.000 K/sec
                 0 cpu-migrations            #    0.403 K/sec                    ( +-100.00% )
               104 page-faults               #    0.209 M/sec                    ( +-  0.15% )
           590,760 cycles                    #    1.189 GHz                      ( +-  2.35% )
           395,053 stalled-cycles-frontend   #   66.87% frontend cycles idle     ( +-  3.67% )
   <not supported> stalled-cycles-backend
           398,963 instructions              #    0.68  insns per cycle
                                             #    0.99  stalled cycles per insn  ( +-  0.39% )
            70,228 branches                  #  141.341 M/sec                    ( +-  0.36% )
             3,364 branch-misses             #    4.79% of all branches          ( +-  5.45% )

       0.000957440 seconds time elapsed                                          ( +-  2.42% )

Furthermore, this saves space in instruction text:

   text    data     bss     dec     hex filename
    111       0       0     111      6f lib/int_sqrt-baseline.o
     89       0       0      89      59 lib/int_sqrt.o

[1] http://en.wikipedia.org/wiki/First_Draft_of_a_Report_on_the_EDVAC

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Jonathan Gonzalez <jgonzlez@linets.cl>
Tested-by: Jonathan Gonzalez <jgonzlez@linets.cl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: flar2 <asegaert@gmail.com>
2019-03-16 12:42:19 +01:00
..
lz4 lz4: fix another possible overrun 2018-01-01 21:26:54 +03:00
lzo lzo: check for length overrun in variable length encoding. 2015-02-02 17:04:44 +08:00
mpi Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
raid6
reed_solomon
xz
zlib_deflate
zlib_inflate
.gitignore
argv_split.c
atomic64.c
atomic64_test.c
audit.c
average.c
bcd.c
bch.c
bitmap.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
bitrev.c
bsearch.c
btree.c lib/btree.c: fix leak of whole btree nodes 2014-08-07 12:00:11 -07:00
bug.c
bust_spinlocks.c
check_signature.c
checksum.c lib/checksum.c: fix build for generic csum_tcpudp_nofold 2015-04-14 17:34:04 +08:00
clz_tab.c
cmdline.c
cordic.c
cpu-notifier-error-inject.c
cpu_rmap.c
cpumask.c
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c
crc7.c
crc8.c
crc16.c
crc32.c
crc32defs.h
ctype.c
debug_locks.c
debugobjects.c
dec_and_lock.c
decompress.c
decompress_bunzip2.c
decompress_inflate.c
decompress_unlzma.c
decompress_unlzo.c lib/lzo: Rename lzo1x_decompress.c to lzo1x_decompress_safe.c 2014-06-26 15:10:29 -04:00
decompress_unxz.c
devres.c devres: fix a for loop bounds check 2016-10-26 23:15:23 +08:00
digsig.c digsig: Fix memory leakage in digsig_verify_rsa() 2013-02-11 08:47:17 -08:00
div64.c
dma-debug.c
dump_stack.c
dynamic_debug.c
dynamic_queue_limits.c
extable.c
fault-inject.c
find_last_bit.c
find_next_bit.c
flex_array.c
gcd.c
gen_crc32table.c
genalloc.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
halfmd4.c
hexdump.c
hweight.c
idr.c idr: fix a subtle bug in idr_get_next() 2015-03-11 21:59:58 -07:00
inflate.c
int_sqrt.c lib/int_sqrt.c: optimize square root algorithm 2019-03-16 12:42:19 +01:00
iomap.c
iomap_copy.c
iommu-helper.c
ioremap.c
irq_regs.c
is_single_threaded.c
kasprintf.c
Kconfig lib: add lz4 compressor module 2018-01-01 21:26:46 +03:00
Kconfig.debug time: Remove CONFIG_TIMER_STATS 2017-07-02 13:03:26 +03:00
Kconfig.kgdb
Kconfig.kmemcheck
klist.c klist: del waiter from klist_remove_waiters before wakeup waitting process 2013-06-07 12:49:13 -07:00
kobject.c kobject: fix kset_find_obj() race with concurrent last kobject_put() 2013-04-16 21:27:27 -07:00
kobject_uevent.c
kstrtox.c
kstrtox.h
lcm.c
libcrc32c.c
list_debug.c
list_sort.c
llist.c
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c
lru_cache.c
Makefile lib: add lz4 compressor module 2018-01-01 21:26:46 +03:00
md5.c
memory_alloc.c flo: Put device-specific code behind #ifndef CONFIG_UML. 2015-05-20 15:22:06 +09:00
nlattr.c netlink: rate-limit leftover bytes warning and print process name 2014-06-26 15:10:28 -04:00
parser.c
pci_iomap.c
percpu_counter.c
plist.c
prio_heap.c
prio_tree.c
proportions.c
radix-tree.c
random32.c random32: include missing header file 2017-10-27 21:48:02 +03:00
ratelimit.c
rational.c
rbtree.c
reciprocal_div.c
rwsem-spinlock.c
rwsem.c
scatterlist.c lib/scatterlist.c: don't flush_kernel_dcache_page on slab page 2013-11-13 12:01:49 +09:00
sha1.c
show_mem.c mm, show_mem: suppress page counts in non-blockable contexts 2013-10-13 15:42:49 -07:00
smp_processor_id.c
sort.c
spinlock_debug.c spinlock_debug: Print offset in addition to symbol name 2013-02-27 18:18:25 -08:00
string.c random: add and use memzero_explicit() for clearing data 2015-02-02 17:04:54 +08:00
string_helpers.c
swiotlb.c
syscall.c
test-kstrtox.c
textsearch.c
timerqueue.c
ts_bm.c
ts_fsm.c
ts_kmp.c
uuid.c
vsprintf.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00