android_kernel_samsung_msm8976/include
Davidlohr Bueso 7c1a95e0ae mm: per-thread vma caching
This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
thus further comparison with other approaches were needed.  There are
two things to consider when dealing with this, the cache hit rate and
the latency of find_vma().  Improving the hit-rate does not necessarily
translate in finding the vma any faster, as the overhead of any fancy
caching schemes can be too high to consider.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question.  Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme does improve ~50% hit rate by just adding a few more slots to
   the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while baseline is just
   about non-existent.  The amounts of cycles can fluctuate between
   anywhere from ~60 to ~116 for the baseline scheme, but this approach
   reduces it considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-27 22:08:06 +02:00
..
acpi
asm-generic scsi: dma-mapping: always provide dma_get_cache_alignment 2019-07-27 21:46:11 +02:00
clocksource
crypto crypto: authenc - Export key parsing helper function 2019-07-27 21:53:36 +02:00
drm Merge remote-tracking branch 'f2fs/linux-3.10.y' into HEAD 2017-04-18 17:02:28 +02:00
dt-bindings
keys
kvm
linux mm: per-thread vma caching 2019-07-27 22:08:06 +02:00
math-emu
media media: v4l: event: Prevent freeing event subscriptions while accessed 2019-07-27 21:51:55 +02:00
memory
misc
net net: ipv4: use a dedicated counter for icmp_v4 redirect packets 2019-07-27 22:07:53 +02:00
pcmcia
ras
rdma RDMA/core: Fix incorrect structure packing for booleans 2019-07-27 21:43:09 +02:00
rxrpc
scsi scsi: libsas: align sata_device's rps_resp on a cacheline 2019-07-27 21:46:11 +02:00
sdp Import latest Samsung release 2017-04-18 03:43:52 +02:00
soc/qcom Merge tag 'LA.BR.1.3.6-03910-8976.0' of https://source.codeaurora.org/quic/la/kernel/msm-3.10 into HEAD 2017-05-26 13:28:48 +02:00
sound ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams 2019-07-27 21:52:25 +02:00
target target: Avoid mappedlun symlink creation during lun shutdown 2019-07-27 21:44:16 +02:00
trace ext4: force inode writes when nfsd calls commit_metadata() 2019-07-27 21:53:31 +02:00
uapi ppp: remove the PPPIOCDETACH ioctl 2019-07-27 21:52:16 +02:00
video
xen Merge remote-tracking branch 'f2fs/linux-3.10.y' into HEAD 2017-04-18 17:02:28 +02:00
Kbuild