android_kernel_samsung_msm8976/kernel
Davidlohr Bueso 7c1a95e0ae mm: per-thread vma caching
This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
so further comparison with other approaches was needed.  There are
two things to consider when dealing with this: the cache hit rate and
the latency of find_vma().  Improving the hit rate does not necessarily
translate into finding the vma any faster, as the overhead of a fancy
caching scheme can outweigh the benefit of the extra hits.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question.  Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme improves the hit rate by ~50% simply by adding a few more
   slots to the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while the baseline's
   hit rate is all but non-existent.  The baseline's cycle count
   fluctuates anywhere from ~60 to ~116 billion, but this approach
   reduces it considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-27 22:08:06 +02:00
cpu sched/idle: Add missing checks to the exit condition of cpu_idle_poll() 2019-07-27 21:50:45 +02:00
debug mm: per-thread vma caching 2019-07-27 22:08:06 +02:00
events perf/core: Fix perf_pmu_unregister() locking 2019-07-27 21:53:14 +02:00
gcov
irq This is the 3.10.99 stable release 2017-04-18 17:17:46 +02:00
locking locking/rtmutex: Avoid a NULL pointer dereference on deadlock 2019-07-27 22:07:54 +02:00
power PM / wakeup: Only update last time for active wakeup sources 2019-07-27 21:52:47 +02:00
rcu rcu: Don't disable CPU hotplug during OOM notifiers 2016-01-06 23:11:06 -08:00
sched sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[] 2019-07-27 21:52:06 +02:00
time nohz: Fix local_timer_softirq_pending() 2019-07-27 21:52:58 +02:00
trace ring-buffer: Allow for rescheduling when removing pages 2019-07-27 21:51:54 +02:00
.gitignore
acct.c
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2019-07-27 21:49:48 +02:00
audit.c BACKPORT: audit: consistently record PIDs with task_tgid_nr() 2019-07-27 21:50:56 +02:00
audit.h Import latest Samsung release 2017-04-18 03:43:52 +02:00
audit_tree.c
audit_watch.c audit: fix use-after-free in audit_add_watch 2019-07-27 21:51:40 +02:00
auditfilter.c
auditsc.c BACKPORT: audit: consistently record PIDs with task_tgid_nr() 2019-07-27 21:50:56 +02:00
backtracetest.c
bounds.c
capability.c
cgroup.c cgroup: prefer %pK to %p 2016-12-06 09:24:09 -08:00
cgroup_freezer.c
compat.c
configs.c
context_tracking.c
cpu.c cpu: send KOBJ_ONLINE event when enabling cpus 2017-07-24 01:09:04 -07:00
cpu_pm.c
cpuset.c cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags 2019-07-27 21:44:59 +02:00
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c ANDROID: exec_domains: Disable request_module() call for personalities 2016-05-18 14:34:40 +05:30
exit.c kernel: Only expose su when daemon is running 2017-05-15 14:43:52 +00:00
extable.c kernel/extable.c: mark core_kernel_text notrace 2019-07-27 21:44:25 +02:00
fork.c mm: per-thread vma caching 2019-07-27 22:08:06 +02:00
freezer.c
futex.c futex: Remove unnecessary warning from get_futex_key 2019-07-27 21:51:41 +02:00
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 11:57:47 -08:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2019-07-27 21:46:18 +02:00
hrtimer.c hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers) 2019-07-27 21:49:51 +02:00
hung_task.c kernel/hung_task.c: break RCU locks based on jiffies 2019-07-27 22:06:04 +02:00
irq_work.c irq_work: Remove BUG_ON in irq_work_run() 2016-01-07 00:42:12 -08:00
itimer.c
jump_label.c
kallsyms.c Import latest Samsung release 2017-04-18 03:43:52 +02:00
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 11:57:47 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks Import latest Samsung release 2017-04-18 03:43:52 +02:00
Kconfig.preempt
kexec.c
kmod.c
kprobes.c
ksysfs.c
kthread.c
latencytop.c
Makefile UPSTREAM: KEYS: Separate the kernel signature checking keyring from module signing 2016-05-18 14:36:10 +05:30
modsign_pubkey.c
module-internal.h UPSTREAM: KEYS: Separate the kernel signature checking keyring from module signing 2016-05-18 14:36:10 +05:30
module.c module: Invalidate signatures on force-loaded modules 2019-07-27 21:42:00 +02:00
module_signing.c UPSTREAM: KEYS: Separate the kernel signature checking keyring from module signing 2016-05-18 14:36:10 +05:30
notifier.c
nsproxy.c
padata.c padata: avoid race in reordering 2019-07-27 21:44:05 +02:00
panic.c printk: do cond_resched() between lines while outputting to consoles 2019-07-27 21:41:46 +02:00
params.c kernel/params.c: align add_sysfs_param documentation with code 2019-07-27 21:45:35 +02:00
pid.c BACKPORT: FROMLIST: pids: make task_tgid_nr_ns() safe 2018-05-26 00:39:33 +02:00
pid_namespace.c
posix-cpu-timers.c posix-timers: Sanitize overrun handling 2019-07-27 21:53:21 +02:00
posix-timers.c posix-timers: Sanitize overrun handling 2019-07-27 21:53:21 +02:00
printk.c printk: use rcuidle console tracepoint 2019-07-27 21:44:09 +02:00
profile.c
ptrace.c ptrace: change __ptrace_unlink() to clear ->ptrace under ->siglock 2019-07-27 21:45:46 +02:00
range.c
relay.c kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE 2019-07-27 21:49:13 +02:00
res_counter.c
resource.c /proc/iomem: only expose physical resource addresses to privileged users 2019-07-27 22:05:58 +02:00
seccomp.c UPSTREAM: seccomp: always propagate NO_NEW_PRIVS on tsync 2019-07-27 21:51:01 +02:00
signal.c signal: Only reschedule timers on signals timers have sent 2019-07-27 21:44:51 +02:00
smp.c Import latest Samsung release 2017-04-18 03:43:52 +02:00
smpboot.c
smpboot.h
softirq.c Import latest Samsung release 2017-04-18 03:43:52 +02:00
stacktrace.c
stop_machine.c
sys.c kernel/sys.c: fix potential Spectre v1 issue 2019-07-27 21:52:10 +02:00
sys_ni.c
sysctl.c pipe: reject F_SETPIPE_SZ with size over UINT_MAX 2019-07-27 21:49:46 +02:00
sysctl_binary.c
system_certificates.S UPSTREAM: KEYS: Separate the kernel signature checking keyring from module signing 2016-05-18 14:36:10 +05:30
system_keyring.c UPSTREAM: KEYS: Separate the kernel signature checking keyring from module signing 2016-05-18 14:36:10 +05:30
task_work.c
taskstats.c
test_kprobes.c
time.c time: Make sure jiffies_to_msecs() preserves non-zero time periods 2019-07-27 21:52:48 +02:00
timeconst.bc
timer.c timers: Use proper base migration in add_timer_on() 2019-07-27 21:42:23 +02:00
tracepoint.c
tsacct.c
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2019-07-27 21:46:18 +02:00
up.c
user-return-notifier.c
user.c
user_namespace.c userns: move user access out of the mutex 2019-07-27 21:51:26 +02:00
utsname.c
utsname_sysctl.c
watchdog.c
workqueue.c workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq 2019-07-27 21:45:23 +02:00
workqueue_internal.h