android_kernel_samsung_msm8976/fs
Davidlohr Bueso 7c1a95e0ae mm: per-thread vma caching
This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
so further comparison with other approaches was needed.  There are
two things to consider when dealing with this: the cache hit rate and
the latency of find_vma().  Improving the hit rate does not necessarily
translate into finding the vma any faster, as the overhead of any fancy
caching scheme can be too high to consider.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question.  Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme improves on the ~50% baseline hit rate just by adding a few
   more slots to the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while the baseline
   hit rate is just about non-existent.  The cycle count can fluctuate
   anywhere from ~60 to ~116 billion for the baseline scheme, but this
   approach reduces it considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+
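
The per-thread cache described earlier in this message (sequence-number
invalidation plus page-number-based replacement) can be illustrated with
a small user-space sketch.  This is not the kernel code from the patch:
the struct and function names, the 4-slot cache size and the 4k page
shift are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT  12  /* assumption: 4k pages */
#define CACHE_SLOTS 4   /* assumption: small power-of-two slot count */

/* Hypothetical stand-ins for the kernel structures. */
struct vma { unsigned long start, end; };   /* covers [start, end) */
struct mm  { uint32_t seqnum; };            /* shared by all threads */

struct thread_cache {                       /* one per thread */
	uint32_t seqnum;
	struct vma *slots[CACHE_SLOTS];
};

/* Replacement policy: the slot is picked from the page number of addr. */
static size_t slot_for(unsigned long addr)
{
	return (addr >> PAGE_SHIFT) & (CACHE_SLOTS - 1);
}

static void cache_flush(struct thread_cache *tc)
{
	memset(tc->slots, 0, sizeof(tc->slots));
}

/*
 * Invalidate every thread's cache by bumping the shared sequence number.
 * Returns 1 when the 32-bit counter wrapped; the caller then has to flush
 * all caches sharing this mm explicitly (the rare, expensive case).
 */
static int cache_invalidate(struct mm *mm)
{
	return ++mm->seqnum == 0;
}

/* A cache is usable only if it was filled under the current seqnum. */
static int cache_sync(const struct mm *mm, struct thread_cache *tc)
{
	if (tc->seqnum == mm->seqnum)
		return 1;
	tc->seqnum = mm->seqnum;
	cache_flush(tc);
	return 0;
}

/* Return the cached vma containing addr, or NULL on a miss. */
static struct vma *cache_find(const struct mm *mm, struct thread_cache *tc,
			      unsigned long addr)
{
	if (!cache_sync(mm, tc))
		return NULL;
	for (size_t i = 0; i < CACHE_SLOTS; i++) {
		struct vma *v = tc->slots[i];

		if (v && v->start <= addr && addr < v->end)
			return v;
	}
	return NULL;
}

/* After a miss and a full lookup, remember the vma that was found. */
static void cache_update(const struct mm *mm, struct thread_cache *tc,
			 unsigned long addr, struct vma *v)
{
	assert(v != NULL);
	cache_sync(mm, tc);
	tc->slots[slot_for(addr)] = v;
}
```

In this sketch a lookup first calls cache_find(); on a miss the caller
would do the full (rbtree) lookup and pass the result to cache_update().
Anything that changes the address space calls cache_invalidate(), which
costs a single increment unless the counter wraps.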

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-27 22:08:06 +02:00
..
9p Merge tag 'LA.BR.1.3.6-03910-8976.0' of https://source.codeaurora.org/quic/la/kernel/msm-3.10 into HEAD 2017-05-26 13:28:48 +02:00
adfs fs: push sync_filesystem() down to the file system's remount_fs() 2015-09-16 18:20:11 +05:30
affs This is the 3.10.84 stable release 2015-09-30 13:25:40 +05:30
afs
autofs4 move d_rcu from overlapping d_child to overlapping d_alias 2015-04-29 10:34:00 +02:00
befs fs: push sync_filesystem() down to the file system's remount_fs() 2015-09-16 18:20:11 +05:30
bfs
btrfs Merge tag 'LA.BR.1.3.6-03910-8976.0' of https://source.codeaurora.org/quic/la/kernel/msm-3.10 into HEAD 2017-05-26 13:28:48 +02:00
cachefiles cachefiles: fix the race between cachefiles_bury_object() and rmdir(2) 2019-07-27 21:52:39 +02:00
ceph move d_rcu from overlapping d_child to overlapping d_alias 2015-04-29 10:34:00 +02:00
cifs This is the 3.10.102 stable release 2017-04-18 17:22:08 +02:00
coda This is the 3.10.84 stable release 2015-09-30 13:25:40 +05:30
configfs
cramfs fs: push sync_filesystem() down to the file system's remount_fs() 2015-09-16 18:20:11 +05:30
crypto fscrypt: remove broken support for detecting keyring key revocation 2019-07-27 21:51:53 +02:00
debugfs BACKPORT: dentry name snapshots 2017-12-22 20:25:56 +00:00
devpts This is the 3.10.98 stable release 2017-04-18 17:17:24 +02:00
dlm
ecryptfs eCryptfs: use after free in ecryptfs_release_messaging() 2019-07-27 21:51:50 +02:00
efivarfs efi: Make efivarfs entries immutable by default 2016-03-16 08:41:37 -07:00
efs fs: push sync_filesystem() down to the file system's remount_fs() 2015-09-16 18:20:11 +05:30
exfat Import latest Samsung release 2017-04-18 03:43:52 +02:00
exofs
exportfs This is the 3.10.84 stable release 2015-09-30 13:25:40 +05:30
ext2 it's still short a few helpers, but infrastructure should be OK now... 2018-12-03 11:52:03 +01:00
ext3 posix_acl: Clear SGID bit when setting file permissions 2019-07-27 21:42:52 +02:00
ext4 ext4: force inode writes when nfsd calls commit_metadata() 2019-07-27 21:53:31 +02:00
f2fs f2fs: move dir data flush to write checkpoint process 2019-07-27 22:06:03 +02:00
fat fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters() 2019-07-27 21:52:38 +02:00
freevxfs fs: push sync_filesystem() down to the file system's remount_fs() 2015-09-16 18:20:11 +05:30
fscache FS-Cache: fix dereference of NULL user_key_payload 2019-07-27 21:44:20 +02:00
fuse fuse: handle zero sized retrieve correctly 2019-07-27 22:06:05 +02:00
gfs2 posix_acl: Clear SGID bit when setting file permissions 2019-07-27 21:42:52 +02:00
hfs Merge remote-tracking branch 'f2fs/linux-3.10.y' into HEAD 2017-04-18 17:02:28 +02:00
hfsplus Merge remote-tracking branch 'f2fs/linux-3.10.y' into HEAD 2017-04-18 17:02:28 +02:00
hostfs uml: fix hostfs mknod() 2016-03-03 15:06:23 -08:00
hpfs Merge remote-tracking branch 'f2fs/linux-3.10.y' into HEAD 2017-04-18 17:02:28 +02:00
hppfs
hugetlbfs mm: larger stack guard gap, between vmas 2017-07-11 00:00:39 +00:00
isofs isofs: fix timestamps beyond 2027 2019-07-27 21:46:04 +02:00
jbd
jbd2 jbd2: if the journal is aborted then don't allow update of the log tail 2019-07-27 21:52:00 +02:00
jffs2 Merge tag 'LA.BR.1.3.6-03910-8976.0' of https://source.codeaurora.org/quic/la/kernel/msm-3.10 into HEAD 2017-05-26 13:28:48 +02:00
jfs posix_acl: Clear SGID bit when setting file permissions 2019-07-27 21:42:52 +02:00
lockd lockd: create NSM handles per net namespace 2016-03-03 15:06:20 -08:00
logfs
minix it's still short a few helpers, but infrastructure should be OK now... 2018-12-03 11:52:03 +01:00
ncpfs This is the 3.10.84 stable release 2015-09-30 13:25:40 +05:30
nfs This is the 3.10.99 stable release 2017-04-18 17:17:46 +02:00
nfs_common
nfsd nfsd: auth: Fix gid sorting when rootsquash enabled 2019-07-27 21:46:18 +02:00
nilfs2 This is the 3.10.84 stable release 2015-09-30 13:25:40 +05:30
nls
notify fanotify: fix logic of events on child 2019-07-27 21:52:17 +02:00
ntfs fs: push sync_filesystem() down to the file system's remount_fs() 2015-09-16 18:20:11 +05:30
ocfs2 posix_acl: Clear SGID bit when setting file permissions 2019-07-27 21:42:52 +02:00
omfs fs, omfs: add NULL terminator in the end up the token list 2015-06-05 23:19:54 -07:00
openpromfs fs: push sync_filesystem() down to the file system's remount_fs() 2015-09-16 18:20:11 +05:30
proc mm: per-thread vma caching 2019-07-27 22:08:06 +02:00
pstore pstore/ram: Do not treat empty buffers as valid 2019-07-27 21:53:37 +02:00
qnx4 fs: push sync_filesystem() down to the file system's remount_fs() 2015-09-16 18:20:11 +05:30
qnx6 fs: push sync_filesystem() down to the file system's remount_fs() 2015-09-16 18:20:11 +05:30
quota
ramfs
reiserfs posix_acl: Clear SGID bit when setting file permissions 2017-04-28 00:00:11 -07:00
romfs fs: push sync_filesystem() down to the file system's remount_fs() 2015-09-16 18:20:11 +05:30
sdcardfs ANDROID: sdcardfs: Add option to not link obb 2019-07-27 21:53:28 +02:00
sdfat Import T813XXS2BRC2 kernel source changes 2018-05-26 00:39:42 +02:00
squashfs Squashfs: Add LZ4 compression configuration option 2015-09-16 18:20:12 +05:30
sysfs Import latest Samsung release 2017-04-18 03:43:52 +02:00
sysv This is the 3.10.97 stable release 2017-04-18 17:17:20 +02:00
ubifs fs: push sync_filesystem() down to the file system's remount_fs() 2015-09-16 18:20:11 +05:30
udf This is the 3.10.98 stable release 2017-04-18 17:17:24 +02:00
ufs fs: push sync_filesystem() down to the file system's remount_fs() 2015-09-16 18:20:11 +05:30
xfs posix_acl: Clear SGID bit when setting file permissions 2019-07-27 21:42:52 +02:00
yaffs2
Kconfig Initial port of sdcardfs 2018-02-06 13:12:17 +01:00
Kconfig.binfmt
Makefile Initial port of sdcardfs 2018-02-06 13:12:17 +01:00
aio.c fix io_destroy()/aio_complete() race 2019-07-27 21:49:38 +02:00
anon_inodes.c
attr.c vfs: Add setattr2 for filesystems with per mount permissions 2018-02-06 13:12:20 +01:00
bad_inode.c
binfmt_aout.c
binfmt_elf.c binfmt_elf: Respect error return from `regset->active' 2019-07-27 21:51:40 +02:00
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c fs/binfmt_misc.c: do not allow offset overflow 2019-07-27 21:52:50 +02:00
binfmt_script.c
binfmt_som.c
bio-integrity.c
bio.c more bio_map_user_iov() leak fixes 2019-07-27 21:45:37 +02:00
block_dev.c block: protect iterate_bdevs() against concurrent close 2019-07-27 21:42:54 +02:00
buffer.c Import latest Samsung release 2017-04-18 03:43:52 +02:00
char_dev.c
compat.c constify ->actor 2015-09-16 18:20:09 +05:30
compat_binfmt_elf.c binfmt_elf: add ELF_HWCAP2 to compat auxv entries 2015-03-19 14:52:32 -07:00
compat_ioctl.c
coredump.c coredump: fix unfreezable coredumping task 2019-07-27 21:42:15 +02:00
coredump.h
dcache.c fs: take_dentry_name_snapshot: avoid kfree under spinlock fixup 2019-07-27 21:45:27 +02:00
dcookies.c
direct-io.c direct-io: Prevent NULL pointer access in submit_page_section 2019-07-27 21:44:19 +02:00
drop_caches.c Import latest Samsung release 2017-04-18 03:43:52 +02:00
eventfd.c
eventpoll.c fs/epoll: drop ovflist branch prediction 2019-07-27 22:06:04 +02:00
exec.c mm: per-thread vma caching 2019-07-27 22:08:06 +02:00
fcntl.c vfs: add missing check for __O_TMPFILE in fcntl_init() 2018-12-03 11:52:41 +01:00
fhandle.c vfs: read file_handle only once in handle_to_path. 2015-07-22 07:25:30 -07:00
file.c
file_table.c get rid of s_files and files_lock 2015-07-03 19:48:08 -07:00
filesystems.c
fs-writeback.c bdi: Fix oops in wb_workfn() 2019-07-27 21:52:12 +02:00
fs_struct.c sdcardfs: override umask on mkdir and create 2018-02-06 13:12:18 +01:00
generic_acl.c tmpfs: clear S_ISGID when setting posix ACLs 2017-04-22 23:02:57 +02:00
inode.c Fix up non-directory creation in SGID directories 2019-07-27 21:51:41 +02:00
internal.h vfs: Allow filesystems to access their private mount data 2018-02-06 13:12:19 +01:00
ioctl.c
ioprio.c block: fix use-after-free in sys_ioprio_get() 2016-11-19 20:01:20 -08:00
libfs.c move d_rcu from overlapping d_child to overlapping d_alias 2015-04-29 10:34:00 +02:00
locks.c locks: fix unlock when fcntl_setlk races with a close 2016-03-09 15:31:53 -08:00
mbcache.c
mount.h
mpage.c
namei.c VFS: Properly free dentry name snapshots in vfs_rename2 2019-07-27 21:46:08 +02:00
namespace.c Don't leak MNT_INTERNAL away from internal mounts 2019-07-27 21:52:13 +02:00
no-block.c
open.c fs: Fix file mode for O_TMPFILE 2018-12-03 11:52:40 +01:00
pipe.c pipe: read buffer limits atomically 2019-07-27 21:49:46 +02:00
pnode.c BACKPORT: smarter propagate_mnt() 2019-07-27 21:51:52 +02:00
pnode.h BACKPORT: smarter propagate_mnt() 2019-07-27 21:51:52 +02:00
posix_acl.c posix_acl: Clear SGID bit when setting file permissions 2019-07-27 21:42:52 +02:00
proc_namespace.c vfs: Allow filesystems to access their private mount data 2018-02-06 13:12:19 +01:00
read_write.c fs: Workaround the compiler's bad optimization 2016-02-04 13:23:34 +05:30
readdir.c fs: readdir: Fix su hide patch for non-iterate filesystems 2017-07-14 21:04:43 +02:00
select.c
seq_file.c Make file credentials available to the seqfile interfaces 2019-07-27 22:05:58 +02:00
signalfd.c signalfd: fix information leak in signalfd_copyinfo 2015-08-16 20:51:42 -07:00
splice.c vfs: fix uninitialized flags in splice_to_pipe() 2019-07-27 21:43:53 +02:00
stack.c
stat.c
statfs.c
super.c vfs: Allow filesystems to access their private mount data 2018-02-06 13:12:19 +01:00
sync.c Import T813XXS2BRC2 kernel source changes 2018-05-26 00:39:42 +02:00
timerfd.c timerfd: Protect the might cancel mechanism proper 2017-11-08 05:33:07 -08:00
utimes.c vfs: Add setattr2 for filesystems with per mount permissions 2018-02-06 13:12:20 +01:00
xattr.c getxattr: use correct xattr length 2019-07-27 21:51:26 +02:00
xattr_acl.c