android_kernel_google_msm/mm
Linus Torvalds 82a3f4741a mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619 upstream.

This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db975 ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404 ("fix get_user_pages bug").

In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better).  The
s390 dirty bit was implemented in abf09bed3c ("s390/mm: implement
software dirty bits") which made it into v3.9.  Earlier kernels will
have to look at the page state itself.

Also, the VM has become more scalable, and what used to be a purely
theoretical race back then has become easier to trigger.

To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.
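
As a rough illustration of the idea described above (not the actual patch), the
check can be modelled in user-space C. Everything here is an assumption made
for the sketch: the `model_` names, the `struct model_pte` type and the flag
values are hypothetical stand-ins, and the real kernel check may carry
additional conditions. The point is only that FOLL_WRITE is never cleared; a
separate "COW already done" bit is set instead, and that bit is trusted only
while the pte is still dirty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only; not the kernel's definitions. */
enum {
    MODEL_FOLL_WRITE = 0x01,   /* caller wants write access */
    MODEL_FOLL_COW   = 0x4000  /* "yes, we already did a COW" marker */
};

/* Simplified stand-in for a page table entry: just the two bits we care about. */
struct model_pte {
    bool writable;
    bool dirty;
};

/*
 * After a COW fault has been handled on a mapping that stays read-only,
 * remember that fact in a separate bit. FOLL_WRITE itself is left untouched.
 */
unsigned int model_note_cow_done(unsigned int flags)
{
    return flags | MODEL_FOLL_COW;
}

/*
 * May a write request use this pte? A writable pte is always fine. A
 * read-only pte is acceptable only if a COW was already done AND the pte
 * is still dirty; a clean pte means the COW'ed page may have gone away
 * in the meantime, so the lookup has to fault again.
 */
bool model_can_follow_write(struct model_pte pte, unsigned int flags)
{
    if (pte.writable)
        return true;
    return (flags & MODEL_FOLL_COW) && pte.dirty;
}

/* Small self-check exercising the racy case the dirty test is meant to catch. */
void model_self_check(void)
{
    unsigned int flags = model_note_cow_done(MODEL_FOLL_WRITE);
    struct model_pte cowed   = { .writable = false, .dirty = true  };
    struct model_pte dropped = { .writable = false, .dirty = false };

    assert(flags & MODEL_FOLL_WRITE);               /* write intent preserved */
    assert(model_can_follow_write(cowed, flags));   /* COW result still valid */
    assert(!model_can_follow_write(dropped, flags)); /* stale: must re-fault */
}
```

In this model, a retry loop would call `model_note_cow_done()` once the fault
handler reports that the COW happened, then repeat the lookup; if the pte has
lost its dirty bit by then, the lookup fails and the fault is taken again
rather than handing back a page that is no longer the private copy.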

Change-Id: Ifcb16e37ceb6b8845d7a77a97f5fda6670d08378
Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: s/gup.c/memory.c; s/follow_page_pte/follow_page_mask;
     s/faultin_page/__get_user_page]
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 59747d5d21)
2016-10-31 05:37:50 -07:00
backing-dev.c bdi: use deferable timer for sync_supers task 2013-02-27 18:16:50 -08:00
bootmem.c mm: sparse: fix usemap allocation above node descriptor section 2016-10-29 23:12:12 +08:00
bounce.c
cleancache.c
compaction.c cma: fix watermark checking 2013-03-15 17:06:38 -07:00
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap.c fs: introduce inode operation ->update_time 2015-07-13 11:17:49 -07:00
filemap_xip.c fs: introduce inode operation ->update_time 2015-07-13 11:17:49 -07:00
fremap.c
highmem.c
huge_memory.c
hugetlb.c
hwpoison-inject.c
init-mm.c
internal.h cma: fix watermark checking 2013-03-15 17:06:38 -07:00
Kconfig mm: mmzone: MIGRATE_CMA migration type added 2013-02-27 18:14:01 -08:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c
ksm.c ksm: Provide support to use deferred timers for scanner thread 2016-10-29 23:12:17 +08:00
maccess.c
madvise.c mm: add a field to store names for private anonymous memory 2013-10-11 10:02:06 -07:00
Makefile mm: compaction: export some of the functions 2013-02-27 18:13:58 -08:00
memblock.c
memcontrol.c
memory-failure.c mm: page_isolation: MIGRATE_CMA isolation functions added 2013-02-27 18:14:02 -08:00
memory.c mm: remove gup_flags FOLL_WRITE games from __get_user_pages() 2016-10-31 05:37:50 -07:00
memory_hotplug.c mm: page_isolation: MIGRATE_CMA isolation functions added 2013-02-27 18:14:02 -08:00
mempolicy.c mm: fix anon vma naming 2016-10-29 23:12:35 +08:00
mempool.c
migrate.c
mincore.c
mlock.c mm: reorder can_do_mlock to fix audit denial 2015-06-16 23:08:46 -07:00
mm_init.c
mmap.c FROMLIST: mm: mmap: Add new /proc tunable for mmap_base ASLR. 2016-10-29 23:12:40 +08:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c mm: add a field to store names for private anonymous memory 2013-10-11 10:02:06 -07:00
mremap.c
msync.c
nobootmem.c
nommu.c
oom_kill.c mm, oom: make dump_tasks public 2014-11-18 15:13:25 -08:00
page-writeback.c mm: fix calculation of dirtyable memory 2016-10-29 23:12:16 +08:00
page_alloc.c mm: workaround for widevine playback failed 2013-05-22 07:57:36 +00:00
page_cgroup.c
page_io.c
page_isolation.c mm: page_isolation: MIGRATE_CMA isolation functions added 2013-02-27 18:14:02 -08:00
pagewalk.c
percpu-km.c
percpu-vm.c
percpu.c
pgtable-generic.c
prio_tree.c
process_vm_access.c
quicklist.c
readahead.c mm: change initial readahead window size calculation 2016-10-29 23:12:18 +08:00
rmap.c
shmem.c
slab.c
slob.c
slub.c slub: fix a memory leak in get_partial_node() 2013-03-15 17:09:26 -07:00
sparse-vmemmap.c
sparse.c
swap.c
swap_state.c
swapfile.c
thrash.c
truncate.c
util.c nick kvfree() from apparmor 2014-11-18 15:13:23 -08:00
vmalloc.c
vmscan.c mm: vmscan: clear kswapd's special reclaim powers before exiting 2016-10-29 23:12:33 +08:00
vmstat.c mm: make counts of CMA free pages correct 2013-03-07 15:23:58 -08:00