android_kernel_google_msm/mm
Mel Gorman e9c235998f mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault
commit 2f84a8990ebbe235c59716896e017c6b2ca1200f upstream.

SunDong reported the following on

  https://bugzilla.kernel.org/show_bug.cgi?id=103841

	I think I find a linux bug, I have the test cases is constructed. I
	can stable recurring problems in fedora22(4.0.4) kernel version,
	arch for x86_64.  I construct transparent huge page, when the parent
	and child process with MAP_SHARE, MAP_PRIVATE way to access the same
	huge page area, it has the opportunity to lead to huge page copy on
	write failure, and then it will munmap the child corresponding mmap
	area, but then the child mmap area with VM_MAYSHARE attributes, child
	process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
	functions (vma - > vm_flags & VM_MAYSHARE).

There were a number of problems with the report (e.g. it's hugetlbfs that
triggers this, not transparent huge pages) but it was fundamentally
correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered. The
resulting output looks like this:

	 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
	 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
	 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
	 pgoff 0 file ffff88106bdb9800 private_data           (null)
	 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
	 ------------
	 kernel BUG at mm/hugetlb.c:462!
	 SMP
	 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
	 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
	 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
	 set_vma_resv_flags+0x2d/0x30

The VM_BUG_ON is correct because private and shared mappings have
different reservation accounting but the warning clearly shows that the
VMA is shared.

When a private COW fails to allocate a new page then only the process
that created the VMA gets the page -- all the children unmap the page.
If the children access that data in the future then they get killed.

The problem is that the same file is mapped both shared and private.  When
the allocation fails during the COW, the VMAs are traversed to unmap the
other private pages, but a shared VMA is found and the bug is triggered.
This patch identifies such VMAs and skips them.
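
The skip logic described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the kernel code:
`struct model_vma`, `MODEL_VM_MAYSHARE` and `model_unmap_private()` are
hypothetical names invented for this example, and the flag value is
arbitrary. It shows the shape of the traversal: the faulting VMA keeps the
page, other private VMAs lose it, and shared VMAs are left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag value; NOT the kernel's VM_MAYSHARE definition. */
#define MODEL_VM_MAYSHARE 0x1u

/* Hypothetical, heavily simplified stand-in for a VMA mapping one huge page. */
struct model_vma {
	unsigned int flags;	/* MODEL_VM_MAYSHARE set for shared mappings */
	int is_faulting_vma;	/* nonzero for the VMA whose COW allocation failed */
	int page_mapped;	/* nonzero while the huge page is still mapped */
};

/*
 * Model of the unmap pass that follows a failed private COW allocation.
 * The page is unmapped from every other private VMA; shared VMAs are
 * skipped because their reservation accounting is separate.
 * Returns the number of VMAs that had the page unmapped.
 */
size_t model_unmap_private(struct model_vma *vmas, size_t n)
{
	size_t unmapped = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_vma *v = &vmas[i];

		/* The VMA that took the fault keeps the page. */
		if (v->is_faulting_vma)
			continue;

		/* The skip described above: leave shared VMAs untouched. */
		if (v->flags & MODEL_VM_MAYSHARE)
			continue;

		if (v->page_mapped) {
			v->page_mapped = 0;
			unmapped++;
		}
	}
	return unmapped;
}
```

Given one faulting private VMA, one other private VMA and one shared VMA
over the same file, the model unmaps the page only from the second one and
never touches the shared mapping.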

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: SunDong <sund_sky@126.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-04-27 18:55:25 +08:00
..
backing-dev.c
bootmem.c
bounce.c
cleancache.c
compaction.c mm: compaction: fix echo 1 > compact_memory return error issue 2013-01-17 08:50:43 -08:00
debug-pagealloc.c
dmapool.c mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls 2012-12-17 10:37:44 -08:00
fadvise.c mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages 2013-02-28 06:59:01 -08:00
failslab.c
filemap.c
filemap_xip.c
fremap.c
highmem.c mm: highmem: don't treat PKMAP_ADDR(LAST_PKMAP) as a highmem address 2014-06-11 12:04:22 -07:00
huge_memory.c mm, thp: fix collapsing of hugepages on madvise 2015-02-02 17:05:07 +08:00
hugetlb.c mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault 2016-04-27 18:55:25 +08:00
hwpoison-inject.c
init-mm.c
internal.h mm: setup pageblock_order before it's used by sparsemem 2014-02-20 10:45:32 -08:00
Kconfig
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c mm: kmemleak: allow safe memory scanning during kmemleak disabling 2015-10-22 09:20:06 +08:00
ksm.c vm: add VM_FAULT_SIGSEGV handling support 2015-04-14 17:33:57 +08:00
maccess.c
madvise.c
Makefile
memblock.c x86, mm: Trim memory in memblock to be page aligned 2012-10-31 10:02:56 -07:00
memcontrol.c memcg: fix multiple large threshold notifications 2013-09-26 17:15:50 -07:00
memory-failure.c mm/memory-failure: call shake_page() when error hits thp tail page 2015-09-18 09:20:36 +08:00
memory.c mm: avoid setting up anonymous pages into file mapping 2016-03-21 09:17:40 +08:00
memory_hotplug.c mm/hotplug: correctly add new zone to all other nodes' zone lists 2014-03-11 16:10:04 -07:00
mempolicy.c slab/mempolicy: always use local policy from interrupt context 2014-09-25 11:49:17 +08:00
mempool.c
migrate.c mm: migrate: Close race between migration completion and mprotect 2014-12-01 18:02:40 +08:00
mincore.c
mlock.c mm: try_to_unmap_cluster() should lock_page() before mlocking 2014-08-07 12:00:11 -07:00
mm_init.c
mmap.c mm/mmap.c: fix arithmetic overflow in __vm_enough_memory() 2015-06-19 11:40:15 +08:00
mmu_context.c
mmu_notifier.c mm: mmu_notifier: re-fix freed page still mapped in secondary MMU 2013-06-07 12:49:25 -07:00
mmzone.c
mprotect.c
mremap.c
msync.c
nobootmem.c
nommu.c mm/nommu.c: fix arithmetic overflow in __vm_enough_memory() 2015-06-19 11:40:15 +08:00
oom_kill.c OOM, PM: OOM killed task shouldn't escape PM suspend 2015-02-02 17:04:55 +08:00
page-writeback.c writeback: use |1 instead of +1 to protect against div by zero 2015-06-19 11:40:34 +08:00
page_alloc.c OOM, PM: OOM killed task shouldn't escape PM suspend 2015-02-02 17:04:55 +08:00
page_cgroup.c cgroup/kmemleak: add kmemleak_free() for cgroup deallocations. 2015-02-02 17:05:07 +08:00
page_io.c
page_isolation.c
pagewalk.c mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas 2013-06-07 12:49:28 -07:00
percpu-km.c
percpu-vm.c percpu: perform tlb flush after pcpu_map_pages() failure 2014-12-01 18:02:23 +08:00
percpu.c Revert "percpu: free percpu allocation info for uniprocessor system" 2015-02-02 17:04:38 +08:00
pgtable-generic.c
prio_tree.c
process_vm_access.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-14 11:29:51 -07:00
quicklist.c
readahead.c
rmap.c mm: fix anon_vma->degree underflow in anon_vma endless growing prevention 2015-04-14 17:34:04 +08:00
shmem.c shmem: fix nlink for rename overwrite directory 2014-12-01 18:02:39 +08:00
slab.c cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags 2014-12-01 18:02:38 +08:00
slob.c
slub.c slub: refactoring unfreeze_partials() 2015-06-19 11:40:35 +08:00
sparse-vmemmap.c
sparse.c mm: setup pageblock_order before it's used by sparsemem 2014-02-20 10:45:32 -08:00
swap.c mm: hugetlbfs: fix hugetlbfs optimization 2014-02-06 11:05:46 -08:00
swap_state.c swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion 2013-06-20 11:58:45 -07:00
swapfile.c
thrash.c
truncate.c mm: Remove false WARN_ON from pagecache_isize_extended() 2015-02-02 17:05:24 +08:00
util.c
vmalloc.c mm: kmemleak: avoid false negatives on vmalloc'ed objects 2014-07-31 12:54:53 -07:00
vmscan.c mm: vmscan: clear kswapd's special reclaim powers before exiting 2014-06-30 20:01:31 -07:00
vmstat.c