android_kernel_samsung_msm8976/mm
Mel Gorman 9834213fa8 mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault
commit 2f84a8990ebbe235c59716896e017c6b2ca1200f upstream.

SunDong reported the following on

  https://bugzilla.kernel.org/show_bug.cgi?id=103841

	I think I find a linux bug, I have the test cases is constructed. I
	can stable recurring problems in fedora22(4.0.4) kernel version,
	arch for x86_64.  I construct transparent huge page, when the parent
	and child process with MAP_SHARE, MAP_PRIVATE way to access the same
	huge page area, it has the opportunity to lead to huge page copy on
	write failure, and then it will munmap the child corresponding mmap
	area, but then the child mmap area with VM_MAYSHARE attributes, child
	process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
	functions (vma - > vm_flags & VM_MAYSHARE).

There were a number of problems with the report (e.g.  it's hugetlbfs that
triggers this, not transparent huge pages) but it was fundamentally
correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
looks like this

	 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
	 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
	 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
	 pgoff 0 file ffff88106bdb9800 private_data           (null)
	 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
	 ------------
	 kernel BUG at mm/hugetlb.c:462!
	 SMP
	 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
	 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
	 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
	 set_vma_resv_flags+0x2d/0x30

The VM_BUG_ON is correct because private and shared mappings have
different reservation accounting but the warning clearly shows that the
VMA is shared.

When a private COW fails to allocate a new page then only the process
that created the VMA gets the page -- all the children unmap the page.
If the children access that data in the future then they get killed.

The problem is that the same file is mapped shared and private.  During
the COW, the allocation fails, the VMAs are traversed to unmap the other
private pages but a shared VMA is found and the bug is triggered.  This
patch identifies such VMAs and skips them.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: SunDong <sund_sky@126.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:50 -07:00
..
Kconfig mmKconfig: add an option to disable bounce 2013-04-29 15:54:40 -07:00
Kconfig.debug
Makefile memcg: add memory.pressure_level events 2013-04-29 15:54:38 -07:00
backing-dev.c bdi: avoid oops on device removal 2014-04-26 17:15:35 -07:00
balloon_compaction.c
bootmem.c
bounce.c mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored 2013-10-13 16:08:33 -07:00
cleancache.c mm: cleancache: clean up cleancache_enabled 2013-04-30 17:04:01 -07:00
compaction.c mm/compaction: fix wrong order check in compact_finished() 2015-03-18 13:22:28 +01:00
debug-pagealloc.c
dmapool.c
fadvise.c teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long 2013-03-03 22:46:22 -05:00
failslab.c
filemap.c mm: memcg: handle non-error OOM situations more gracefully 2014-11-21 09:22:56 -08:00
filemap_xip.c lift sb_start_write() out of ->write() 2013-04-09 14:12:56 -04:00
fremap.c mm: fix use-after-free in sys_remap_file_pages 2014-01-09 12:24:24 -08:00
frontswap.c mm: frontswap: invalidate expired data on a dup-store failure 2014-12-16 09:09:41 -08:00
highmem.c
huge_memory.c mm: numa: Do not mark PTEs pte_numa when splitting huge pages 2014-10-09 12:18:42 -07:00
hugetlb.c mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault 2015-10-22 14:37:50 -07:00
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c
internal.h mm: accelerate munlock() treatment of THP pages 2013-02-27 19:10:09 -08:00
interval_tree.c
kmemcheck.c
kmemleak-test.c
kmemleak.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ksm.c vm: add VM_FAULT_SIGSEGV handling support 2015-04-29 10:34:00 +02:00
maccess.c
madvise.c mm: madvise: complete input validation before taking lock 2013-04-29 15:54:37 -07:00
memblock.c memblock: fix missing comment of memblock_insert_region() 2013-04-29 15:54:38 -07:00
memcontrol.c mm: memcg: handle non-error OOM situations more gracefully 2014-11-21 09:22:56 -08:00
memory-failure.c mm/hwpoison: fix page refcount of unknown non LRU page 2015-09-13 09:07:59 -07:00
memory.c mm: avoid setting up anonymous pages into file mapping 2015-08-10 12:20:29 -07:00
memory_hotplug.c mm/memory_hotplug.c: set zone->wait_table to null after freeing it 2015-06-22 16:55:54 -07:00
mempolicy.c cpuset,mempolicy: fix sleeping function called from invalid context 2014-07-17 15:58:00 -07:00
mempool.c
migrate.c mm: numa: avoid unnecessary work on the failure path 2014-01-09 12:24:23 -08:00
mincore.c
mlock.c mm: try_to_unmap_cluster() should lock_page() before mlocking 2014-05-06 07:55:32 -07:00
mm_init.c
mmap.c mm/mmap.c: fix arithmetic overflow in __vm_enough_memory() 2015-03-18 13:22:27 +01:00
mmu_context.c mm: remove old aio use_mm() comment 2013-05-07 18:38:27 -07:00
mmu_notifier.c mm: mmu_notifier: re-fix freed page still mapped in secondary MMU 2013-05-24 16:22:51 -07:00
mmzone.c
mprotect.c mm: fix TLB flush race between migration, and change_protection_range 2014-01-09 12:24:23 -08:00
mremap.c mm, thp: close race between mremap() and split_huge_page() 2014-06-07 13:25:31 -07:00
msync.c
nobootmem.c mm, nobootmem: do memset() after memblock_reserve() 2013-04-29 15:54:39 -07:00
nommu.c mm/nommu.c: fix arithmetic overflow in __vm_enough_memory() 2015-03-18 13:22:28 +01:00
oom_kill.c mm: memcg: handle non-error OOM situations more gracefully 2014-11-21 09:22:56 -08:00
page-writeback.c writeback: fix possible underflow in write bandwidth calculation 2015-04-19 10:10:48 +02:00
page_alloc.c OOM, PM: OOM killed task shouldn't escape PM suspend 2014-11-14 08:47:58 -08:00
page_cgroup.c cgroup/kmemleak: add kmemleak_free() for cgroup deallocations. 2014-11-14 08:47:59 -08:00
page_io.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
page_isolation.c
pagewalk.c mm: pagewalk: call pte_hole() for VM_PFNMAP during walk_page_range 2015-02-11 14:48:16 +08:00
percpu-km.c
percpu-vm.c percpu: perform tlb flush after pcpu_map_pages() failure 2014-10-05 14:54:13 -07:00
percpu.c Revert "percpu: free percpu allocation info for uniprocessor system" 2014-11-14 08:47:53 -08:00
pgtable-generic.c mm: fix TLB flush race between migration, and change_protection_range 2014-01-09 12:24:23 -08:00
process_vm_access.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-12 11:05:45 -07:00
quicklist.c
readahead.c teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long 2013-03-03 22:46:22 -05:00
rmap.c mm: fix sleeping function warning from __put_anon_vma 2014-06-30 20:09:42 -07:00
shmem.c shmem: fix nlink for rename overwrite directory 2014-10-05 14:54:11 -07:00
slab.c slab: fix init_lock_keys 2013-07-21 18:21:26 -07:00
slab.h memcg: check that kmem_cache has memcg_params before accessing it 2013-09-07 22:09:58 -07:00
slab_common.c slab_common: fix the check for duplicate slab names 2014-07-31 12:53:50 -07:00
slob.c
slub.c slub: Fix calculation of cpu slabs 2014-02-13 13:48:00 -08:00
sparse-vmemmap.c sparse-vmemmap: specify vmemmap population range in bytes 2013-04-29 15:54:35 -07:00
sparse.c mm, hotplug: avoid compiling memory hotremove functions when disabled 2013-04-29 15:54:37 -07:00
swap.c mm: close PageTail race 2014-04-03 12:01:05 -07:00
swap_state.c swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion 2013-06-12 16:29:45 -07:00
swapfile.c frontswap: fix incorrect zeroing and allocation size for frontswap_map 2013-06-12 16:29:46 -07:00
truncate.c mm: Remove false WARN_ON from pagecache_isize_extended() 2014-11-14 08:48:00 -08:00
util.c vm_is_stack: use for_each_thread() rather then buggy while_each_thread() 2014-10-05 14:54:16 -07:00
vmalloc.c mm/vmalloc.c: fix an overflow bug in alloc_vmap_area() 2013-11-13 12:05:34 +09:00
vmpressure.c memcg: add memory.pressure_level events 2013-04-29 15:54:38 -07:00
vmscan.c vmscan: fix increasing nr_isolated incurred by putback unevictable pages 2015-10-01 12:07:32 +02:00
vmstat.c mm: numa: return the number of base pages altered by protection changes 2013-12-08 07:29:27 -08:00