android_kernel_google_msm/mm
Michal Hocko 80357219ed mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
commit 373ccbe5927034b55bdc80b0f8b54d6e13fe8d12 upstream.

Tetsuo Handa has reported that the system might basically livelock in
OOM condition without triggering the OOM killer.

The issue is caused by internal dependency of the direct reclaim on
vmstat counter updates (via zone_reclaimable) which are performed from
the workqueue context.  If all the current workers get assigned to an
allocation request, though, they will be looping inside the allocator
trying to reclaim memory but zone_reclaimable can see stalled numbers so
it will consider a zone reclaimable even though it has been scanned way
too much.  WQ concurrency logic will not consider this situation as a
congested workqueue because it relies that worker would have to sleep in
such a situation.  This also means that it doesn't try to spawn new
workers or invoke the rescuer thread if the one is assigned to the
queue.

In order to fix this issue we need to do two things.  First we have to
let wq concurrency code know that we are in trouble so we have to do a
short sleep.  In order to prevent from issues handled by 0e093d9976
("writeback: do not sleep on the congestion queue if there are no
congested BDIs or if significant congestion is not being encountered in
the current zone") we limit the sleep only to worker threads which are
the ones of the interest anyway.

The second thing to do is to create a dedicated workqueue for vmstat and
mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
have a spare worker thread for it.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Tejun Heo <tj@kernel.org>
Cc: Cristopher Lameter <clameter@sgi.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-26 23:15:37 +08:00
..
backing-dev.c mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress 2016-10-26 23:15:37 +08:00
bootmem.c
bounce.c
cleancache.c
compaction.c mm: compaction: fix echo 1 > compact_memory return error issue 2013-01-17 08:50:43 -08:00
debug-pagealloc.c
dmapool.c mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls 2012-12-17 10:37:44 -08:00
fadvise.c mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages 2013-02-28 06:59:01 -08:00
failslab.c
filemap.c mm: make sendfile(2) killable 2016-04-27 18:55:29 +08:00
filemap_xip.c
fremap.c
highmem.c mm: highmem: don't treat PKMAP_ADDR(LAST_PKMAP) as a highmem address 2014-06-11 12:04:22 -07:00
huge_memory.c mm, thp: fix collapsing of hugepages on madvise 2015-02-02 17:05:07 +08:00
hugetlb.c mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault 2016-04-27 18:55:25 +08:00
hwpoison-inject.c
init-mm.c
internal.h mm: setup pageblock_order before it's used by sparsemem 2014-02-20 10:45:32 -08:00
Kconfig
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c mm: kmemleak: allow safe memory scanning during kmemleak disabling 2015-10-22 09:20:06 +08:00
ksm.c vm: add VM_FAULT_SIGSEGV handling support 2015-04-14 17:33:57 +08:00
maccess.c
madvise.c
Makefile
memblock.c
memcontrol.c memcg: fix multiple large threshold notifications 2013-09-26 17:15:50 -07:00
memory-failure.c mm/memory-failure: call shake_page() when error hits thp tail page 2015-09-18 09:20:36 +08:00
memory.c mm: avoid setting up anonymous pages into file mapping 2016-03-21 09:17:40 +08:00
memory_hotplug.c mm/hotplug: correctly add new zone to all other nodes' zone lists 2014-03-11 16:10:04 -07:00
mempolicy.c slab/mempolicy: always use local policy from interrupt context 2014-09-25 11:49:17 +08:00
mempool.c
migrate.c mm: migrate: Close race between migration completion and mprotect 2014-12-01 18:02:40 +08:00
mincore.c
mlock.c mm: try_to_unmap_cluster() should lock_page() before mlocking 2014-08-07 12:00:11 -07:00
mm_init.c
mmap.c mm/mmap.c: fix arithmetic overflow in __vm_enough_memory() 2015-06-19 11:40:15 +08:00
mmu_context.c
mmu_notifier.c mm: mmu_notifier: re-fix freed page still mapped in secondary MMU 2013-06-07 12:49:25 -07:00
mmzone.c
mprotect.c
mremap.c
msync.c
nobootmem.c
nommu.c mm/nommu.c: fix arithmetic overflow in __vm_enough_memory() 2015-06-19 11:40:15 +08:00
oom_kill.c OOM, PM: OOM killed task shouldn't escape PM suspend 2015-02-02 17:04:55 +08:00
page-writeback.c writeback: use |1 instead of +1 to protect against div by zero 2015-06-19 11:40:34 +08:00
page_alloc.c OOM, PM: OOM killed task shouldn't escape PM suspend 2015-02-02 17:04:55 +08:00
page_cgroup.c cgroup/kmemleak: add kmemleak_free() for cgroup deallocations. 2015-02-02 17:05:07 +08:00
page_io.c
page_isolation.c
pagewalk.c mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas 2013-06-07 12:49:28 -07:00
percpu-km.c
percpu-vm.c percpu: perform tlb flush after pcpu_map_pages() failure 2014-12-01 18:02:23 +08:00
percpu.c Revert "percpu: free percpu allocation info for uniprocessor system" 2015-02-02 17:04:38 +08:00
pgtable-generic.c
prio_tree.c
process_vm_access.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-14 11:29:51 -07:00
quicklist.c
readahead.c
rmap.c mm: fix anon_vma->degree underflow in anon_vma endless growing prevention 2015-04-14 17:34:04 +08:00
shmem.c shmem: fix nlink for rename overwrite directory 2014-12-01 18:02:39 +08:00
slab.c cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags 2014-12-01 18:02:38 +08:00
slob.c
slub.c slub: refactoring unfreeze_partials() 2015-06-19 11:40:35 +08:00
sparse-vmemmap.c
sparse.c mm: setup pageblock_order before it's used by sparsemem 2014-02-20 10:45:32 -08:00
swap.c mm: hugetlbfs: fix hugetlbfs optimization 2014-02-06 11:05:46 -08:00
swap_state.c swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion 2013-06-20 11:58:45 -07:00
swapfile.c
thrash.c
truncate.c mm: Remove false WARN_ON from pagecache_isize_extended() 2015-02-02 17:05:24 +08:00
util.c
vmalloc.c mm: kmemleak: avoid false negatives on vmalloc'ed objects 2014-07-31 12:54:53 -07:00
vmscan.c mm: vmscan: clear kswapd's special reclaim powers before exiting 2014-06-30 20:01:31 -07:00
vmstat.c mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress 2016-10-26 23:15:37 +08:00