android_kernel_google_msm/mm
Johannes Weiner 162692c2f8 mm: vmscan: clear kswapd's special reclaim powers before exiting
commit 71abdc15ad upstream.

When kswapd exits, it can end up taking locks that were previously held
by allocating tasks while they waited for reclaim.  Lockdep currently
warns about this:

On Wed, May 28, 2014 at 06:06:34PM +0800, Gu Zheng wrote:
>  inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
>  kswapd2/1151 [HC0[0]:SC0[0]:HE1:SE1] takes:
>   (&sig->group_rwsem){+++++?}, at: exit_signals+0x24/0x130
>  {RECLAIM_FS-ON-W} state was registered at:
>     mark_held_locks+0xb9/0x140
>     lockdep_trace_alloc+0x7a/0xe0
>     kmem_cache_alloc_trace+0x37/0x240
>     flex_array_alloc+0x99/0x1a0
>     cgroup_attach_task+0x63/0x430
>     attach_task_by_pid+0x210/0x280
>     cgroup_procs_write+0x16/0x20
>     cgroup_file_write+0x120/0x2c0
>     vfs_write+0xc0/0x1f0
>     SyS_write+0x4c/0xa0
>     tracesys+0xdd/0xe2
>  irq event stamp: 49
>  hardirqs last  enabled at (49):  _raw_spin_unlock_irqrestore+0x36/0x70
>  hardirqs last disabled at (48):  _raw_spin_lock_irqsave+0x2b/0xa0
>  softirqs last  enabled at (0):  copy_process.part.24+0x627/0x15f0
>  softirqs last disabled at (0):            (null)
>
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
>
>         CPU0
>         ----
>    lock(&sig->group_rwsem);
>    <Interrupt>
>      lock(&sig->group_rwsem);
>
>   *** DEADLOCK ***
>
>  no locks held by kswapd2/1151.
>
>  stack backtrace:
>  CPU: 30 PID: 1151 Comm: kswapd2 Not tainted 3.10.39+ #4
>  Call Trace:
>    dump_stack+0x19/0x1b
>    print_usage_bug+0x1f7/0x208
>    mark_lock+0x21d/0x2a0
>    __lock_acquire+0x52a/0xb60
>    lock_acquire+0xa2/0x140
>    down_read+0x51/0xa0
>    exit_signals+0x24/0x130
>    do_exit+0xb5/0xa50
>    kthread+0xdb/0x100
>    ret_from_fork+0x7c/0xb0

This is because the kswapd thread is still marked as a reclaimer at the
time of exit.  But because it is exiting, nobody is actually waiting on
it to make reclaim progress anymore, and it's nothing but a regular
thread at this point.  Be tidy and strip it of all its powers
(PF_MEMALLOC, PF_SWAPWRITE, PF_KSWAPD, and the lockdep reclaim state)
before returning from the thread function.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30 20:01:31 -07:00
..
backing-dev.c backing-dev: fix wakeup timer races with bdi_unregister() 2012-02-01 16:52:49 +08:00
bootmem.c mm: sparse: fix usemap allocation above node descriptor section 2012-10-02 10:30:36 -07:00
bounce.c mm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:27 +08:00
cleancache.c mm: cleancache: Use __read_mostly as appropiate. 2012-01-23 16:08:09 -05:00
compaction.c mm: compaction: fix echo 1 > compact_memory return error issue 2013-01-17 08:50:43 -08:00
debug-pagealloc.c
dmapool.c mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls 2012-12-17 10:37:44 -08:00
fadvise.c mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages 2013-02-28 06:59:01 -08:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap.c radix-tree: use iterators in find_get_pages* functions 2012-03-28 17:14:37 -07:00
filemap_xip.c mm/filemap_xip.c: fix race condition in xip_file_fault() 2012-02-03 16:16:41 -08:00
fremap.c
highmem.c mm: highmem: don't treat PKMAP_ADDR(LAST_PKMAP) as a highmem address 2014-06-11 12:04:22 -07:00
huge_memory.c mm/huge_memory.c: fix potential NULL pointer dereference 2013-09-26 17:15:51 -07:00
hugetlb.c mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages() 2014-06-07 16:01:57 -07:00
hwpoison-inject.c HWPOISON: Clean up memory_failure() vs. __memory_failure() 2012-01-03 12:06:32 -08:00
init-mm.c
internal.h mm: setup pageblock_order before it's used by sparsemem 2014-02-20 10:45:32 -08:00
Kconfig
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: Disable early logging when kmemleak is off by default 2012-01-20 16:57:05 +00:00
ksm.c ksm: cleanup: introduce find_mergeable_vma() 2012-03-21 17:54:59 -07:00
maccess.c
madvise.c mm: Hold a file reference in madvise_remove 2012-07-16 09:04:43 -07:00
Makefile
memblock.c x86, mm: Trim memory in memblock to be page aligned 2012-10-31 10:02:56 -07:00
memcontrol.c memcg: fix multiple large threshold notifications 2013-09-26 17:15:50 -07:00
memory-failure.c mm/memory-failure.c: don't let collect_procs() skip over processes for MF_ACTION_REQUIRED 2014-06-30 20:01:31 -07:00
memory.c mm: make fixup_user_fault() check the vma access rights too 2014-06-07 16:02:00 -07:00
memory_hotplug.c mm/hotplug: correctly add new zone to all other nodes' zone lists 2014-03-11 16:10:04 -07:00
mempolicy.c tmpfs mempolicy: fix /proc/mounts corrupting memory 2013-01-11 09:06:49 -08:00
mempool.c mempool: fix first round failure behavior 2012-01-10 16:30:45 -08:00
migrate.c mm: migration: add migrate_entry_wait_huge() 2013-06-20 11:58:46 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c
mmap.c mm: do not grow the stack vma just because of an overrun on preceding vma 2013-10-22 09:02:25 +01:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c mm: mmu_notifier: re-fix freed page still mapped in secondary MMU 2013-06-07 12:49:25 -07:00
mmzone.c
mprotect.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mremap.c mm: collapse security_vm_enough_memory() variants into a single function 2012-02-14 10:45:39 +11:00
msync.c
nobootmem.c memblock: free allocated memblock_reserved_regions later 2012-07-16 09:04:45 -07:00
nommu.c vm: add no-mmu vm_iomap_memory() stub 2013-08-20 08:26:27 -07:00
oom_kill.c mm, memcg: give exiting processes access to memory reserves 2013-10-05 07:06:54 -07:00
page-writeback.c mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() 2014-02-20 10:45:32 -08:00
page_alloc.c mm: setup pageblock_order before it's used by sparsemem 2014-02-20 10:45:32 -08:00
page_cgroup.c page_cgroup: fix horrid swap accounting regression 2012-03-06 08:18:23 -08:00
page_io.c
page_isolation.c
pagewalk.c mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas 2013-06-07 12:49:28 -07:00
percpu-km.c
percpu-vm.c percpu: use bitmap_clear 2012-01-20 09:23:16 -08:00
percpu.c percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree() 2014-06-07 16:02:03 -07:00
pgtable-generic.c thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE 2012-03-21 17:55:02 -07:00
prio_tree.c
process_vm_access.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-14 11:29:51 -07:00
quicklist.c
readahead.c
rmap.c mm: fix sleeping function warning from __put_anon_vma 2014-06-30 20:01:31 -07:00
shmem.c tmpfs: fix use-after-free of mempolicy object 2013-02-28 06:59:01 -08:00
slab.c slab: fix the DEADLOCK issue on l3 alien lock 2012-10-13 05:38:37 +09:00
slob.c
slub.c slub: Fix calculation of cpu slabs 2014-02-13 11:51:09 -08:00
sparse-vmemmap.c
sparse.c mm: setup pageblock_order before it's used by sparsemem 2014-02-20 10:45:32 -08:00
swap.c mm: hugetlbfs: fix hugetlbfs optimization 2014-02-06 11:05:46 -08:00
swap_state.c swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion 2013-06-20 11:58:45 -07:00
swapfile.c swap: fix shmem swapping when more than 8 areas 2012-06-22 11:36:55 -07:00
thrash.c
truncate.c mm: fix invalidate_complete_page2() lock ordering 2012-10-13 05:38:51 +09:00
util.c procfs: mark thread stack correctly in proc/<pid>/maps 2012-03-21 17:54:58 -07:00
vmalloc.c mm: fix faulty initialization in vmalloc_init() 2012-06-10 00:36:06 +09:00
vmscan.c mm: vmscan: clear kswapd's special reclaim powers before exiting 2014-06-30 20:01:31 -07:00
vmstat.c mm: fix up the vmscan stat in vmstat 2012-04-25 21:26:33 -07:00