android_kernel_google_msm/mm
Hugh Dickins 6aa083b195 shmem: fix negative rss in memcg memory.stat
When adding the page_private checks before calling shmem_replace_page(), I
did realize that there is a further race, but thought it too unlikely to
need a hurried fix.

But independently I've been chasing why a mem cgroup's memory.stat
sometimes shows negative rss after all tasks have gone: I expected it to
be a stats gathering bug, but actually it's shmem swapping's fault.

It's an old surprise, that when you lock_page(lookup_swap_cache(swap)),
the page may have been removed from swapcache before getting the lock; or
it may have been freed and reused and be back in swapcache; and it can
even be using the same swap location as before (page_private same).

The swapoff case is already secure against this (swap cannot be reused
until the whole area has been swapped off, and a new swapped on); and
shmem_getpage_gfp() is protected by shmem_add_to_page_cache()'s check for
the expected radix_tree entry - but a little too late.

By that time, we might have already decided to shmem_replace_page(): I
don't know of a problem from that, but I'd feel more at ease not to do so
spuriously.  And we have already done mem_cgroup_cache_charge(), on
perhaps the wrong mem cgroup: and this charge is not then undone on the
error path, because PageSwapCache ends up preventing that.

It's this last case which causes the occasional negative rss in
memory.stat: the page is charged here as cache, but (sometimes) found to
be anon when eventually it's uncharged - and in between, it's an
undeserved charge on the wrong memcg.

Fix this by adding an earlier check on the radix_tree entry: it's
inelegant to descend the tree twice, but swapping is not the fast path,
and a better solution would need a pair (try+commit) of memcg calls, and a
rework of shmem_replace_page() to keep out of the swapcache.

We can use the added shmem_confirm_swap() function to replace the
find_get_page+page_cache_release we were already doing on the error path.
And add a comment on that -EEXIST: it seems a peculiar errno to be using,
but originates from its use in radix_tree_insert().

[It can be surprising to see positive rss left in a memcg's memory.stat
after all tasks have gone, since it is supposed to count anonymous but not
shmem.  Aside from sharing anon pages via fork with a task in some other
memcg, it often happens after swapping: because a swap page can't be freed
while under writeback, nor while locked.  So it's not an error, and these
residual pages are easily freed once pressure demands.]

Change-Id: I8bba4e6d7e901058b31438371ed26384aa3f8ca4
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-07 21:01:18 +03:00
..
backing-dev.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
bootmem.c mm: sparse: fix usemap allocation above node descriptor section 2016-10-29 23:12:12 +08:00
bounce.c
cleancache.c
compaction.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
debug-pagealloc.c
dmapool.c mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls 2012-12-17 10:37:44 -08:00
fadvise.c mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages 2013-02-28 06:59:01 -08:00
failslab.c
filemap.c mm: kill vma flag VM_CAN_NONLINEAR 2020-11-29 16:11:40 +03:00
filemap_xip.c mm: kill vma flag VM_CAN_NONLINEAR 2020-11-29 16:11:40 +03:00
fremap.c mm: kill vma flag VM_CAN_NONLINEAR 2020-11-29 16:11:40 +03:00
highmem.c mm: highmem: don't treat PKMAP_ADDR(LAST_PKMAP) as a highmem address 2014-06-11 12:04:22 -07:00
huge_memory.c mm, thp: fix collapsing of hugepages on madvise 2015-02-02 17:05:07 +08:00
hugetlb.c Fix incomplete backport of commit 0f792cf949a0 2016-10-26 23:15:44 +08:00
hwpoison-inject.c
init-mm.c
internal.h Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
Kconfig BACKPORT: mm/zsmalloc: add statistics support 2018-01-01 21:27:09 +03:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c mm: kmemleak: allow safe memory scanning during kmemleak disabling 2015-10-22 09:20:06 +08:00
ksm.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
maccess.c
madvise.c mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE 2020-12-07 21:00:58 +03:00
Makefile BACKPORT: mm/zpool: implement common zpool api to zbud/zsmalloc 2018-01-01 21:27:00 +03:00
memblock.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
memcontrol.c shmem: replace page if mapping excludes its zone 2020-12-07 20:57:06 +03:00
memory-failure.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
memory.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
memory_hotplug.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
mempolicy.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
mempool.c
migrate.c BACKPORT: Sanitize 'move_pages()' permission checks 2018-01-13 17:13:40 +03:00
mincore.c swap: make each swap partition have one address_space 2018-01-01 22:02:05 +03:00
mlock.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
mm_init.c
mmap.c mm: kill vma flag VM_CAN_NONLINEAR 2020-11-29 16:11:40 +03:00
mmu_context.c
mmu_notifier.c mm: mmu_notifier: re-fix freed page still mapped in secondary MMU 2013-06-07 12:49:25 -07:00
mmzone.c
mprotect.c mm: add a field to store names for private anonymous memory 2013-10-11 10:02:06 -07:00
mremap.c
msync.c
nobootmem.c memblock: free allocated memblock_reserved_regions later 2012-07-16 09:04:45 -07:00
nommu.c mm: kill vma flag VM_CAN_NONLINEAR 2020-11-29 16:11:40 +03:00
oom_kill.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
page-writeback.c mm: fix calculation of dirtyable memory 2016-10-29 23:12:16 +08:00
page_alloc.c mm/page_alloc.c: use '__paginginit' instead of '__init' 2017-12-27 17:13:39 +03:00
page_cgroup.c cgroup/kmemleak: add kmemleak_free() for cgroup deallocations. 2015-02-02 17:05:07 +08:00
page_io.c
page_isolation.c mm: page_isolation: MIGRATE_CMA isolation functions added 2013-02-27 18:14:02 -08:00
pagewalk.c mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas 2013-06-07 12:49:28 -07:00
percpu-km.c
percpu-vm.c percpu: perform tlb flush after pcpu_map_pages() failure 2014-12-01 18:02:23 +08:00
percpu.c Revert "percpu: free percpu allocation info for uniprocessor system" 2015-02-02 17:04:38 +08:00
pgtable-generic.c
prio_tree.c
process_vm_access.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-14 11:29:51 -07:00
quicklist.c
readahead.c mm: change initial readahead window size calculation 2016-10-29 23:12:18 +08:00
rmap.c mm: fix anon_vma->degree underflow in anon_vma endless growing prevention 2015-04-14 17:34:04 +08:00
shmem.c shmem: fix negative rss in memcg memory.stat 2020-12-07 21:01:18 +03:00
slab.c cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags 2014-12-01 18:02:38 +08:00
slob.c
slub.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
sparse-vmemmap.c
sparse.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
swap.c swap: make each swap partition have one address_space 2018-01-01 22:02:05 +03:00
swap_state.c swap: make each swap partition have one address_space 2018-01-01 22:02:05 +03:00
swapfile.c vfs: make path_openat take a struct filename pointer 2018-12-07 22:28:48 +04:00
thrash.c
truncate.c mm/fs: remove truncate_range 2020-12-07 20:57:30 +03:00
util.c swap: make each swap partition have one address_space 2018-01-01 22:02:05 +03:00
vmalloc.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
vmscan.c mm: new shrinker API 2020-11-29 16:11:30 +03:00
vmstat.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
zpool.c BACKPORT: mm/zpool: add name argument to create zpool 2018-01-01 21:27:09 +03:00
zsmalloc.c UPSTREAM: zsmalloc: fix a null pointer dereference in destroy_handle_cache() 2018-01-01 21:27:14 +03:00