android_kernel_google_msm/mm
Russell King 8564b84a62 mm: list_lru: fix almost infinite loop causing effective livelock
I've seen a fair number of issues with kswapd and other processes
appearing to get stuck in v3.12-rc.  Using sysrq-p many times seems to
indicate that it gets stuck somewhere in list_lru_walk_node(), called
from prune_icache_sb() and super_cache_scan().

I never seem to be able to trigger a calltrace for functions above that
point.

So I decided to add the following to super_cache_scan():

    @@ -81,10 +81,14 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
            inodes = list_lru_count_node(&sb->s_inode_lru, sc->nid);
            dentries = list_lru_count_node(&sb->s_dentry_lru, sc->nid);
            total_objects = dentries + inodes + fs_objects + 1;
    +printk("%s:%u: %s: dentries %lu inodes %lu total %lu\n", current->comm, current->pid, __func__, dentries, inodes, total_objects);

            /* proportion the scan between the caches */
            dentries = mult_frac(sc->nr_to_scan, dentries, total_objects);
            inodes = mult_frac(sc->nr_to_scan, inodes, total_objects);
    +printk("%s:%u: %s: dentries %lu inodes %lu\n", current->comm, current->pid, __func__, dentries, inodes);
    +BUG_ON(dentries == 0);
    +BUG_ON(inodes == 0);

            /*
             * prune the dcache first as the icache is pinned by it, then
    @@ -99,7 +103,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
                    freed += sb->s_op->free_cached_objects(sb, fs_objects,
                                                           sc->nid);
            }
    -
    +printk("%s:%u: %s: dentries %lu inodes %lu freed %lu\n", current->comm, current->pid, __func__, dentries, inodes, freed);
            drop_super(sb);
            return freed;
     }

and shortly thereafter, having applied some pressure, I got this:

    update-apt-xapi:1616: super_cache_scan: dentries 25632 inodes 2 total 25635
    update-apt-xapi:1616: super_cache_scan: dentries 1023 inodes 0
    ------------[ cut here ]------------
    Kernel BUG at c0101994 [verbose debug info unavailable]
    Internal error: Oops - BUG: 0 [#3] SMP ARM
    Modules linked in: fuse rfcomm bnep bluetooth hid_cypress
    CPU: 0 PID: 1616 Comm: update-apt-xapi Tainted: G      D      3.12.0-rc7+ #154
    task: daea1200 ti: c3bf8000 task.ti: c3bf8000
    PC is at super_cache_scan+0x1c0/0x278
    LR is at trace_hardirqs_on+0x14/0x18
    Process update-apt-xapi (pid: 1616, stack limit = 0xc3bf8240)
    ...
    Backtrace:
      (super_cache_scan) from [<c00cd69c>] (shrink_slab+0x254/0x4c8)
      (shrink_slab) from [<c00d09a0>] (try_to_free_pages+0x3a0/0x5e0)
      (try_to_free_pages) from [<c00c59cc>] (__alloc_pages_nodemask+0x5)
      (__alloc_pages_nodemask) from [<c00e07c0>] (__pte_alloc+0x2c/0x13)
      (__pte_alloc) from [<c00e3a70>] (handle_mm_fault+0x84c/0x914)
      (handle_mm_fault) from [<c001a4cc>] (do_page_fault+0x1f0/0x3bc)
      (do_page_fault) from [<c001a7b0>] (do_translation_fault+0xac/0xb8)
      (do_translation_fault) from [<c000840c>] (do_DataAbort+0x38/0xa0)
      (do_DataAbort) from [<c00133f8>] (__dabt_usr+0x38/0x40)

Notice that we had a very low number of inodes, which were reduced to
zero my mult_frac().

Now, prune_icache_sb() calls list_lru_walk_node() passing that number of
inodes (0) into that as the number of objects to scan:

    long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan,
                         int nid)
    {
            LIST_HEAD(freeable);
            long freed;

            freed = list_lru_walk_node(&sb->s_inode_lru, nid, inode_lru_isolate,
                                           &freeable, &nr_to_scan);

which does:

    unsigned long
    list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate,
                       void *cb_arg, unsigned long *nr_to_walk)
    {

            struct list_lru_node    *nlru = &lru->node[nid];
            struct list_head *item, *n;
            unsigned long isolated = 0;

            spin_lock(&nlru->lock);
    restart:
            list_for_each_safe(item, n, &nlru->list) {
                    enum lru_status ret;

                    /*
                     * decrement nr_to_walk first so that we don't livelock if we
                     * get stuck on large numbesr of LRU_RETRY items
                     */
                    if (--(*nr_to_walk) == 0)
                            break;

So, if *nr_to_walk was zero when this function was entered, that means
we're wanting to operate on (~0UL)+1 objects - which might as well be
infinite.

Clearly this is not correct behaviour.  If we think about the behaviour
of this function when *nr_to_walk is 1, then clearly it's wrong - we
decrement first and then test for zero - which results in us doing
nothing at all.  A post-decrement would give the desired behaviour -
we'd try to walk one object and one object only if *nr_to_walk were one.

It also gives the correct behaviour for zero - we exit at this point.

Fixes: 5cedf721a7 ("list_lru: fix broken LRU_RETRY behaviour")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
[ Modified to make sure we never underflow the count: this function gets
  called in a loop, so the 0 -> ~0ul transition is dangerous  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Change-Id: I8c53bcc4c70ed978e6cf81a6f38fb06a59cc64ce
2021-11-26 22:02:17 +01:00
..
backing-dev.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
bootmem.c mm: sparse: fix usemap allocation above node descriptor section 2016-10-29 23:12:12 +08:00
bounce.c
cleancache.c
compaction.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
debug-pagealloc.c
dmapool.c mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls 2012-12-17 10:37:44 -08:00
fadvise.c mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages 2013-02-28 06:59:01 -08:00
failslab.c
filemap.c lib/radix-tree.c: make radix_tree_node_alloc() work correctly within interrupt 2020-12-07 21:02:05 +03:00
filemap_xip.c mm: kill vma flag VM_CAN_NONLINEAR 2020-11-29 16:11:40 +03:00
fremap.c mm: kill vma flag VM_CAN_NONLINEAR 2020-11-29 16:11:40 +03:00
highmem.c mm: highmem: don't treat PKMAP_ADDR(LAST_PKMAP) as a highmem address 2014-06-11 12:04:22 -07:00
huge_memory.c mm, thp: fix collapsing of hugepages on madvise 2015-02-02 17:05:07 +08:00
hugetlb.c Fix incomplete backport of commit 0f792cf949a0 2016-10-26 23:15:44 +08:00
hwpoison-inject.c
init-mm.c
internal.h Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
Kconfig BACKPORT: mm/zsmalloc: add statistics support 2018-01-01 21:27:09 +03:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c mm: kmemleak: allow safe memory scanning during kmemleak disabling 2015-10-22 09:20:06 +08:00
ksm.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
list_lru.c mm: list_lru: fix almost infinite loop causing effective livelock 2021-11-26 22:02:17 +01:00
maccess.c
madvise.c mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE 2020-12-07 21:00:58 +03:00
Makefile list: add a new LRU list type 2021-11-26 21:56:07 +01:00
memblock.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
memcontrol.c shmem: replace page if mapping excludes its zone 2020-12-07 20:57:06 +03:00
memory-failure.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
memory.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
memory_hotplug.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
mempolicy.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
mempool.c
migrate.c BACKPORT: Sanitize 'move_pages()' permission checks 2018-01-13 17:13:40 +03:00
mincore.c swap: make each swap partition have one address_space 2018-01-01 22:02:05 +03:00
mlock.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
mm_init.c
mmap.c mm: allow drivers to prevent new writable mappings 2020-12-07 21:08:09 +03:00
mmu_context.c
mmu_notifier.c mm: mmu_notifier: re-fix freed page still mapped in secondary MMU 2013-06-07 12:49:25 -07:00
mmzone.c
mprotect.c mm: add a field to store names for private anonymous memory 2013-10-11 10:02:06 -07:00
mremap.c
msync.c
nobootmem.c memblock: free allocated memblock_reserved_regions later 2012-07-16 09:04:45 -07:00
nommu.c mm: kill vma flag VM_CAN_NONLINEAR 2020-11-29 16:11:40 +03:00
oom_kill.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
page-writeback.c mm: fix calculation of dirtyable memory 2016-10-29 23:12:16 +08:00
page_alloc.c mm: export NR_SHMEM via sysinfo(2) / si_meminfo() interfaces 2020-12-01 19:08:36 +01:00
page_cgroup.c cgroup/kmemleak: add kmemleak_free() for cgroup deallocations. 2015-02-02 17:05:07 +08:00
page_io.c
page_isolation.c mm: page_isolation: MIGRATE_CMA isolation functions added 2013-02-27 18:14:02 -08:00
pagewalk.c mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas 2013-06-07 12:49:28 -07:00
percpu-km.c
percpu-vm.c percpu: perform tlb flush after pcpu_map_pages() failure 2014-12-01 18:02:23 +08:00
percpu.c Revert "percpu: free percpu allocation info for uniprocessor system" 2015-02-02 17:04:38 +08:00
pgtable-generic.c
prio_tree.c
process_vm_access.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-14 11:29:51 -07:00
quicklist.c
readahead.c mm: change initial readahead window size calculation 2016-10-29 23:12:18 +08:00
rmap.c mm: fix anon_vma->degree underflow in anon_vma endless growing prevention 2015-04-14 17:34:04 +08:00
shmem.c shmem: update memory reservation on truncate 2020-12-23 16:15:47 +03:00
slab.c cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags 2014-12-01 18:02:38 +08:00
slob.c
slub.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
sparse-vmemmap.c
sparse.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
swap.c swap: make each swap partition have one address_space 2018-01-01 22:02:05 +03:00
swap_state.c mm: allow drivers to prevent new writable mappings 2020-12-07 21:08:09 +03:00
swapfile.c vfs: make path_openat take a struct filename pointer 2018-12-07 22:28:48 +04:00
thrash.c
truncate.c mm/fs: remove truncate_range 2020-12-07 20:57:30 +03:00
util.c swap: make each swap partition have one address_space 2018-01-01 22:02:05 +03:00
vmalloc.c mm/vmalloc.c: fix kernel BUG at mm/vmalloc.c:512! 2020-12-01 19:08:45 +01:00
vmscan.c mm: new shrinker API 2020-11-29 16:11:30 +03:00
vmstat.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
zpool.c BACKPORT: mm/zpool: add name argument to create zpool 2018-01-01 21:27:09 +03:00
zsmalloc.c UPSTREAM: zsmalloc: fix a null pointer dereference in destroy_handle_cache() 2018-01-01 21:27:14 +03:00