android_kernel_samsung_msm8976/mm
Mel Gorman 0c4bceb3dc mm: page_alloc: use word-based accesses for get/set pageblock bitmaps
The test_bit operations in get/set pageblock flags are expensive.  This
patch reads the bitmap on a word basis and use shifts and masks to isolate
the bits of interest.  Similarly masks are used to set a local copy of the
bitmap and then use cmpxchg to update the bitmap if there have been no
other changes made in parallel.

In a test running dd onto tmpfs the overhead of the pageblock-related
functions went from 1.27% in profiles to 0.5%.

In addition to the performance benefits, this patch closes races that are
possible between:

a) get_ and set_pageblock_migratetype(), where get_pageblock_migratetype()
   reads part of the bits before and other part of the bits after
   set_pageblock_migratetype() has updated them.

b) set_pageblock_migratetype() and set_pageblock_skip(), where the non-atomic
   read-modify-update set bit operation in set_pageblock_skip() will cause
   lost updates to some bits changed in the set_pageblock_migratetype().

Joonsoo Kim first reported the case a) via code inspection.  Vlastimil
Babka's testing with a debug patch showed that either a) or b) occurs
roughly once per mmtests' stress-highalloc benchmark (although not
necessarily in the same pageblock).  Furthermore during development of
unrelated compaction patches, it was observed that frequent calls to
{start,undo}_isolate_page_range() the race occurs several thousands of
times and has resulted in NULL pointer dereferences in move_freepages()
and free_one_page() in places where free_list[migratetype] is
manipulated by e.g.  list_move().  Further debugging confirmed that
migratetype had invalid value of 6, causing out of bounds access to the
free_list array.

That confirmed that the race exist, although it may be extremely rare,
and currently only fatal where page isolation is performed due to
memory hot remove.  Races on pageblocks being updated by
set_pageblock_migratetype(), where both old and new migratetype are
lower MIGRATE_RESERVE, currently cannot result in an invalid value
being observed, although theoretically they may still lead to
unexpected creation or destruction of MIGRATE_RESERVE pageblocks.
Furthermore, things could get suddenly worse when memory isolation is
used more, or when new migratetypes are added.

After this patch, the race has no longer been observed in testing.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ibbcf2ba494831b5f29039ef82be629cb5eacb906
Git-commit: e58469bafd0524e848c3733bc3918d854595e20f
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[vinmenon@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2015-10-24 14:15:12 -07:00
..
kasan kasan, module, vmalloc: rework shadow allocation for modules 2015-05-04 14:03:58 -07:00
backing-dev.c arch: Mass conversion of smp_mb__*() 2014-08-15 11:45:28 -07:00
balloon_compaction.c
bootmem.c mm: concentrate modification of totalram_pages into the mm core 2014-02-07 13:49:40 -08:00
bounce.c
cleancache.c
compaction.c mm: page_alloc: add kasan hooks on alloc and free paths 2015-05-04 14:03:54 -07:00
debug-pagealloc.c mm/debug-pagealloc.c: print page physical address for 2015-08-23 23:19:22 -07:00
dmapool.c
early_ioremap.c mm: create generic early_ioremap() support 2014-08-15 11:45:23 -07:00
fadvise.c
failslab.c
filemap.c mm: memcg: handle non-error OOM situations more gracefully 2014-11-21 09:22:56 -08:00
filemap_xip.c
fremap.c
frontswap.c mm: frontswap: invalidate expired data on a dup-store failure 2014-12-16 09:09:41 -08:00
highmem.c
huge_memory.c mm: numa: Do not mark PTEs pte_numa when splitting huge pages 2014-10-09 12:18:42 -07:00
hugetlb.c mm/hugetlb: add migration entry check in __unmap_hugepage_range 2015-03-18 13:22:27 +01:00
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c
internal.h mm: Enhance per process reclaim to consider shared pages 2015-04-16 10:14:27 -07:00
interval_tree.c
Kconfig mm: Support address range reclaim 2015-04-16 10:15:20 -07:00
Kconfig.debug defconfig: 8994: enable CONFIG_DEBUG_SLUB_PANIC_ON 2014-10-21 14:00:18 -07:00
kmemcheck.c
kmemleak-test.c
kmemleak.c mm: kmemleak: allow safe memory scanning during kmemleak disabling 2015-06-22 10:47:32 +05:30
ksm.c This is the 3.10.84 stable release 2015-09-30 13:25:40 +05:30
maccess.c
madvise.c mm: add a field to store names for private anonymous memory 2014-06-13 12:05:14 -07:00
Makefile mm: slub: add kernel address sanitizer support for slub allocator 2015-05-04 14:03:56 -07:00
memblock.c mm/memblock: add memblock_get_current_limit 2014-04-08 09:51:10 -07:00
memcontrol.c This is the 3.10.67 stable release 2015-04-24 18:04:40 -07:00
memory-failure.c This is the 3.10.84 stable release 2015-09-30 13:25:40 +05:30
memory.c This is the 3.10.84 stable release 2015-09-30 13:25:40 +05:30
memory_hotplug.c This is the 3.10.84 stable release 2015-09-30 13:25:40 +05:30
mempolicy.c Merge upstream tag 'v3.10.49' into msm-3.10 2014-08-20 13:23:09 -07:00
mempool.c
memtest.c memtest: use phys_addr_t for physical addresses 2015-04-01 09:27:43 -07:00
migrate.c mm: Enhance per process reclaim to consider shared pages 2015-04-16 10:14:27 -07:00
mincore.c
mlock.c mm: reorder can_do_mlock to fix audit denial 2015-09-16 18:20:13 +05:30
mm_init.c
mmap.c This is the 3.10.73 stable release 2015-04-24 18:14:57 -07:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c mm: add a field to store names for private anonymous memory 2014-06-13 12:05:14 -07:00
mremap.c mm, thp: close race between mremap() and split_huge_page() 2014-06-07 13:25:31 -07:00
msync.c
nobootmem.c mm/nobootmem.c: Drop __init annotation from free_bootmem_late 2014-04-21 15:28:38 -07:00
nommu.c This is the 3.10.73 stable release 2015-04-24 18:14:57 -07:00
oom_kill.c This is the 3.10.67 stable release 2015-04-24 18:04:40 -07:00
page-writeback.c This is the 3.10.84 stable release 2015-09-30 13:25:40 +05:30
page_alloc.c mm: page_alloc: use word-based accesses for get/set pageblock bitmaps 2015-10-24 14:15:12 -07:00
page_cgroup.c cgroup/kmemleak: add kmemleak_free() for cgroup deallocations. 2014-11-14 08:47:59 -08:00
page_io.c
page_isolation.c mm/page_alloc: Call kernel_map_pages in unset_migrateype_isolate 2015-03-19 11:34:36 -07:00
pageowner.c debugging: keep track of page owners 2014-03-28 13:33:08 -07:00
pagewalk.c mm: pagewalk: call pte_hole() for VM_PFNMAP during walk_page_range 2015-02-11 14:48:16 +08:00
percpu-km.c
percpu-vm.c percpu: perform tlb flush after pcpu_map_pages() failure 2014-10-05 14:54:13 -07:00
percpu.c Revert "percpu: free percpu allocation info for uniprocessor system" 2014-11-14 08:47:53 -08:00
pgtable-generic.c
process_reclaim.c mm: process_reclaim: use unbounded cpu workqueue 2015-06-11 12:23:28 +05:30
process_vm_access.c
quicklist.c
readahead.c
rmap.c mm: Enhance per process reclaim to consider shared pages 2015-04-16 10:14:27 -07:00
shmem.c This is the 3.10.67 stable release 2015-04-24 18:04:40 -07:00
showmem.c mm: showmem: make the notifiers atomic 2015-06-05 13:54:50 +05:30
slab.c mm: slub: add kernel address sanitizer support for slub allocator 2015-05-04 14:03:56 -07:00
slab.h
slab_common.c mm: slub: add kernel address sanitizer support for slub allocator 2015-05-04 14:03:56 -07:00
slob.c
slub.c mm: slub: add kernel address sanitizer support for slub allocator 2015-05-04 14:03:56 -07:00
sparse-vmemmap.c
sparse.c
swap.c mm: close PageTail race 2014-04-03 12:01:05 -07:00
swap_state.c Merge "lowmemorykiller: Don't count swap cache pages twice" 2015-04-09 06:38:02 -07:00
swapfile.c mm/swapfile.c: do not skip lowest_bit in scan_swap_map() scan loop 2015-09-12 04:57:54 -07:00
truncate.c mm: Remove false WARN_ON from pagecache_isize_extended() 2014-11-14 08:48:00 -08:00
util.c This is the 3.10.67 stable release 2015-04-24 18:04:40 -07:00
vmalloc.c bludgeon the flounder kernel until it builds on i386 for qemu testing 2015-09-16 18:20:19 +05:30
vmpressure.c mm: vmpressure: account allocstalls only on higher pressures 2015-08-25 18:34:09 -07:00
vmscan.c vmscan: fix increasing nr_isolated incurred by putback unevictable pages 2015-08-30 20:50:58 -07:00
vmstat.c Revert "Revert "mm: add cma pcp list"" 2015-06-25 13:10:46 -07:00
zbud.c
zswap.c mm, zswap: Fix CPU hotplug callback registration 2014-07-03 09:55:28 -07:00