1
0
Fork 0
mirror of https://github.com/followmsi/android_kernel_google_msm.git synced 2024-11-06 23:17:41 +00:00
android_kernel_google_msm/mm
Khalid Aziz b07ef01645 mm: fix aio performance regression for database caused by THP
commit 7cb2ef56e6 upstream.

I am working with a tool that simulates oracle database I/O workload.
This tool (orion to be specific -
<http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#autoId24>)
allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag.  It then
does aio into these pages from flash disks using various common block
sizes used by database.  I am looking at performance with two of the most
common block sizes - 1M and 64K.  aio performance with these two block
sizes plunged after Transparent HugePages was introduced in the kernel.
Here are performance numbers:

		pre-THP		2.6.39		3.11-rc5
1M read		8384 MB/s	5629 MB/s	6501 MB/s
64K read	7867 MB/s	4576 MB/s	4251 MB/s

I have narrowed the performance impact down to the overheads introduced by
THP in __get_page_tail() and put_compound_page() routines.  perf top shows
>40% of cycles being spent in these two routines.  Every time direct I/O
to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
the pages and calls put_page() when I/O completes to put the reference
away.  THP introduced significant amount of locking overhead to get_page()
and put_page() when dealing with compound pages because hugepages can be
split underneath get_page() and put_page().  It added this overhead
irrespective of whether it is dealing with hugetlbfs pages or transparent
hugepages.  This resulted in 20%-45% drop in aio performance when using
hugetlbfs pages.

Since hugetlbfs pages can not be split, there is no reason to go through
all the locking overhead for these pages from what I can see.  I added
code to __get_page_tail() and put_compound_page() to bypass all the
locking code when working with hugetlbfs pages.  This improved performance
significantly.  Performance numbers with this patch:

		pre-THP		3.11-rc5	3.11-rc5 + Patch
1M read		8384 MB/s	6501 MB/s	8371 MB/s
64K read	7867 MB/s	4251 MB/s	6510 MB/s

Performance with 64K read is still lower than what it was before THP, but
still a 53% improvement.  It does mean there is more work to be done but I
will take a 53% improvement for now.

Please take a look at the following patch and let me know if it looks
reasonable.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-13 12:01:49 +09:00
..
backing-dev.c backing-dev: fix wakeup timer races with bdi_unregister() 2012-02-01 16:52:49 +08:00
bootmem.c mm: sparse: fix usemap allocation above node descriptor section 2012-10-02 10:30:36 -07:00
bounce.c mm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:27 +08:00
cleancache.c mm: cleancache: Use __read_mostly as appropiate. 2012-01-23 16:08:09 -05:00
compaction.c mm: compaction: fix echo 1 > compact_memory return error issue 2013-01-17 08:50:43 -08:00
debug-pagealloc.c mm, x86: Remove debug_pagealloc_enabled 2011-12-06 09:24:07 +01:00
dmapool.c mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls 2012-12-17 10:37:44 -08:00
fadvise.c mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages 2013-02-28 06:59:01 -08:00
failslab.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
filemap.c radix-tree: use iterators in find_get_pages* functions 2012-03-28 17:14:37 -07:00
filemap_xip.c mm/filemap_xip.c: fix race condition in xip_file_fault() 2012-02-03 16:16:41 -08:00
fremap.c
highmem.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
huge_memory.c mm/huge_memory.c: fix potential NULL pointer dereference 2013-09-26 17:15:51 -07:00
hugetlb.c futex: Take hugepages into account when generating futex_key 2013-08-20 08:26:28 -07:00
hwpoison-inject.c HWPOISON: Clean up memory_failure() vs. __memory_failure() 2012-01-03 12:06:32 -08:00
init-mm.c
internal.h mm: thp: tail page refcounting fix 2011-11-02 16:06:57 -07:00
Kconfig Merge branch 'master' into x86/memblock 2011-11-28 09:46:22 -08:00
Kconfig.debug mm: more intensive memory corruption debugging 2012-01-10 16:30:42 -08:00
kmemcheck.c
kmemleak-test.c
kmemleak.c kmemleak: Disable early logging when kmemleak is off by default 2012-01-20 16:57:05 +00:00
ksm.c ksm: cleanup: introduce find_mergeable_vma() 2012-03-21 17:54:59 -07:00
maccess.c
madvise.c mm: Hold a file reference in madvise_remove 2012-07-16 09:04:43 -07:00
Makefile
memblock.c x86, mm: Trim memory in memblock to be page aligned 2012-10-31 10:02:56 -07:00
memcontrol.c memcg: fix multiple large threshold notifications 2013-09-26 17:15:50 -07:00
memory-failure.c mm: soft offline: split thp at the beginning of soft_offline_page() 2012-12-10 10:59:39 -08:00
memory.c vm: add vm_iomap_memory() helper function 2013-04-25 21:19:56 -07:00
memory_hotplug.c memory hotplug: fix section info double registration bug 2012-10-02 10:30:06 -07:00
mempolicy.c tmpfs mempolicy: fix /proc/mounts corrupting memory 2013-01-11 09:06:49 -08:00
mempool.c mempool: fix first round failure behavior 2012-01-10 16:30:45 -08:00
migrate.c mm: migration: add migrate_entry_wait_huge() 2013-06-20 11:58:46 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c
mmap.c mm: do not grow the stack vma just because of an overrun on preceding vma 2013-10-22 09:02:25 +01:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c mm: mmu_notifier: re-fix freed page still mapped in secondary MMU 2013-06-07 12:49:25 -07:00
mmzone.c
mprotect.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mremap.c mm: collapse security_vm_enough_memory() variants into a single function 2012-02-14 10:45:39 +11:00
msync.c
nobootmem.c memblock: free allocated memblock_reserved_regions later 2012-07-16 09:04:45 -07:00
nommu.c vm: add no-mmu vm_iomap_memory() stub 2013-08-20 08:26:27 -07:00
oom_kill.c mm, memcg: give exiting processes access to memory reserves 2013-10-05 07:06:54 -07:00
page-writeback.c writeback: fix negative bdi max pause 2013-11-04 04:23:42 -08:00
page_alloc.c mm, show_mem: suppress page counts in non-blockable contexts 2013-10-13 15:42:49 -07:00
page_cgroup.c page_cgroup: fix horrid swap accounting regression 2012-03-06 08:18:23 -08:00
page_io.c
page_isolation.c
pagewalk.c mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas 2013-06-07 12:49:28 -07:00
percpu-km.c
percpu-vm.c percpu: use bitmap_clear 2012-01-20 09:23:16 -08:00
percpu.c kmemleak: Fix the kmemleak tracking of the percpu areas with !SMP 2012-05-09 10:13:29 -07:00
pgtable-generic.c thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE 2012-03-21 17:55:02 -07:00
prio_tree.c
process_vm_access.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-14 11:29:51 -07:00
quicklist.c
readahead.c
rmap.c mm: fix XFS oops due to dirty pages without buffers on s390 2012-10-31 10:02:56 -07:00
shmem.c tmpfs: fix use-after-free of mempolicy object 2013-02-28 06:59:01 -08:00
slab.c slab: fix the DEADLOCK issue on l3 alien lock 2012-10-13 05:38:37 +09:00
slob.c
slub.c slub: fix a memory leak in get_partial_node() 2012-06-10 00:36:11 +09:00
sparse-vmemmap.c
sparse.c mm/vmemmap: fix wrong use of virt_to_page 2012-12-10 10:59:39 -08:00
swap.c mm: fix aio performance regression for database caused by THP 2013-11-13 12:01:49 +09:00
swap_state.c swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion 2013-06-20 11:58:45 -07:00
swapfile.c swap: fix shmem swapping when more than 8 areas 2012-06-22 11:36:55 -07:00
thrash.c
truncate.c mm: fix invalidate_complete_page2() lock ordering 2012-10-13 05:38:51 +09:00
util.c procfs: mark thread stack correctly in proc/<pid>/maps 2012-03-21 17:54:58 -07:00
vmalloc.c mm: fix faulty initialization in vmalloc_init() 2012-06-10 00:36:06 +09:00
vmscan.c mm: bugfix: set current->reclaim_state to NULL while returning from kswapd() 2012-11-26 11:37:19 -08:00
vmstat.c mm: fix up the vmscan stat in vmstat 2012-04-25 21:26:33 -07:00