Commit Graph

7148 Commits

Author SHA1 Message Date
Al Viro 2f4ffad120 cope with potentially long ->d_dname() output for shmem/hugetlb
dynamic_dname() is both too much and too little for those - the
output may be well in excess of the 64 bytes dynamic_dname() assumes
to be enough (thanks to ashmem feeding really long names to
shmem_file_setup()), and vsnprintf() is overkill for those
guys.

Change-Id: Ibbcb570e6634e34016cd3c8d07f817f01be59210
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Git-commit: 118b23022512eb2f41ce42db70dc0568d00be4ba
Git-repo: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git
Signed-off-by: Subbaraman Narayanamurthy <subbaram@codeaurora.org>
2014-01-09 16:35:41 -08:00
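As an illustration of the approach the commit above describes, here is a small self-contained C sketch of building a "<name> (deleted)"-style string by prepending into the end of a caller-supplied buffer, instead of formatting through vsnprintf() into a fixed 64-byte scratch area. The prepend() helper, the buffer size and the sample name are illustrative, not the kernel's actual code:

    #include <stdio.h>
    #include <string.h>

    /* Prepend 'str' in front of *end, moving *end backwards; 0 on success. */
    static int prepend(char **end, int *buflen, const char *str, int len)
    {
            *buflen -= len;
            if (*buflen < 0)
                    return -1;              /* name would not fit */
            *end -= len;
            memcpy(*end, str, len);
            return 0;
    }

    int main(void)
    {
            char buf[256];
            char *end = buf + sizeof(buf);
            int buflen = sizeof(buf);
            const char *name = "a-very-long-ashmem-style-name"; /* illustrative */

            /* Build "/<name> (deleted)" right-to-left, no vsnprintf needed. */
            if (prepend(&end, &buflen, " (deleted)", 10) ||
                prepend(&end, &buflen, name, strlen(name)) ||
                prepend(&end, &buflen, "/", 1))
                    return 1;

            printf("%.*s\n", (int)(buf + sizeof(buf) - end), end);
            return 0;
    }
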
Laura Abbott 3976901d62 mm: make is_vmalloc_addr lockless
is_vmalloc_addr currently takes the vmap_area_lock and walks
the list of vmalloc areas to determine if an address is actually
a vmalloc address. Unfortunately, the current locking structure
with vmap_area_lock does not disable irqs, which means that it may
be possible to recursively take the vmap_area_lock if an irq occurs
while walking the list. Considering the list of possible
vmalloc vs. non-vmalloc ranges is not going to change after bootup,
skip the list walking and just keep a bitmap of which virtual
addresses are set aside for vmalloc and which are not.

Change-Id: I0c159e09dc17b5c6641d08dcf630e6116b991cd5
CRs-Fixed: 591797
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-12-20 06:02:21 -08:00
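A toy, self-contained C model of the bitmap idea in the commit above: mark which fixed-size sections of the address space belong to vmalloc once at boot, then answer lookups with a plain bit test and no lock. The section size, address ranges and names are made up for illustration; this is not the msm implementation:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define SECTION_SHIFT   20u                     /* 1 MiB sections, illustrative */
    #define NR_SECTIONS     (1u << (32 - SECTION_SHIFT))
    #define BITS_PER_LONG   (8 * sizeof(unsigned long))

    static unsigned long vmalloc_bitmap[NR_SECTIONS / BITS_PER_LONG];

    /* Called once at "boot" while everything is still single-threaded. */
    static void mark_vmalloc_range(uint32_t start, uint32_t end)
    {
            for (uint32_t s = start >> SECTION_SHIFT; s <= ((end - 1) >> SECTION_SHIFT); s++)
                    vmalloc_bitmap[s / BITS_PER_LONG] |= 1UL << (s % BITS_PER_LONG);
    }

    /* Lockless: the bitmap never changes after boot, so a plain read suffices. */
    static bool is_vmalloc_addr(uint32_t addr)
    {
            uint32_t s = addr >> SECTION_SHIFT;

            return vmalloc_bitmap[s / BITS_PER_LONG] & (1UL << (s % BITS_PER_LONG));
    }

    int main(void)
    {
            mark_vmalloc_range(0xf0000000u, 0xff000000u);   /* pretend vmalloc region */

            printf("%d %d\n", is_vmalloc_addr(0xf1234567u), is_vmalloc_addr(0xc0000000u));
            return 0;
    }
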
Linux Build Service Account 759c97a6c1 Merge "mm: change freepage state correctly in __isolate_free_page" 2013-12-10 00:07:47 -08:00
Linux Build Service Account a077397550 Merge "add extra free kbytes tunable" 2013-12-05 18:32:54 -08:00
Laura Abbott 509af94cee mm: change freepage state correctly in __isolate_free_page
Commit 2e30abd173
(mm: cma: skip watermarks check for already isolated blocks
in split_free_page()) changed the ordering of where the watermark
checks go for isolated pages. There was already an 'enhancement'
present in __isolate_free_page to skip the watermark checks for
CMA pages to increase success. The merging of the enhancement and
the aforementioned commit was done incorrectly, resulting in
the free page state never being modified for CMA pages even if
the CMA page was removed from the free list. Add the enhancement
properly by only checking for CMA pages at the watermark level
and allow the page state to be modified for CMA pages as well.

Change-Id: Iabea982108d98150f54e5c42b7dbf30f0743653a
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-12-02 16:24:38 -08:00
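A minimal model of the ordering the commit above intends, using simplified stand-in names (watermark_ok, isolate_free_page_model) rather than the real kernel helpers: only non-CMA pages go through the watermark check, but the free-page accounting happens for every page that actually leaves the free list:

    #include <stdbool.h>
    #include <stdio.h>

    enum migratetype { MIGRATE_MOVABLE, MIGRATE_CMA };

    static long nr_free_pages = 1000;               /* toy zone state */
    static const long low_watermark = 128;

    static bool watermark_ok(long needed)
    {
            return nr_free_pages - needed >= low_watermark;
    }

    /* Only non-CMA pages must respect the watermark, but the accounting
     * happens for every page that is actually pulled off the free list. */
    static bool isolate_free_page_model(enum migratetype mt, int order)
    {
            if (mt != MIGRATE_CMA && !watermark_ok(1L << order))
                    return false;

            nr_free_pages -= 1L << order;           /* always update the state */
            return true;
    }

    int main(void)
    {
            bool ok;

            ok = isolate_free_page_model(MIGRATE_CMA, 4);
            printf("CMA page isolated: %d, free pages left: %ld\n", ok, nr_free_pages);

            ok = isolate_free_page_model(MIGRATE_MOVABLE, 4);
            printf("movable page isolated: %d, free pages left: %ld\n", ok, nr_free_pages);
            return 0;
    }
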
Rik van Riel 9297315361 add extra free kbytes tunable
Add a userspace visible knob to tell the VM to keep an extra amount
of memory free, by increasing the gap between each zone's min and
low watermarks.

This is useful for realtime applications that call system
calls and have a bound on the number of allocations that happen
in any short time period.  In this application, extra_free_kbytes
would be left at an amount equal to or larger than the
maximum number of allocations that happen in any burst.

It may also be useful to reduce the memory use of virtual
machines (temporarily?), in a way that does not cause memory
fragmentation like ballooning does.

[ccross]
Revived for use on old kernels where no other solution exists.
The tunable will be removed on kernels that do better at avoiding
direct reclaim.

Change-Id: I765a42be8e964bfd3e2886d1ca85a29d60c3bb3e
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Colin Cross <ccross@android.com>
Git-commit: 92189d47f66c67e5fd92eafaa287e153197a454f
Git-repo: https://android.googlesource.com/kernel/common/
[bhargavuln@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Bhargav Upperla <bhargavuln@codeaurora.org>
2013-11-25 11:56:24 -08:00
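A rough, self-contained sketch of how such a tunable widens the gap above the min watermark. Per-zone scaling and lowmem_reserve handling are omitted, the helper names and numbers are illustrative, and the same offset is applied to the high watermark here purely for simplicity:

    #include <stdio.h>

    #define PAGE_SIZE_KB 4

    /* Toy watermark setup: extra_free_kbytes widens the gap between the min
     * and low watermarks while the min watermark itself is untouched. */
    static void setup_watermarks(long min_free_kbytes, long extra_free_kbytes,
                                 long *wmark_min, long *wmark_low, long *wmark_high)
    {
            long min_pages   = min_free_kbytes   / PAGE_SIZE_KB;
            long extra_pages = extra_free_kbytes / PAGE_SIZE_KB;

            *wmark_min  = min_pages;
            *wmark_low  = min_pages + extra_pages + min_pages / 4;
            *wmark_high = min_pages + extra_pages + min_pages / 2;
    }

    int main(void)
    {
            long min, low, high;

            setup_watermarks(8192, 0, &min, &low, &high);
            printf("no extra:   min=%ld low=%ld high=%ld pages\n", min, low, high);

            setup_watermarks(8192, 16384, &min, &low, &high);
            printf("with extra: min=%ld low=%ld high=%ld pages\n", min, low, high);
            return 0;
    }
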
Chen LinX b3a8b72c74 mm/pagewalk.c: fix walk_page_range() access of wrong PTEs
When walk_page_range walks a memory map's page tables, it'll skip any
VM_PFNMAP area; the variable 'next' is then assigned vma->vm_end, which
may be larger than 'end'.  In the next loop iteration, 'addr' will be larger
than 'next'.  Then, in the /proc/XXXX/pagemap file reading procedure, 'addr'
will keep growing forever in pagemap_pte_range and pte_to_pagemap_entry will
access the wrong pte.

  BUG: Bad page map in process procrank  pte:8437526f pmd:785de067
  addr:9108d000 vm_flags:00200073 anon_vma:f0d99020 mapping:  (null) index:9108d
  CPU: 1 PID: 4974 Comm: procrank Tainted: G    B   W  O 3.10.1+ #1
  Call Trace:
    dump_stack+0x16/0x18
    print_bad_pte+0x114/0x1b0
    vm_normal_page+0x56/0x60
    pagemap_pte_range+0x17a/0x1d0
    walk_page_range+0x19e/0x2c0
    pagemap_read+0x16e/0x200
    vfs_read+0x84/0x150
    SyS_read+0x4a/0x80
    syscall_call+0x7/0xb

Change-Id: Ife5d6781a0b3b8daa1d9060d6d9496b4a3b2f23f
Signed-off-by: Liu ShuoX <shuox.liu@intel.com>
Signed-off-by: Chen LinX <linx.z.chen@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>	[3.10.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 3017f079efd6af199b0852b5c425364513db460e
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
2013-11-21 20:09:01 -08:00
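A toy model of the invariant behind the fix in the commit above: 'next' must never be allowed past 'end', even when a skipped VM_PFNMAP vma extends beyond the requested range, so 'addr' can never run ahead of what the caller asked for. This only illustrates the loop invariant and is not the pagewalk code itself:

    #include <stdio.h>

    struct vma_model { unsigned long start, end; int pfnmap; };

    /* Toy walk over [start, end) with a single vma. */
    static void walk_range(const struct vma_model *v, unsigned long start,
                           unsigned long end)
    {
            unsigned long addr = start, next;

            while (addr < end) {
                    next = v->end < end ? v->end : end;     /* never beyond 'end' */
                    printf("%s [%#lx, %#lx)\n",
                           v->pfnmap ? "skip " : "visit", addr, next);
                    addr = next;
            }
    }

    int main(void)
    {
            struct vma_model pfnmap_vma = { 0x1000, 0x9000, 1 }; /* ends past the request */

            walk_range(&pfnmap_vma, 0x1000, 0x5000);     /* stops at 0x5000, not 0x9000 */
            return 0;
    }
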
Linux Build Service Account 032d0fe64f Merge "mm: make is_vmalloc_addr work properly." 2013-11-20 06:16:01 -08:00
Laura Abbott acce1041cd mm: make is_vmalloc_addr work properly.
There was a typo in the config guard for CONFIG_ENABLE_VMALLOC_SAVING
which meant that the code was never actually being compiled. As a
result, it was never noticed that the code had major flaws. Fix
the code to actually work as intended.

Change-Id: Ief3c00d16cf54e3b945ffb1bfde6b1fea2fa142e
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-11-19 11:15:05 -08:00
Olav Haugan 72ab04b8d1 mm: swap: Rate limit swap write errors
If an error occurs in the swap subsystem when writing pages
out to a swap device, the system might get flooded with
error messages because the mm subsystem will continuously try
to swap out pages even in the event of an error on the swap
device.

Reduce the amount of logging to prevent the kernel log
from being filled with these error messages.

Change-Id: Iddaf6c32ae87132817482cc4cb68b909a1c527e6
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2013-11-14 09:27:11 -08:00
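A small self-contained sketch of message rate limiting in the spirit of the change above. The kernel has helpers such as printk_ratelimited() for this; whether this particular patch uses them is not shown here, and the burst/interval numbers below are illustrative:

    #include <stdio.h>
    #include <time.h>

    /* Toy rate limiter: allow a small burst of messages per interval,
     * silently count the rest and report the suppressed total later. */
    struct ratelimit {
            time_t window_start;
            int    printed;
            int    missed;
            int    burst;           /* max messages per interval */
            int    interval;        /* seconds */
    };

    static int ratelimited_print(struct ratelimit *rl, const char *msg)
    {
            time_t now = time(NULL);

            if (now - rl->window_start >= rl->interval) {
                    if (rl->missed)
                            printf("(%d swap write errors suppressed)\n", rl->missed);
                    rl->window_start = now;
                    rl->printed = 0;
                    rl->missed = 0;
            }
            if (rl->printed >= rl->burst) {
                    rl->missed++;
                    return 0;
            }
            rl->printed++;
            printf("%s\n", msg);
            return 1;
    }

    int main(void)
    {
            struct ratelimit rl = { .burst = 3, .interval = 5 };

            for (int i = 0; i < 10; i++)
                    ratelimited_print(&rl, "Write-error on swap-device");
            return 0;
    }
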
Olav Haugan 4c2fde4241 mm: vmscan: Move pages that fail swapout to LRU active list
Move pages that fail swapout to the LRU active list to reduce
pressure on swap device when swapping out is already failing.
This helps when using a pseudo swap device such as zram which
starts failing when memory is low.

Change-Id: Ib136cd0a744378aa93d837a24b9143ee818c80b3
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2013-11-14 09:00:00 -08:00
Lisa Du 3c72e1f71a mm: vmscan: fix do_try_to_free_pages() livelock
This patch is based on KOSAKI's work and I add a little more description,
please refer to https://lkml.org/lkml/2012/6/14/74.

Currently, I found the system can enter a state where there are lots of free
pages in a zone but only order-0 and order-1 pages, which means the zone is
heavily fragmented; a high-order allocation can then make the direct reclaim
path stall for a long time (e.g., 60 seconds), especially in an environment
with no swap and no compaction.  This problem happened on v3.4, but it seems
the issue still lives in the current tree; the reason is that
do_try_to_free_pages enters a livelock:

kswapd will go to sleep if the zones have been fully scanned and are still
not balanced, as kswapd thinks there's little point in trying all over again,
to avoid an infinite loop.  Instead it changes order from high-order to
0-order because kswapd thinks order-0 is the most important.  Look at
73ce02e9 in detail.  If watermarks are ok, kswapd will go back to sleep
and may leave zone->all_unreclaimable = 0.  It assumes high-order users
can still perform direct reclaim if they wish.

Direct reclaim continues to reclaim for a high order which is not a
COSTLY_ORDER without the oom-killer until kswapd turns on
zone->all_unreclaimable.  This is to avoid a too-early oom-kill.
So it means direct reclaim depends on kswapd to break this loop.

In the worst case, direct reclaim may continue page reclaim forever while
kswapd sleeps forever, until something like a watchdog detects it and finally
kills the process.  As described in:
http://thread.gmane.org/gmane.linux.kernel.mm/103737

We can't turn on zone->all_unreclaimable from the direct reclaim path because
the direct reclaim path doesn't take any lock, so this approach is racy.  Thus
this patch removes the zone->all_unreclaimable field completely and
recalculates the zone reclaimable state every time.

Note: we can't take the approach of having direct reclaim look at
zone->pages_scanned directly while kswapd continues to use
zone->all_unreclaimable, because that is racy.  commit 929bea7c71 (vmscan:
all_unreclaimable() use zone->all_unreclaimable as a name) describes the
detail.

Change-Id: I28cffd677bc9c2d8521849b1a16e211ed24b6d3f
[akpm@linux-foundation.org: uninline zone_reclaimable_pages() and zone_reclaimable()]
Cc: Aaditya Kumar <aaditya.kumar.30@gmail.com>
Cc: Ying Han <yinghan@google.com>
Cc: Nick Piggin <npiggin@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Bob Liu <lliubbo@gmail.com>
Cc: Neil Zhang <zhangwm@marvell.com>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Lisa Du <cldu@marvell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[lauraa@codeaurora.org: Minor context fixup in mm/vmscan.c]
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit: 6e543d5780e36ff5ee56c44d7e2e30db3457a7ed
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-11-07 18:39:56 -08:00
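A toy model of the recomputed predicate that replaces the sticky flag in the commit above: a zone counts as reclaimable until it has been scanned roughly six times its number of reclaimable pages. The structure and the factor of 6 follow the description as understood here; treat this as a sketch, not the kernel function:

    #include <stdbool.h>
    #include <stdio.h>

    struct zone_model {
            unsigned long inactive_anon, active_anon;
            unsigned long inactive_file, active_file;
            unsigned long pages_scanned;    /* reset whenever pages get freed */
            bool have_swap;
    };

    static unsigned long reclaimable_pages(const struct zone_model *z)
    {
            unsigned long nr = z->inactive_file + z->active_file;

            if (z->have_swap)
                    nr += z->inactive_anon + z->active_anon;
            return nr;
    }

    /* Recomputed on every check instead of relying on a sticky flag. */
    static bool zone_reclaimable(const struct zone_model *z)
    {
            return z->pages_scanned < reclaimable_pages(z) * 6;
    }

    int main(void)
    {
            struct zone_model z = { .inactive_file = 100, .active_file = 100,
                                    .pages_scanned = 0, .have_swap = false };

            printf("%d\n", zone_reclaimable(&z));   /* 1: barely scanned yet */
            z.pages_scanned = 2000;
            printf("%d\n", zone_reclaimable(&z));   /* 0: treated as unreclaimable */
            return 0;
    }
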
Linux Build Service Account 9bd6248c0f Merge "zswap: add to mm/" 2013-11-04 16:09:03 -08:00
Linux Build Service Account c369ce5d5d Merge "zbud: add to mm/" 2013-11-04 16:09:02 -08:00
Minchan Kim d0c1a40082 mm: remove compressed copy from zram in-memory
The swap subsystem does lazy swap slot freeing, expecting the page to be
swapped out again so an unnecessary write can be avoided.

But the problem with in-memory swap (e.g., zram) is that it consumes memory
space until the vm_swap_full condition (i.e., half of the whole swap device
is used) is met.  That can be bad if we use multiple swap devices, a small
in-memory swap plus a big storage swap, or in-memory swap alone.

This patch makes the swap subsystem free the swap slot as soon as swap-read
is completed and marks the swapcache page dirty, so the page is written out
to the swap device when it is reclaimed.  It means we never lose it.

I tested this patch with kernel compile workload.

1. before

   compile time : 9882.42
   zram max wasted space by fragmentation: 13471881 byte
   memory space consumed by zram: 174227456 byte
   the number of slot free notify: 206684

2. after

   compile time : 9653.90
   zram max wasted space by fragmentation: 11805932 byte
   memory space consumed by zram: 154001408 byte
   the number of slot free notify: 426972

Change-Id: Ida7dcf6fbd67408e2429483c28c75f59f83310d3
[akpm@linux-foundation.org: tweak comment text]
[artem.savkov@gmail.com: fix BUG due to non-swapcache pages in end_swap_bio_read()]
[akpm@linux-foundation.org: invert unlikely() test, augment comment, 80-col cleanup]
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Artem Savkov <artem.savkov@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>
Cc: Shaohua Li <shli@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: b430e9d1c6d416306d44dbf3aa3148be7af78abc
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2013-10-28 15:11:30 -07:00
Seth Jennings 60f645ab5a zswap: add to mm/
zswap is a thin backend for frontswap that takes pages that are in the
process of being swapped out and attempts to compress them and store
them in a RAM-based memory pool.  This can result in a significant I/O
reduction on the swap device and, in the case where decompressing from
RAM is faster than reading from the swap device, can also improve
workload performance.

It also has support for evicting swap pages that are currently
compressed in zswap to the swap device on an LRU(ish) basis.  This
functionality makes zswap a true cache in that, once the cache is full,
the oldest pages can be moved out of zswap to the swap device so newer
pages can be compressed and stored in zswap.

This patch adds the zswap driver to mm/

Change-Id: I448d18c19f6c61c2ddeb9b764c44a7730e6015e0
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Jenifer Hopper <jhopper@us.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Joe Perches <joe@perches.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 2b2811178e85553405b86e3fe78357b9b95889ce
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[venkatg@codeaurora.org: keep msm-3.10 changes, add zswap cfg]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2013-10-18 18:26:02 -07:00
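A self-contained toy model of the store decision zswap makes for each outgoing page, as described above: "compress" it, keep it in the RAM pool if that is worthwhile, otherwise let it continue to the real swap device. The fake compressor and the PAGE_SIZE/2 cut-off are purely illustrative; real zswap uses the kernel crypto API and its own acceptance criteria:

    #include <stdio.h>

    #define PAGE_SIZE 4096

    /* Fake "compressed size": mostly-zero pages pretend to compress well. */
    static size_t toy_compressed_len(const unsigned char *src)
    {
            size_t nonzero = 0;

            for (size_t i = 0; i < PAGE_SIZE; i++)
                    if (src[i])
                            nonzero++;
            return 64 + nonzero;
    }

    /* Keep the page compressed in the RAM pool only when worthwhile,
     * otherwise let it go to the swap device. */
    static int zswap_store_model(const unsigned char *page)
    {
            size_t clen = toy_compressed_len(page);

            if (clen >= PAGE_SIZE / 2) {
                    printf("reject (%zu bytes): write to the real swap device\n", clen);
                    return -1;
            }
            printf("stored in RAM pool: %zu bytes instead of %d\n", clen, PAGE_SIZE);
            return 0;
    }

    int main(void)
    {
            static unsigned char zero_page[PAGE_SIZE];
            static unsigned char noisy_page[PAGE_SIZE];

            for (size_t i = 0; i < PAGE_SIZE; i++)
                    noisy_page[i] = (unsigned char)(i | 1);

            zswap_store_model(zero_page);   /* accepted into the compressed pool */
            zswap_store_model(noisy_page);  /* poor compression, falls through to swap */
            return 0;
    }
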
Seth Jennings 1b9306af36 zbud: add to mm/
zbud is a special purpose allocator for storing compressed pages.  It
is designed to store up to two compressed pages per physical page.
While this design limits storage density, it has simple and
deterministic reclaim properties that make it preferable to a higher
density approach when reclaim will be used.

zbud works by storing compressed pages, or "zpages", together in pairs
in a single memory page called a "zbud page".  The first buddy is "left
justified" at the beginning of the zbud page, and the last buddy is
"right justified" at the end of the zbud page.  The benefit is that if
either buddy is freed, the freed buddy space, coalesced with whatever
slack space that existed between the buddies, results in the largest
possible free region within the zbud page.

zbud also provides an attractive lower bound on density.  The ratio of
zpages to zbud pages can not be less than 1.  This ensures that zbud can
never "do harm" by using more pages to store zpages than the
uncompressed zpages would have used on their own.

This implementation is a rewrite of the zbud allocator internally used
by zcache in the drivers/staging tree.  The rewrite was necessary to
remove some of the zcache specific elements that were ingrained
throughout and provide a generic allocation interface that can later be
used by zsmalloc and others.

This patch adds zbud to mm/ for later use by zswap.

Change-Id: I5120b1acd22f15c5dc3d2a0e6f1a34a73f97be3a
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Jenifer Hopper <jhopper@us.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Joe Perches <joe@perches.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: Hugh Dickens <hughd@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 4e2e2770b1529edc5849c86b29a6febe27e2f083
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[venkatg@codeaurora.org: keep msm-3.10 changes, add zbud cfg]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2013-10-18 18:24:55 -07:00
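A toy model of the layout described in the commit above: one zpage packed from the left edge of the page, one from the right, both rounded up to whole chunks, with any slack left in the middle. The chunk size and the zpage sizes are illustrative, not zbud's actual constants:

    #include <stdio.h>

    #define PAGE_SIZE       4096
    #define CHUNK_SIZE      64              /* illustrative granularity */
    #define NCHUNKS         (PAGE_SIZE / CHUNK_SIZE)

    struct zbud_page_model {
            int first_chunks;       /* chunks used by the left-justified buddy */
            int last_chunks;        /* chunks used by the right-justified buddy */
    };

    static int chunks_for(size_t bytes)
    {
            return (int)((bytes + CHUNK_SIZE - 1) / CHUNK_SIZE);
    }

    int main(void)
    {
            struct zbud_page_model page;

            page.first_chunks = chunks_for(1700);
            page.last_chunks  = chunks_for(1400);

            printf("left buddy:  offset 0, %d chunks\n", page.first_chunks);
            printf("right buddy: offset %d, %d chunks\n",
                   (NCHUNKS - page.last_chunks) * CHUNK_SIZE, page.last_chunks);
            printf("slack between buddies: %d chunks\n",
                   NCHUNKS - page.first_chunks - page.last_chunks);
            printf("freeing either buddy coalesces it with that slack\n");
            return 0;
    }
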
Linux Build Service Account 46cf0fca4a Merge "mm, sched: Allow uaccess in atomic with pagefault_disable()" 2013-10-07 14:59:39 -07:00
Linux Build Service Account f256b4a989 Merge "mm, sched: Drop voluntary schedule from might_fault()" 2013-10-07 14:59:36 -07:00
Michael S. Tsirkin 34320dfe70 mm, sched: Allow uaccess in atomic with pagefault_disable()
This changes might_fault() so that it does not
trigger a false positive diagnostic for e.g. the following
sequence:

	spin_lock_irqsave()
	pagefault_disable()
	copy_to_user()
	pagefault_enable()
	spin_unlock_irqrestore()

In particular vhost wants to do this, to call
socket ops from under a lock.

There are 3 cases to consider:

 - CONFIG_PROVE_LOCKING - might_fault is non-inline
   so it's easy to move the in_atomic test to fix
   up the false positive warning.

 - CONFIG_DEBUG_ATOMIC_SLEEP - might_fault
   is currently inline, but we are calling a
   non-inline __might_sleep anyway,
   so let's use the non-inline version of might_fault
   that does the right thing.

 - !CONFIG_DEBUG_ATOMIC_SLEEP && !CONFIG_PROVE_LOCKING
   __might_sleep is a nop so might_fault is a nop.

Make this explicit.

Change-Id: I5a0cad174b796eddeb9d239b7e114ed3348699bf
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1369577426-26721-11-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit: 662bbcb2747c2422cf98d3d97619509379eee466
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2013-10-04 14:13:02 -07:00
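A minimal userspace model of the check described in the commit above: pagefault_disable() bumps a counter (much as it contributes to the preempt count in the kernel), and might_fault() skips the sleep diagnostic whenever that counter is non-zero. The names mirror the kernel ones, but this is only a sketch of the idea, not the actual patch:

    #include <stdio.h>

    static int preempt_count;       /* bumped by pagefault_disable(), as in the kernel model */

    static void pagefault_disable(void) { preempt_count++; }
    static void pagefault_enable(void)  { preempt_count--; }
    static int  in_atomic(void)         { return preempt_count != 0; }

    /* A user access with pagefaults disabled is legal and must not
     * trigger the "might sleep" diagnostic. */
    static void might_fault(void)
    {
            if (in_atomic())
                    return;         /* can't fault, so can't sleep: nothing to check */
            fprintf(stderr, "might_fault: sleeping would be allowed here\n");
    }

    static void copy_to_user_model(void) { might_fault(); }

    int main(void)
    {
            pagefault_disable();
            copy_to_user_model();   /* silent: no false positive */
            pagefault_enable();

            copy_to_user_model();   /* takes the diagnostic path */
            return 0;
    }
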
Michael S. Tsirkin 244bcc2312 mm, sched: Drop voluntary schedule from might_fault()
might_fault() is called from functions like copy_to_user()
which most callers expect to be very fast, like a couple of
instructions.

So functions like memcpy_toiovec() call them many times in a loop.

But might_fault() calls might_sleep() and with CONFIG_PREEMPT_VOLUNTARY
this results in a function call.

Let's not do this - just call __might_sleep() that produces
a diagnostic for sleep within atomic, but drop
might_preempt().

Here's a test sending traffic between the VM and the host, where the
host is built with CONFIG_PREEMPT_VOLUNTARY:

 before:
	incoming: 7122.77   Mb/s
	outgoing: 8480.37   Mb/s

 after:
	incoming: 8619.24   Mb/s
	outgoing: 9455.42   Mb/s

As a side effect, this fixes an issue pointed
out by Ingo: might_fault might schedule differently
depending on PROVE_LOCKING. Now there's no
preemption point in either case, so it's consistent.

Change-Id: Ic27fa27635c6f0e76ca348a9e71a21d57531394b
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1369577426-26721-10-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit: 114276ac0a3beb9c391a410349bd770653e185ce
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2013-10-04 14:13:01 -07:00
Subbaraman Narayanamurthy f13a476269 debug-pagealloc: Panic on pagealloc corruption
Currently, we just print the pagealloc corruption warnings and
proceed. Sometimes, we are getting multiple errors printed down
the line. It will be good to get the device state as early as
possible when we get the first pagealloc error.

Change-Id: I79155ac8a039b30a3a98d5dd1384d3923082712f
Signed-off-by: Subbaraman Narayanamurthy <subbaram@codeaurora.org>
2013-10-01 11:45:57 -07:00
Pushkar Joshi 03d472dcc2 mm: panic on the first bad page table entry access
Sometimes having a number of bad page table entries precipitates a
crash much later. Because of this, we do not have any context for
the point at which the first bad pte entry was encountered. Hence,
panic on the first such instance to help gather context for debug.

Change-Id: Idddf2b977214eb1463d08e16630e98264b9af487
Signed-off-by: Pushkar Joshi <pushkarj@codeaurora.org>
2013-09-13 18:37:25 -07:00
Laura Abbott f3fff09b29 mm: Update is_vmalloc_addr to account for vmalloc savings
is_vmalloc_addr currently assumes that all vmalloc addresses
exist between VMALLOC_START and VMALLOC_END. This may not be
the case when interleaving vmalloc and lowmem. Update
is_vmalloc_addr to properly check for this.

Change-Id: I5def3d6ae1a4de59ea36f095b8c73649a37b1f36
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-09-05 14:51:24 -07:00
Stephen Boyd 25e50d41af Revert "mm: sync vmalloc address space page tables in alloc_vm_area()"
This reverts commit d2affbaaa2.

This commit is already upstream.

Change-Id: Ie730e53279e6edc565d3ac4292d7d11553a93a9f
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2013-09-05 14:50:58 -07:00
Liam Mark 06e8520b10 android/lowmemorykiller: Selectively count free CMA pages
In certain memory configurations there can be a large number of
CMA pages which are not suitable to satisfy certain memory
requests.

This large number of unsuitable pages can cause the
lowmemorykiller to not kill any tasks because the
lowmemorykiller counts all free pages.
In order to ensure the lowmemorykiller properly evaluates the
free memory, only count the free CMA pages if they are suitable
for satisfying the memory request.

Change-Id: I7f06d53e2d8cfe7439e5561fe6e5209ce73b1c90
CRs-fixed: 437016
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2013-09-04 16:15:40 -07:00
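A toy model of the counting rule in the commit above: free CMA pages are only credited to a request that could actually be satisfied from CMA, i.e. a movable one. The flag name and numbers are illustrative stand-ins, not the lowmemorykiller's real fields:

    #include <stdio.h>

    #define MOVABLE_REQUEST 0x1     /* stand-in for __GFP_MOVABLE, value is illustrative */

    /* Free CMA pages only count towards a movable allocation request. */
    static long usable_free_pages(long nr_free, long nr_free_cma, unsigned int gfp_mask)
    {
            if (gfp_mask & MOVABLE_REQUEST)
                    return nr_free;                 /* CMA pages are usable here */
            return nr_free - nr_free_cma;           /* ignore CMA for this request */
    }

    int main(void)
    {
            long nr_free = 50000, nr_free_cma = 40000;

            printf("movable request sees:   %ld free pages\n",
                   usable_free_pages(nr_free, nr_free_cma, MOVABLE_REQUEST));
            printf("unmovable request sees: %ld free pages\n",
                   usable_free_pages(nr_free, nr_free_cma, 0));
            return 0;
    }
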
Olav Haugan dcfaf6cafa Revert "android/lowmemorykiller: Only consider gfp_mask free pages"
This change is not needed anymore due to a new and improved change to the
android low memory killer that is incompatible with this change. To be able
to apply the new change, this change needs to be removed.

This reverts commit cc4baf70665f54305182bfaf98c6236d129e6dad.

Change-Id: I33801faf9ac7371f46cf409009939d853adee95e
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2013-09-04 16:14:29 -07:00
Liam Mark 8ee2256602 ion: tracing: add ftrace events for ion allocations
Add ftrace events for ion allocations to make it easier to profile
their performance.

Change-Id: I9f32e076cd50d7d3a145353dfcef74f0f6cdf8a0
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2013-09-04 15:52:50 -07:00
Liam Mark bb5dc6e84c android/lowmemorykiller: Only consider gfp_mask free pages
In certain memory configurations there can be a large number of
CMA pages which are not suitable to satisfy certain memory
requests.
This large number of unsuitable pages can cause the
lowmemorykiller to not kill any tasks because the
lowmemorykiller counts all free pages.

In order to ensure the lowmemorykiller properly evaluates the
free memory, only count the free pages which are suitable for
satisfying the memory request.

Change-Id: Iabc3bebc54be9dec7fcbffb94a2fbf18dd684669
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-09-04 15:45:04 -07:00
Greg Reid 4f9702536f kernel: Add hooks for user-accessible timers in the kernel.
Hooks for user-accessible timers allow implementation of a
more efficient gettimeofday in user-space.

Change-Id: If2f63d010c1cf142eb84f3745617e756913e46f7
Signed-off-by: Brent DeGraaf <bdegraaf@codeaurora.org>
2013-09-04 15:36:37 -07:00
Neeti Desai 85952d1249 msm: 8974: Add driver to enable/disable memblock-remove feature
The msm_mem_hole driver is introduced to enable/disable
memblock-remove features for device tree nodes that set
compatible="qcom,msm-mem-hole"

Change-Id: I2aeb3725b74e9ff33992f70995767f790fc729c5
Signed-off-by: Neeti Desai <neetid@codeaurora.org>
2013-09-04 15:31:57 -07:00
Stephen Boyd 38d8910730 Merge branch 'qandroid-3.10' into msm-3.10
* qandroid-3.10: (636 commits)
  netfilter: xt_qtaguid: Protect iface list access with necessary lock
  HID: magicmouse: Fix build warning
  USB: gadget: mtp: Fix OUT endpoint request length usage in read
  USB: gadget: f_mtp: Fix using tx buffer pointer
  msm: Fix race condition in domain lookup
  msm: Add null-pointer checks for domains
  base: sync: increase size of sync_timeline name
  USB: gadget: mtp: Add module parameters for Tx transfer length
  msm: iommu: Lock the genpool allocation
  gpu: ion: fix page offset in dma_buf_kmap()
  gpu: ion: Fix bug in ion_system_heap map_user
  gpu: ion: Only map as much of the vma as the user requested
  gpu: ion: use vmalloc to allocate page array to map kernel
  gpu: ion: Remove dead comments
  gpu: ion: Minimize allocation fallback delay
  mmc: sd: Set the card removed if card detect fails
  gpu: ion: don't fault in individual pages for the CP heap
  gpu: ion: do not ask for compound pages in system heap
  gpu: ion: Modify the system heap to try to allocate large/huge pages
  gpu: ion: Set the dma_address of the sg list at alloc time
  ...

Conflicts:
	arch/arm/Kconfig
	arch/arm/include/asm/hardware/cache-l2x0.h
	arch/arm/mm/cache-l2x0.c
	drivers/mmc/card/block.c
	drivers/usb/gadget/udc-core.c
2013-09-04 14:46:18 -07:00
Laura Abbott 25307f602e mm: Remove __init annotations from free_bootmem_late
free_bootmem_late is currently set up to only be used in init
functions. Some clients need to use this function past initcalls.
The functions themselves have no restrictions on being used later
aside from the __init annotations, so remove the annotation.

Change-Id: I7c7e15cf2780a8843ebb4610da5b633c9abb0b3d
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-08-22 18:09:32 -07:00
Neeti Desai c19beb14ff msm: Increase the kernel virtual area to include lowmem
Even though lowmem is accounted for in vmalloc space, allocation
comes only from the region bounded by VMALLOC_START and VMALLOC_END.
The kernel virtual area can now allocate from any unmapped region
starting from PAGE_OFFSET.

Change-Id: I291b9eb443d3f7445fd979bd7b09e9241ff22ba3
Signed-off-by: Neeti Desai <neetid@codeaurora.org>
2013-08-22 18:09:31 -07:00
Neeti Desai 679ece4533 msm: Allow lowmem to be non contiguous and mixed.
Any image that is expected to have a lifetime of
the entire system can give the virtual address
space back for use in vmalloc.

Change-Id: I81ce848cd37e8573d706fa5d1aa52147b3c8da12
Signed-off-by: Neeti Desai <neetid@codeaurora.org>
2013-08-22 18:09:30 -07:00
Lee Susman a4fe692df6 mm: pass readahead info down to the i/o scheduler
Some i/o schedulers (e.g. row-iosched, cfq-iosched) deploy an idling
algorithm in order to be better synced with the readahead algorithm.
Idling is a prediction algorithm for incoming read requests.

In this patch we mark pages which are part of a readahead window, by
setting a newly introduced flag. With this flag, the i/o scheduler can
identify a request which is associated with a readahead page. This
enables the i/o scheduler's idling mechanism to be en-sync with the
readahead mechanism and, in turn, can increase read throughput.

Change-Id: I0654f23315b6d19d71bcc9cc029c6b281a44b196
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
2013-08-22 18:08:28 -07:00
Laura Abbott b02da723ff mm: Always split CMA pages
The page allocator (correctly) checks watermarks when determining
if a page should be split to avoid fragmentation. For general
migration, this is acceptable but this may lead to high rates of
CMA failure under memory pressure. Since CMA migration failures
have a very high cost from a user perspective, skip the watermark
checks and always split the page. This may result in more
fragmentation in the zone, but it's worth it given the benefit to
CMA allocations.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2013-08-22 18:08:27 -07:00
Lee Susman d8ff3a8819 mm: change initial readahead window size calculation
Change the logic which determines the initial readahead window size
such that for small requests (one page) the initial window size
will be x4 the size of the original request, regardless of the
VM_MAX_READAHEAD value. This prevents a rapid ramp-up
that could otherwise be caused by increasing VM_MAX_READAHEAD.

Change-Id: I93d59c515d7e6c6d62348790980ff7bd4f434997
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
2013-08-22 18:07:45 -07:00
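A sketch of the sizing rule described in the commit above, under the stated assumption that a one-page read gets a four-page initial window regardless of the configured maximum; the handling of larger requests below is deliberately simplified and not the real ramp-up logic:

    #include <stdio.h>

    #define VM_MAX_READAHEAD_PAGES 128      /* e.g. 512 KB with 4 KB pages; illustrative */

    static unsigned long init_ra_size_model(unsigned long req_pages)
    {
            unsigned long newsize;

            if (req_pages == 1)
                    newsize = 4;                    /* x4, independent of the max */
            else
                    newsize = req_pages * 4;        /* simplified ramp-up */

            if (newsize > VM_MAX_READAHEAD_PAGES)
                    newsize = VM_MAX_READAHEAD_PAGES;
            return newsize;
    }

    int main(void)
    {
            printf("1-page read  -> %lu-page initial window\n", init_ra_size_model(1));
            printf("64-page read -> %lu-page initial window\n", init_ra_size_model(64));
            return 0;
    }
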
Laura Abbott 54089f1c73 mm: Retry original migrate type if CMA failed
Currently, __rmqueue_cma will disregard the original migrate type
and only try MIGRATE_CMA for allocations. If the MIGRATE_CMA
allocation fails, the fallback types of the original migrate type
are used. Note that in this current path we never try to actually
allocate from the original migrate type. If the only pages left
in the system are the original migrate type, we will fail the
allocation since we never actually try the original migrate type.
This may lead to infinite looping since the system still (correctly)
calculates there are pages available for allocation and will keep
trying to allocate pages. Fix this degenerate case by allocating
from the original migrate type if the MIGRATE_CMA allocation fails.

Change-Id: I62ab293dc694955eaf88e790131a8565395ba8cb
CRs-Fixed: 470615
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-08-22 18:07:44 -07:00
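A toy model of the corrected fallback order for a movable request, as the commit above describes: try MIGRATE_CMA first, and if that fails, retry the original migrate type instead of giving up. Purely illustrative; this is not the __rmqueue_cma code:

    #include <stdbool.h>
    #include <stdio.h>

    enum migratetype { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_CMA, NR_TYPES };

    static int free_count[NR_TYPES] = { 0, 5, 0 };  /* only MOVABLE pages left */

    static bool try_alloc(enum migratetype mt)
    {
            if (free_count[mt] > 0) {
                    free_count[mt]--;
                    return true;
            }
            return false;
    }

    /* Try CMA first, then retry the original migrate type instead of failing. */
    static bool rmqueue_cma_model(enum migratetype original)
    {
            if (try_alloc(MIGRATE_CMA))
                    return true;
            return try_alloc(original);     /* the retry this commit adds */
    }

    int main(void)
    {
            printf("movable allocation %s\n",
                   rmqueue_cma_model(MIGRATE_MOVABLE) ? "succeeded" : "failed");
            return 0;
    }
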
Stephen Boyd 3dd5eb15e1 mm: vmscan: remove unused name array
The name array is not used, remove it.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2013-07-08 06:32:08 -07:00
Laura Abbott 033c0bcda0 mm: Don't put CMA pages on per cpu lists
CMA allocations rely on being able to migrate pages out
quickly to fulfill the allocations. Most use cases for
movable allocations meet this requirement. File system
allocations may take an unacceptably long time to
migrate, which creates delays from CMA. Prevent CMA
pages from ending up on the per-cpu lists to avoid
code paths grabbing CMA pages on the fast path. CMA
pages can still be allocated as a fallback under tight
memory pressure.

CRs-Fixed: 452508
Change-Id: I79a28f697275a2a1870caabae53c8ea345b4b47d
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-07-08 05:55:13 -07:00
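A toy model of the free-path policy in the commit above: CMA pages bypass the per-cpu list and go straight back to the buddy allocator, so the fast path never hands them out by accident. The counters are illustrative stand-ins for the real per-cpu and buddy structures:

    #include <stdio.h>

    enum migratetype { MIGRATE_MOVABLE, MIGRATE_CMA };

    static int pcp_list_len;        /* toy per-cpu free list */
    static int buddy_free_pages;    /* toy buddy free pool */

    /* CMA pages skip the per-cpu list on free. */
    static void free_page_model(enum migratetype mt)
    {
            if (mt == MIGRATE_CMA) {
                    buddy_free_pages++;     /* straight back to the buddy allocator */
                    return;
            }
            pcp_list_len++;                 /* normal pages stay on the hot per-cpu list */
    }

    int main(void)
    {
            free_page_model(MIGRATE_CMA);
            free_page_model(MIGRATE_MOVABLE);

            printf("pcp list: %d, buddy: %d\n", pcp_list_len, buddy_free_pages);
            return 0;
    }
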
Heesub Shin c05d5692ed cma: redirect page allocation to CMA
CMA pages are designed to be used as fallback for movable allocations
and cannot be used for non-movable allocations. If CMA pages are
utilized poorly, non-movable allocations may end up getting starved if
all regular movable pages are allocated and the only pages left are
CMA. Always using CMA pages first creates unacceptable performance
problems. As a midway alternative, use CMA pages for certain
userspace allocations. The userspace pages can be migrated or dropped
quickly, which gives decent utilization.

Change-Id: I6165dda01b705309eebabc6dfa67146b7a95c174
CRs-Fixed: 452508
[lauraa@codeaurora.org: Missing CONFIG_CMA guards, add commit text]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-07-08 05:55:12 -07:00
Laura Abbott 8c909be407 Revert "mm: cma: on movable allocations try MIGRATE_CMA first"
This reverts commit b5662d64fa5ee483b985b351dec993402422fee3.

Using CMA pages first creates good utilization but has some
unfortunate side effects. Many movable allocations come from
the filesystem layer which can hold on to pages for long periods
of time which causes high allocation times (~200ms) and high
rates of failure. Revert this patch and use alternate allocation
strategies to get better utilization.

Change-Id: I917e137d5fb292c9f8282506f71a799a6451ccfa
CRs-Fixed: 452508
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-07-08 05:55:12 -07:00
Michal Nazarewicz 93f6b2ee94 mm: cma: on movable allocations try MIGRATE_CMA first
It has been observed that the system tends to keep a lot of free CMA pages
even in very high memory pressure use cases.  The CMA fallback for
movable pages is used very rarely, only when the system is completely
depleted of MOVABLE pages.  This means that the out-of-memory killer is
triggered for unmovable allocations even when there are many CMA pages
available.  This problem was not observed previously since movable
pages were used as a fallback for unmovable allocations.

To avoid such situation this commit changes the allocation order so
that on movable allocations the MIGRATE_CMA pageblocks are used first.

This change means that the MIGRATE_CMA can be removed from fallback
path of the MIGRATE_MOVABLE type.  This means that the
__rmqueue_fallback() function will never deal with CMA pages and thus
all the checks around MIGRATE_CMA can be removed from that function.

Change-Id: Ie13312d62a6af12d7aa78b4283ed25535a6d49fd
CRs-Fixed: 435287
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-07-08 05:55:04 -07:00
Heesub Shin 77dd82c55f cma: fix race condition on a page
A cruel, brute-force method for letting cma/migration
finish its job without stealing the lock from
migration_entry_wait() and creating a live-lock on the
faulted page. This patch solves the case of the
page->_count == 2 migration failure.

Change-Id: Ia94542a80e44a213831291af289bbf5ee6880bfd
Signed-off-by: Heesub Shin <heesub.shin@samsung.com>
Reviewed-on: http://165.213.202.130:8080/39341
Tested-by: System S/W SCM <scm.systemsw@samsung.com>
Tested-by: Dongjun Shin <d.j.shin@samsung.com>
Reviewed-by: Hyunju Ahn <hyunju.ahn@samsung.com>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-07-08 05:55:02 -07:00
Laura Abbott 6d13f649c9 mm: Don't use CMA pages for writes
If CMA pages are used for writes, the writes may not complete
fast enough for CMA to be allocated within a reasonable amount
of time. If we get a CMA page, get another one to use instead.

Change-Id: I19d8ba655da7525d68d5947337d500566998971c
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-07-08 05:55:01 -07:00
Laura Abbott 27f6fe53fb mm: Add is_cma_pageblock definition
Bring back the is_cma_pageblock definition for determining if a
page is CMA or not.

Change-Id: I39fd546e22e240b752244832c79514f109c8e84b
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2013-07-08 05:55:01 -07:00
Liam Mark 23c3b6c01d mm: split_free_page ignore memory watermarks for CMA
Memory watermarks were sometimes preventing CMA allocations
in low memory.

Change-Id: I550ec987cbd6bc6dadd72b4a764df20cd0758479
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2013-07-08 05:54:59 -07:00
Shashank Mittal 3570c98b6d mm: Fix a compiler warning.
Fix a compiler warning for an uninitialized variable.

Change-Id: Ieedeb1cfb5a22eb5f671e6bfd1361315347a49af
Signed-off-by: Shashank Mittal <mittals@codeaurora.org>
2013-07-08 05:52:25 -07:00
Larry Bassel 17dfb1cddc mm: make physical memory offline work
In recent versions, the platform specific physical
offline returns the number of bytes offlined, so
a value of 0 indicates an error, not success as in
older versions. Make sure that the memory
for the original memory resource nodes is not
freed via kfree, as this memory was obtained
from alloc_bootmem very early in the system's life.

Change-Id: Iffcdd8be4483e043d7605fce596ed438b15f3e02
Signed-off-by: Larry Bassel <lbassel@codeaurora.org>
(cherry picked from commit 2421717cb10a06814d7bdb431485aa3a5e364f36)
2013-07-08 05:52:01 -07:00