I've seen a fair number of issues with kswapd and other processes
appearing to get stuck in v3.12-rc. Using sysrq-p many times seems to
indicate that it gets stuck somewhere in list_lru_walk_node(), called
from prune_icache_sb() and super_cache_scan().
I never seem to be able to trigger a calltrace for functions above that
point.
So I decided to add the following to super_cache_scan():
@@ -81,10 +81,14 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
inodes = list_lru_count_node(&sb->s_inode_lru, sc->nid);
dentries = list_lru_count_node(&sb->s_dentry_lru, sc->nid);
total_objects = dentries + inodes + fs_objects + 1;
+printk("%s:%u: %s: dentries %lu inodes %lu total %lu\n", current->comm, current->pid, __func__, dentries, inodes, total_objects);
/* proportion the scan between the caches */
dentries = mult_frac(sc->nr_to_scan, dentries, total_objects);
inodes = mult_frac(sc->nr_to_scan, inodes, total_objects);
+printk("%s:%u: %s: dentries %lu inodes %lu\n", current->comm, current->pid, __func__, dentries, inodes);
+BUG_ON(dentries == 0);
+BUG_ON(inodes == 0);
/*
* prune the dcache first as the icache is pinned by it, then
@@ -99,7 +103,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
freed += sb->s_op->free_cached_objects(sb, fs_objects,
sc->nid);
}
-
+printk("%s:%u: %s: dentries %lu inodes %lu freed %lu\n", current->comm, current->pid, __func__, dentries, inodes, freed);
drop_super(sb);
return freed;
}
and shortly thereafter, having applied some pressure, I got this:
update-apt-xapi:1616: super_cache_scan: dentries 25632 inodes 2 total 25635
update-apt-xapi:1616: super_cache_scan: dentries 1023 inodes 0
------------[ cut here ]------------
Kernel BUG at c0101994 [verbose debug info unavailable]
Internal error: Oops - BUG: 0 [#3] SMP ARM
Modules linked in: fuse rfcomm bnep bluetooth hid_cypress
CPU: 0 PID: 1616 Comm: update-apt-xapi Tainted: G D 3.12.0-rc7+ #154
task: daea1200 ti: c3bf8000 task.ti: c3bf8000
PC is at super_cache_scan+0x1c0/0x278
LR is at trace_hardirqs_on+0x14/0x18
Process update-apt-xapi (pid: 1616, stack limit = 0xc3bf8240)
...
Backtrace:
(super_cache_scan) from [<c00cd69c>] (shrink_slab+0x254/0x4c8)
(shrink_slab) from [<c00d09a0>] (try_to_free_pages+0x3a0/0x5e0)
(try_to_free_pages) from [<c00c59cc>] (__alloc_pages_nodemask+0x5)
(__alloc_pages_nodemask) from [<c00e07c0>] (__pte_alloc+0x2c/0x13)
(__pte_alloc) from [<c00e3a70>] (handle_mm_fault+0x84c/0x914)
(handle_mm_fault) from [<c001a4cc>] (do_page_fault+0x1f0/0x3bc)
(do_page_fault) from [<c001a7b0>] (do_translation_fault+0xac/0xb8)
(do_translation_fault) from [<c000840c>] (do_DataAbort+0x38/0xa0)
(do_DataAbort) from [<c00133f8>] (__dabt_usr+0x38/0x40)
Notice that we had a very low number of inodes, which were reduced to
zero my mult_frac().
Now, prune_icache_sb() calls list_lru_walk_node() passing that number of
inodes (0) into that as the number of objects to scan:
long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan,
int nid)
{
LIST_HEAD(freeable);
long freed;
freed = list_lru_walk_node(&sb->s_inode_lru, nid, inode_lru_isolate,
&freeable, &nr_to_scan);
which does:
unsigned long
list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate,
void *cb_arg, unsigned long *nr_to_walk)
{
struct list_lru_node *nlru = &lru->node[nid];
struct list_head *item, *n;
unsigned long isolated = 0;
spin_lock(&nlru->lock);
restart:
list_for_each_safe(item, n, &nlru->list) {
enum lru_status ret;
/*
* decrement nr_to_walk first so that we don't livelock if we
* get stuck on large numbesr of LRU_RETRY items
*/
if (--(*nr_to_walk) == 0)
break;
So, if *nr_to_walk was zero when this function was entered, that means
we're wanting to operate on (~0UL)+1 objects - which might as well be
infinite.
Clearly this is not correct behaviour. If we think about the behaviour
of this function when *nr_to_walk is 1, then clearly it's wrong - we
decrement first and then test for zero - which results in us doing
nothing at all. A post-decrement would give the desired behaviour -
we'd try to walk one object and one object only if *nr_to_walk were one.
It also gives the correct behaviour for zero - we exit at this point.
Fixes: 5cedf721a7 ("list_lru: fix broken LRU_RETRY behaviour")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
[ Modified to make sure we never underflow the count: this function gets
called in a loop, so the 0 -> ~0ul transition is dangerous - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I8c53bcc4c70ed978e6cf81a6f38fb06a59cc64ce
Previously, page cache radix tree nodes were freed after reclaim emptied
out their page pointers. But now reclaim stores shadow entries in their
place, which are only reclaimed when the inodes themselves are
reclaimed. This is problematic for bigger files that are still in use
after they have a significant amount of their cache reclaimed, without
any of those pages actually refaulting. The shadow entries will just
sit there and waste memory. In the worst case, the shadow entries will
accumulate until the machine runs out of memory.
To get this under control, the VM will track radix tree nodes
exclusively containing shadow entries on a per-NUMA node list. Per-NUMA
rather than global because we expect the radix tree nodes themselves to
be allocated node-locally and we want to reduce cross-node references of
otherwise independent cache workloads. A simple shrinker will then
reclaim these nodes on memory pressure.
A few things need to be stored in the radix tree node to implement the
shadow node LRU and allow tree deletions coming from the list:
1. There is no index available that would describe the reverse path
from the node up to the tree root, which is needed to perform a
deletion. To solve this, encode in each node its offset inside the
parent. This can be stored in the unused upper bits of the same
member that stores the node's height at no extra space cost.
2. The number of shadow entries needs to be counted in addition to the
regular entries, to quickly detect when the node is ready to go to
the shadow node LRU list. The current entry count is an unsigned
int but the maximum number of entries is 64, so a shadow counter
can easily be stored in the unused upper bits.
3. Tree modification needs tree lock and tree root, which are located
in the address space, so store an address_space backpointer in the
node. The parent pointer of the node is in a union with the 2-word
rcu_head, so the backpointer comes at no extra cost as well.
4. The node needs to be linked to an LRU list, which requires a list
head inside the node. This does increase the size of the node, but
it does not change the number of objects that fit into a slab page.
[akpm@linux-foundation.org: export the right function]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I6da4642d842f91615747957e7f54a5c2d4593427
We currently use a compile-time constant to size the node array for the
list_lru structure. Due to this, we don't need to allocate any memory at
initialization time. But as a consequence, the structures that contain
embedded list_lru lists can become way too big (the superblock for
instance contains two of them).
This patch aims at ameliorating this situation by dynamically allocating
the node arrays with the firmware provided nr_node_ids.
Change-Id: If8f8d671d505709d22918b023ed1935b12c06c89
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The list_lru implementation has one function, list_lru_dispose_all, with
only one user (the dentry code). At first, such function appears to make
sense because we are really not interested in the result of isolating each
dentry separately - all of them are going away anyway. However, it's
implementation is buggy in the following way:
When we call list_lru_dispose_all in fs/dcache.c, we scan all dentries
marking them with DCACHE_SHRINK_LIST. However, this is done without the
nlru->lock taken. The imediate result of that is that someone else may
add or remove the dentry from the LRU at the same time. When list_lru_del
happens in that scenario we will see an element that is not yet marked
with DCACHE_SHRINK_LIST (even though it will be in the future) and
obviously remove it from an lru where the element no longer is. Since
list_lru_dispose_all will in effect count down nlru's nr_items and
list_lru_del will do the same, this will lead to an imbalance.
The solution for this would not be so simple: we can obviously just keep
the lru_lock taken, but then we have no guarantees that we will be able to
acquire the dentry lock (dentry->d_lock). To properly solve this, we need
a communication mechanism between the lru and dentry code, so they can
coordinate this with each other.
Such mechanism already exists in the form of the list_lru_walk_cb
callback. So it is possible to construct a dcache-side prune function
that does the right thing only by calling list_lru_walk in a loop until no
more dentries are available.
With only one user, plus the fact that a sane solution for the problem
would involve boucing between dcache and list_lru anyway, I see little
justification to keep the special case list_lru_dispose_all in tree.
Change-Id: I7cbc4646a323aae9605dac32e0a1591340493245
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Michal Hocko <mhocko@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch adapts the list_lru API to accept an optional node argument, to
be used by NUMA aware shrinking functions. Code that does not care about
the NUMA placement of objects can still call into the very same functions
as before. They will simply iterate over all nodes.
Change-Id: I32b543728b73c134137ebe9e502ef6d8a5bd45b3
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
To test whether an address is aligned to PAGE_SIZE.
Change-Id: Id956f67b1a5efc271ab29819e5cd04d4b7cddaa0
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... except for one in android, where the check is different
and already done in caller. No need to recalculate rlimit
many times in alloc_fd() either.
Change-Id: Ia6eb7e1af1047f4d4f188d89deb70d708fa9110a
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Similar situation to that of __alloc_fd(); do not use unless you
really have to. You should not touch any descriptor table other
than your own; it's a sure sign of a really bad API design.
As with __alloc_fd(), you *must* use a first-class reference to
struct files_struct; something obtained by get_files_struct(some task)
(let alone direct task->files) will not do. It must be either
current->files, or obtained by get_files_struct(current) by the
owner of that sucker and given to you.
Change-Id: Ia326b598ba7e1b315188ecea21250064433ae620
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Essentially, alloc_fd() in a files_struct we own a reference to.
Most of the time wanting to use it is a sign of lousy API
design (such as android/binder). It's *not* a general-purpose
interface; better that than open-coding its guts, but again,
playing with other process' descriptor table is a sign of bad
design.
Change-Id: I0a62c0a1a9162d6e5961878d7dc7ff8ffcf82b56
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
non-rcu variant of list_first_or_null_rcu
Change-Id: I7b446cbcd2262e134d148fdb5977dd61362fb0ab
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Signed-off-by: Samuel Pascua <pascua.samuel.14@gmail.com>
The fact that volatile allows for atomic load/stores is a special case
not a requirement for {READ,WRITE}_ONCE(). Their primary purpose is to
force the compiler to emit load/stores _once_.
Change-Id: I713b57e95c81b5d49a04e5562f13ad46a7b2341d
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: 7bd3e239d6c6d1cad276e8f130b386df4234dcd7
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
[ Upstream commit 43239cbe79fc369f5d2160bd7f69e28b5c50a58c ]
Feedback has shown that WRITE_ONCE(x, val) is easier to use than
ASSIGN_ONCE(val,x).
There are no in-tree users yet, so lets change it for 3.19.
Change-Id: I6903079f06bb16b1bde71124920d055b1fb4f0bf
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
The LRU_RETRY code assumes that the list traversal status after we have
dropped and regained the list lock. Unfortunately, this is not a valid
assumption, and that can lead to racing traversals isolating objects that
the other traversal expects to be the next item on the list.
This is causing problems with the inode cache shrinker isolation, with
races resulting in an inode on a dispose list being "isolated" because a
racing traversal still thinks it is on the LRU. The inode is then never
reclaimed and that causes hangs if a subsequent lookup on that inode
occurs.
Fix it by always restarting the list walk on a LRU_RETRY return from the
isolate callback. Avoid the possibility of livelocks the current code was
trying to avoid by always decrementing the nr_to_walk counter on retries
so that even if we keep hitting the same item on the list we'll eventually
stop trying to walk and exit out of the situation causing the problem.
Change-Id: I87924c4e3a2d777eaded50ffb303728c370f7d80
Reported-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we have an LRU list API, we can start to enhance the
implementation. This splits the single LRU list into per-node lists and
locks to enhance scalability. Items are placed on lists according to the
node the memory belongs to. To make scanning the lists efficient, also
track whether the per-node lists have entries in them in a active
nodemask.
Note: We use a fixed-size array for the node LRU, this struct can be very
big if MAX_NUMNODES is big. If this becomes a problem this is fixable by
turning this into a pointer and dynamically allocating this to
nr_node_ids. This quantity is firwmare-provided, and still would provide
room for all nodes at the cost of a pointer lookup and an extra
allocation. Because that allocation will most likely come from a may very
well fail.
[glommer@openvz.org: fix warnings, added note about node lru]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Greg Thelen <gthelen@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Change-Id: I1de68e5776851014bf23ed016bc5e08d95e2a971
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Several subsystems use the same construct for LRU lists - a list head, a
spin lock and and item count. They also use exactly the same code for
adding and removing items from the LRU. Create a generic type for these
LRU lists.
This is the beginning of generic, node aware LRUs for shrinkers to work
with.
[glommer@openvz.org: enum defined constants for lru. Suggested by gthelen, don't relock over retry]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Greg Thelen <gthelen@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Change-Id: I3d3e3e47989f931d7da3deb1487c8a00e67b650a
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit ec8d7c14ea14922fe21945b458a75e39f11dd832)
Tetsuo has properly noted that mmput slow path might get blocked waiting
for another party (e.g. exit_aio waits for an IO). If that happens the
oom_reaper would be put out of the way and will not be able to process
next oom victim. We should strive for making this context as reliable
and independent on other subsystems as much as possible.
Introduce mmput_async which will perform the slow path from an async
(WQ) context. This will delay the operation but that shouldn't be a
problem because the oom_reaper has reclaimed the victim's address space
for most cases as much as possible and the remaining context shouldn't
bind too much memory anymore. The only exception is when mmap_sem
trylock has failed which shouldn't happen too often.
The issue is only theoretical but not impossible.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Only backports mmput_async.
Change-Id: I5fe54abcc629e7d9eab9fe03908903d1174177f1
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Kbuild supports saving output files in a separate directory.
But the build directory must be created beforehand. For example,
$ mkdir -p dir/to/store/output/files
$ make O=dir/to/store/output/files defconfig
Creating a build directory automatically would be useful.
Signed-off-by: Masahiro Yamada <yamada.m@jp.panasonic.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Change-Id: Ibfbf1509a2b001261234f421babb621106345e5b
Drop AMSDU subframes if AMSDU subframe header's DA
is equal to broadcast address.
Change-Id: I21f2b95b45fb150a857d23ba158a0f9df15d5c46
CRs-Fixed: 2897293
Drop inalid EAPOL packets in SAP mode which are not
destined to self mac address.
Change-Id: I9754dddf580e60bd88ddc6e28355162499a8d125
CRs-Fixed: 2868054
Currently during roaming for LFR make before break feature under
stress testing RX thread is stuck in while loop resulting in host
RX low resource and firmware watch dog bite. In this change refactor
the code to check for null termination of the received frames rather
than checking for the local variable pointer assigned to the input
received frames.
Change-Id: I47b40566d52134b58304541c708cd87263fabfc6
CRs-Fixed: 2009414
Propagate from cld-3.0 to prima.
When beacon report request action frame is received,
rrmProcessBeaconReportReq() is called and num_channels value
is calculated from the action frame directly from user. This
value is assigned to pSmeBcnReportReq->channelList.numChannels
and this num channels value along with the channel list is
posted to sme for further processing. The sme function
sme_RrmProcessBeaconReportReqInd() processes this sme
message eWNI_SME_BEACON_REPORT_REQ_IND. In this function,
the channels in channel list are looped through the received
value pBeaconReq->channelList.numChannels and is copied to the
destination pSmeRrmContext->channelList array from the
pBeaconReq->channelList.channelNumber[] array.
The maximum possible number of channels in channel list
BeaconReq->channelList.channelNumber[] allocated statically
in the definition of tSirChannelList is
SIR_ESE_MAX_MEAS_IE_REQS (8).
So when the pBeaconReq->channelList.numChannels, possible OOB
read occurs.
Validate the value of pBeaconReq->channelList.numChannels
received from the action frame against the maximum supported
number of channels in channel list SIR_ESE_MAX_MEAS_IE_REQS (8).
Place this validation inside the function
sme_RrmProcessBeaconReportReqInd() instead of validating it
at rrmProcessBeaconReportReq() so that it defends from other
caller sme_SetEseBeaconRequest() which is from user space
command through IOCTL.
Change-Id: I2074b04081328ceab7eeb29c33631a635e9d93c3
CRs-Fixed: 2462152
In the function sirConvertAddtsRsp2Struct, iterator j is
assigned with the value pAddTs->numTclas + addts.num_WMMTCLAS.
The j value is used as the index to the array pAddTs->tclasInfo.
Maximum limit on pAddTs->tclasInfo entries is 2. So when the
value of j exceeds 2, then a possible buffer overflow could
occur.
Validate the value of j against SIR_MAC_TCLASIE_MAXNUM(2).
Change-Id: Icc723380ed4ccd51c729194d509e288be0e0712c
CRs-Fixed: 2449899
Propagation from cld2.0 to prima
In the API limProcessDeauthFrame, the reason-code is
fetched from the payload, and it may happen that the
payload received is empty, and the MPDU just contains the
header, so the driver may access the memory not allocated
to the frame, thus resulting in a OOB read.
Fix is to have a min length check of 16 bits for the
reason code before accessing it.
Change-Id: I7e7a435ba049356c13fb10240f4abb9bf6219af4
CRs-Fixed: 2341590
Fix Out-of-bound access in sapInterferenceRssiCount, by checking
the limit of start address for channel info and end address for
channel info.
Change-Id: If21e09d0f11bd655a8e04139ccf55d3682734b17
CRs-Fixed: 2149350
There is no check for the return value of dot11fUnpackIeRSN API
in hdd_ProcessGENIE API, which may cause stack overflow if
pmkid_count is returned as more than the PMKIDCache size.
Add a check for return value of dot11fUnpackIeRSN to avoid possible
stack overflow.
Change-Id: I56424c706de121b18b8d3f2c4a35089ec0434452
CRs-Fixed: 2149187
Allocation of memory for ric data fails
when ric data length is zero and error message
is displayed.
Fix is to allocate memory only when ric data length
is greater than zero.
Change-Id: I7c8825a5d287e13d660b0b1173c6c520f75ad3ef
CRs-Fixed: 2065221
Currently the key sequence counter received from userspace is not
propagated to SME, so add logic to propagate it.
Change-Id: I5371700003744eb967c578c44e4d130628efcdc8
CRs-Fixed: 2129237
In function ProcSetReqInternal, valueLen is obtained from the
message buffer pParam. This valueLen is used as argument to the
function GetStrValue where the contents of the buffer pParam is
copied to pMac->cfg.gSBuffer for valueLen number of bytes. However
the array pMac->cfg.gSBuffer is a static array of size CFG_MAX_STR_LEN.
If the value of valueLen exceeds CFG_MAX_STR_LEN, a buffer overwrite
will occur in GetStrValue.
Add Sanity check to make sure valueLen does not exceed CFG_MAX_STR_LEN.
Change-Id: Id16d4c4b8d2414c00a0fae8f8292f011d0763b84
CRs-Fixed: 2143847
The commit "qcacld-2.0: Fix incorrect length of encrypted auth frame" is
already allocating and setting memory for encrAuthFrame. Don't allocate and
set the memory twice.
Change-Id: Id5c30d4213b9e41040bca303d42f990b0a9932c9
WPA RSN IE is copied from source without a check on the given IE length.
A malicious IE length can cause buffer overflow.
Add maximum bound check on WPA RSN IE length.
Change-Id: Id159d307e8f9c1de720d4553a7c29f23cbd28571
CRs-Fixed: 2033213
STA is not able to connect to AP configured with WEP shared
due to incorrect frame length of encrypted auth frame.
Fix this by using the correct frame length.
Bug: 67754642
Change-Id: Ida8d78b512ecf79314200a7c96f5b5c293e5474e
Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>
Memory for encrypted auth frame is allocated based on macro
SIR_MAC_AUTH_CHALLENGE_LENGTH. SIR_MAC_AUTH_CHALLENGE_LENGTH
was updated to 253 from 128. Auth failure is observed on
receiving challenge text of length 128.
Fix is to use length based on the challenge text received.
Change-Id: I9a8b1a05d36421cfab2bf699fe38c50e150cf464
CRs-Fixed: 2100554
Bug: 67030205
Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>
An incorrect IE length can overflow the remaining length variable
and make IE parsing logic perform a buffer over-read.
Check on IE length to avoid buffer over-read.
Bug: 63868629
Change-Id: I20ef6a0136c7a5b602ad15a2fb725f20807b81d0
CRs-Fixed: 2033195
Signed-off-by: Ecco Park <eccopark@google.com>
qcacld-3.0 to qcacld-2.0 propagation
Add check for buffer length in function sme_set_ft_ies.
Bug: 64431968
Change-Id: I7adc56e23316c0ceb193a5bdf8c4c0b5f4fbd20a
CRs-Fixed: 2070583
Signed-off-by: Ecco Park <eccopark@google.com>
Fix CVE-2017-11035
qcacld-3.0 to qcacld-2.0 propagation.
Fix incorrect processing of encrypted auth frame by allocating
appropriate local buffer and using correct type for frame length.
Change-Id: I87d6f4c3c43dd332d5b1877ddf4b3b46a717468b
CRs-Fixed: 2082544
Fix CVE-2017-11015
Change-Id: I7cb934fa97e0250fdc62eec74000f0dd5b323633
Currently limProcessAuthFrame stack frame size exceeds 1024 and causes
build failures for 32 bit platforms.
Move multiple variables from local to dynamic allocation to reduce the
frame size of limProcessAuthFrame.
Change-Id: I83cf5ab24693e0ce012894d808ac79bf37fa9a08
CRs-Fixed: 2083572
Fix CVE-2017-11015
Change-Id: Ifb1971d07ba99705f14d693a6d9a484f71a48c67
qcacld-3.0 to prima propagation
In function rrmProcessBeaconReportReq, add bound check before
writing to channel list which is of fixed size.
Change-Id: I3c80974bba84a96f7b85e4ce62bbb01c23b4babf
CRs-Fixed: 2072774
Fix CVE-2017-11014
Change-Id: Ie5ec655f449093b8b5042a398d94b8342df60e3e