Commit graph

307086 commits

Author SHA1 Message Date
Yunlei He
3bdb893816 f2fs: Fix a system panic caused by f2fs_follow_link
In Linux 3.10, we cannot be sure that the return value of the nd_get_link function
is valid, so this patch adds a check before using it.
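
(Illustrative sketch only, not the exact hunk from this patch: the idea is to
validate what nd_get_link() returns before dereferencing it. The wrapper shape
and the page_follow_link_light()/page_put_link() pairing are assumptions.)

    static void *f2fs_follow_link(struct dentry *dentry, struct nameidata *nd)
    {
            struct page *page = page_follow_link_light(dentry, nd);

            if (IS_ERR(page))
                    return page;

            /* assumed check: nd_get_link() may hand back an error/NULL link */
            if (IS_ERR_OR_NULL(nd_get_link(nd))) {
                    page_put_link(dentry, nd, page);
                    return ERR_PTR(-ENOENT);
            }
            return page;
    }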

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:19 +08:00
Matt Wagantall
9a7dcc468f msm: cpufreq: Relax constraints on "msm-cpufreq" workqueue
This workqueue is not used in memory reclaim paths, so the
WQ_MEM_RECLAIM flag is not needed. Additionally, there is no
need to restrict the queue to having one in-flight work item.
Remove these constraints.
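
(A hedged sketch of what dropping those constraints looks like at the
allocation site; the variable and function names, and the original call shown
in the comment, are assumptions based on the description above.)

    static struct workqueue_struct *msm_cpufreq_wq;

    static int __init msm_cpufreq_wq_init(void)
    {
            /* was roughly: alloc_workqueue("msm-cpufreq", WQ_MEM_RECLAIM, 1);
             * now: no WQ_MEM_RECLAIM and the default max_active, so
             * independent work items may run concurrently. */
            msm_cpufreq_wq = alloc_workqueue("msm-cpufreq", 0, 0);
            return msm_cpufreq_wq ? 0 : -ENOMEM;
    }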

Change-Id: I9edde40917d3ec885ce061133de20680634321d0
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
2016-10-29 23:12:19 +08:00
Mahesh Sivasubramanian
1538aee4f7 msm: cpufreq: Configure WQ for higher priority
When the workqueue runs at a lower priority, it gets starved by higher
priority threads. Bump the WQ priority to high to ensure the cpufreq
workqueue gets a fair scheduling chance.

Change-Id: I9e994da94a347dceb884e72ec3dd3da468a4471d
Signed-off-by: Mahesh Sivasubramanian <msivasub@codeaurora.org>
2016-10-29 23:12:19 +08:00
Narayanan Gopalakrishnan
498b49280a msm: cpufreq: increase priority of thread that increases frequencies
When the cpufreq governors request a frequency increase from kworker
threads, there is a chance that the kworker thread is CPU-starved by
other high-priority threads, leading to an undesirable increase in frequency
ramp-up latency. Switch the calling thread to the SCHED_FIFO policy when
requesting a frequency increase and restore the original policy
after the ramp-up is completed.
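
(A rough sketch of the priority dance described above; the function name,
priority value, and placement are placeholders, not taken from the patch.)

    /* sketch: boost to SCHED_FIFO around a frequency increase, then restore */
    static void ramp_up_with_rt_priority(void)
    {
            struct sched_param rt = { .sched_priority = MAX_RT_PRIO - 1 };
            struct sched_param restore = { .sched_priority = 0 };
            int saved_policy = current->policy;

            sched_setscheduler_nocheck(current, SCHED_FIFO, &rt);
            /* ... issue the frequency-increase request here ... */
            sched_setscheduler_nocheck(current, saved_policy, &restore);
    }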

Change-Id: Ie4fa199b7f9087717cb94dfcc2eddea27cc0012b
Signed-off-by: Narayanan Gopalakrishnan <nargop@codeaurora.org>
2016-10-29 23:12:19 +08:00
Mahesh Sivasubramanian
ba057921bf msm: rpm-smd: Configure WQ for higher priority
When the workqueue runs at a lower priority, it gets starved by higher
priority threads. Bump the WQ priority to high to ensure the rpm-smd
workqueue gets a fair chance.

Change-Id: I06b864611cf45afe6931d6030327806032894663
Signed-off-by: Mahesh Sivasubramanian <msivasub@codeaurora.org>
2016-10-29 23:12:19 +08:00
Srivatsa Vaddagiri
05b6d32aeb sched: Set MC (multi-core) sched domain's busy_factor attribute to 1
The busy_factor attribute of a scheduler domain causes busy CPUs (CPUs
that are not idle) to load balance less frequently in that domain,
which could impact performance by increasing scheduling latency for
tasks.

As an example, consider the MC scheduler domain's attribute values of
max_interval = 4 ms and busy_factor = 64. Further consider
max_load_balance_interval = 100 (HZ/10). In this case, a non-idle CPU
could put off a load-balance check in the MC domain by 100 ms. This
effectively means that a CPU running a single task in its queue could
fail to notice increased load on another CPU (in the same MC
domain) for up to 100 ms before picking up load from the overloaded
CPU. Needless to say, this leads to increased scheduling latency for
tasks, affecting performance adversely.

By setting the MC domain's busy_factor value to 1, we limit the maximum
interval by which a busy CPU can put off load-balance checks to 4 ms
(effectively 10 ms, given a HZ value of 100).
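
(For reference, this is roughly how the kernel derives the effective balance
interval from these attributes; a simplified sketch of the rebalance path, not
the diff itself.)

    unsigned long interval = sd->balance_interval;   /* grows toward max_interval */

    if (cpu_busy)
            interval *= sd->busy_factor;             /* 4 ms * 64 = 256 ms before this change */

    interval = msecs_to_jiffies(interval);
    interval = clamp(interval, 1UL, max_load_balance_interval);   /* capped at HZ/10 */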

Change-Id: Id45869d06f5556ea8eec602b65c2ffd2143fe060
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-10-29 23:12:19 +08:00
Hong-Mei Li
2ace2c4bbc arm: lib: Fix makefile issue
Fix the typo introduced by commit fb7051a183cba3925f1f00c9bdadc2adcc682020.

Change-Id: Ia23c80f903ed431e60e4ba17c9c6d527e6febd87
Signed-off-by: Hong-Mei Li <a21834@motorola.com>
Reviewed-on: http://gerrit.pcs.mot.com/527915
SLT-Approved: Gerrit Code Review <gerrit-blurdev@motorola.com>
Tested-by: Jira Key <jirakey@motorola.com>
Reviewed-by: Christopher Fries <qcf001@motorola.com>
Submit-Approved: Jira Key <jirakey@motorola.com>
2016-10-29 23:12:18 +08:00
Chris Fries
60135ec5f7 msm: memutils: memcpy, memmove, copy_page optimization
Preload farther to take advantage of the memory bus, and assume
64-byte cache lines.  Unroll some pairs of ldm/stm as well, for
unexplainable reasons.

Future enhancements should include:

- #define for how far to preload, possibly defined separately for
  memcpy, copy_*_user
- Tuning for misaligned buffers
- Tuning for memmove
- Tuning for small buffers
- Understanding mechanism behind ldm/stm unroll causing some gains
  in copy_to_user

BASELINE (msm8960pro):
======================================================================
memcpy 1000MB at 5MB       : took 808850 usec, bandwidth 1236.236 MB/s
copy_to_user 1000MB at 5MB : took 810071 usec, bandwidth 1234.234 MB/s
copy_from_user 1000MB at 5M: took 942926 usec, bandwidth 1060.060 MB/s
memmove 1000GB at 5MB      : took 848588 usec, bandwidth 1178.178 MB/s
copy_to_user 1000GB at 4kB : took 847916 usec, bandwidth 1179.179 MB/s
copy_from_user 1000GB at 4k: took 935113 usec, bandwidth 1069.069 MB/s
copy_page 1000GB at 4kB    : took 779459 usec, bandwidth 1282.282 MB/s

THIS PATCH:
======================================================================
memcpy 1000MB at 5MB       : took 346223 usec, bandwidth 2888.888 MB/s
copy_to_user 1000MB at 5MB : took 348084 usec, bandwidth 2872.872 MB/s
copy_from_user 1000MB at 5M: took 348176 usec, bandwidth 2872.872 MB/s
memmove 1000GB at 5MB      : took 348267 usec, bandwidth 2871.871 MB/s
copy_to_user 1000GB at 4kB : took 377018 usec, bandwidth 2652.652 MB/s
copy_from_user 1000GB at 4k: took 371829 usec, bandwidth 2689.689 MB/s
copy_page 1000GB at 4kB    : took 383763 usec, bandwidth 2605.605 MB/s

Change-Id: I5e7605d4fb5c8492cfe7a73629d0d0ce7afd1f37
Signed-off-by: Chris Fries <C.Fries@motorola.com>
Reviewed-on: http://gerrit.pcs.mot.com/526804
SLT-Approved: Gerrit Code Review <gerrit-blurdev@motorola.com>
Tested-by: Jira Key <jirakey@motorola.com>
Reviewed-by: Igor Kovalenko <cik009@motorola.com>
Submit-Approved: Jira Key <jirakey@motorola.com>
2016-10-29 23:12:18 +08:00
Eric Dumazet
40df832fc0 softirq: reduce latencies
commit c10d73671a upstream.

In various network workloads, __do_softirq() latencies can be up
to 20 ms if HZ=1000, and 200 ms if HZ=100.

This is because we iterate 10 times in the softirq dispatcher,
and some actions can consume a lot of cycles.

This patch changes the fallback-to-ksoftirqd condition to:

- A time limit of 2 ms.
- need_resched() being set on the current task

When one of these conditions is met, we wake up ksoftirqd for further
softirq processing if we still have pending softirqs.

Using need_resched() as the only condition can trigger RCU stalls,
as we can keep BH disabled for too long.
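
(The resulting loop control in __do_softirq() looks roughly like this; a
trimmed-down sketch, with the actual dispatch body elided.)

    #define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)

    asmlinkage void __do_softirq(void)
    {
            unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
            __u32 pending;

    restart:
            /* ... handle the currently pending softirq actions ... */

            pending = local_softirq_pending();
            if (pending) {
                    if (time_before(jiffies, end) && !need_resched())
                            goto restart;
                    /* budget spent or a task is waiting: defer to ksoftirqd */
                    wakeup_softirqd();
            }
    }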

I ran several benchmarks and got no significant difference in
throughput, but a very significant reduction of latencies (one order
of magnitude):

In the following benchmark, 200 antagonist "netperf -t TCP_RR" instances are
started in the background, using all available cpus.

Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC
IRQ (hard+soft).

Before patch :

RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=550110.424
MIN_LATENCY=146858
MAX_LATENCY=997109
P50_LATENCY=305000
P90_LATENCY=550000
P99_LATENCY=710000
MEAN_LATENCY=376989.12
STDDEV_LATENCY=184046.92

After patch :

RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=40545.492
MIN_LATENCY=9834
MAX_LATENCY=78366
P50_LATENCY=33583
P90_LATENCY=59000
P99_LATENCY=69000
MEAN_LATENCY=38364.67
STDDEV_LATENCY=12865.26

Change-Id: I94f96a9040a018644d3e2150f54acfd9a080992d
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[xr: Backported to 3.4: Adjust context]
Signed-off-by: Rui Xiang <rui.xiang@huawei.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-29 23:12:18 +08:00
Dave Chinner
a8322514e2 sync: don't block the flusher thread waiting on IO
When sync does its WB_SYNC_ALL writeback, it issues data IO and
then immediately waits for IO completion. This is done in the
context of the flusher thread, and hence completely ties up the
flusher thread for the backing device until all the dirty inodes
have been synced. On filesystems that are dirtying inodes constantly
and quickly, this means the flusher thread can be tied up for
minutes per sync call and hence badly affect system level write IO
performance as the page cache cannot be cleaned quickly.

We already have a wait loop for IO completion for sync(2), so cut
this out of the flusher thread and delegate it to wait_sb_inodes().
Hence we can do rapid IO submission, and then wait for it all to
complete.

Effect of sync on fsmark before the patch:

FSUse%        Count         Size    Files/sec     App Overhead
.....
     0       640000         4096      35154.6          1026984
     0       720000         4096      36740.3          1023844
     0       800000         4096      36184.6           916599
     0       880000         4096       1282.7          1054367
     0       960000         4096       3951.3           918773
     0      1040000         4096      40646.2           996448
     0      1120000         4096      43610.1           895647
     0      1200000         4096      40333.1           921048

And a single sync pass took:

  real    0m52.407s
  user    0m0.000s
  sys     0m0.090s

After the patch, there is no impact on fsmark results, and each
individual sync(2) operation run concurrently with the same fsmark
workload takes roughly 7s:

  real    0m6.930s
  user    0m0.000s
  sys     0m0.039s

IOWs, sync is 7-8x faster on a busy filesystem and does not have an
adverse impact on ongoing async data write operations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Change-Id: I9e55d65f5ecb2305497711d4688f0647d9346035
2016-10-29 23:12:18 +08:00
Lee Susman
b91dba25fc mm: change initial readahead window size calculation
Change the logic which determines the initial readahead window size
such that for small requests (one page) the initial window size
will be 4x the size of the original request, regardless of the
VM_MAX_READAHEAD value. This prevents the rapid ramp-up
that could otherwise be caused by increasing VM_MAX_READAHEAD.
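
(A minimal sketch of the sizing rule described above; the helper name and the
branch for larger requests are illustrative, the real change lives in
mm/readahead.c's initial-window calculation.)

    static unsigned long initial_ra_size(unsigned long req_size, unsigned long max)
    {
            if (req_size == 1)
                    return min(4 * req_size, max);   /* x4 for one-page requests */

            /* larger requests keep the existing heuristic (not shown) */
            return min(req_size, max);
    }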

Change-Id: I93d59c515d7e6c6d62348790980ff7bd4f434997
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
2016-10-29 23:12:18 +08:00
Thomas Gleixner
c8ad3a7446 tick: Cleanup NOHZ per cpu data on cpu down
commit 4b0c0f294f upstream.

Prarit reported a crash on CPU offline/online. The reason is that on
CPU down the NOHZ-related per-cpu data of the dead cpu is not cleaned
up. If an interrupt happens at cpu online before the per-cpu tick
device is registered, the irq_enter() check potentially sees stale data
and dereferences a NULL pointer.

Clean up the data after the cpu is dead.

Change-Id: I5f2be7afd398bc97b997ab2143f9d71230c44dd5
Reported-by: Prarit Bhargava <prarit@redhat.com>
Cc: Mike Galbraith <bitbucket@online.de>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-29 23:12:18 +08:00
Frederic Weisbecker
68a644ad50 nohz: Make tick_nohz_irq_exit() irq safe
commit e5ab012c32 upstream.

As it stands, irq_exit() may or may not be called with
irqs disabled, depending on __ARCH_IRQ_EXIT_IRQS_DISABLED
that the arch can define.

It makes tick_nohz_irq_exit() unsafe. For example, two
interrupts can race in tick_nohz_stop_sched_tick(): the innermost
one computes the expiry time on top of the timer list,
then it is interrupted right before reprogramming the
clock. The new interrupt enqueues a new timer-list timer,
reprograms the clock to take it into account, and exits.
The CPU resumes the innermost interrupt and performs the clock
reprogramming without considering the new timer-list timer.

This regression has been introduced by:
     280f06774a
     ("nohz: Separate out irq exit and idle loop dyntick logic")

Let's fix it right now with the appropriate protections.

A saner long term solution will be to remove
__ARCH_IRQ_EXIT_IRQS_DISABLED and mandate that irq_exit() is called
with interrupts disabled.
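
(The interim fix amounts to making the exit path disable interrupts around its
own work; a simplified sketch, with the version-specific callee elided.)

    void tick_nohz_irq_exit(void)
    {
            struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
            unsigned long flags;

            if (!ts->inidle)
                    return;

            local_irq_save(flags);
            /* ... existing idle-enter / tick-reprogramming logic ... */
            local_irq_restore(flags);
    }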

Change-Id: Ib453b14a1bdaac53b6c685ba3eac92dbd0985b2a
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Link: http://lkml.kernel.org/r/1361373336-11337-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Lingzhu Xiang <lxiang@redhat.com>
Reviewed-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-29 23:12:18 +08:00
Paul E. McKenney
ed4bddb779 rcu: Stop rcu_do_batch() from multiplexing the "count" variable
Commit b1420f1c (Make rcu_barrier() less disruptive) rearranged the
code in rcu_do_batch(), moving the ->qlen manipulation to follow
the requeueing of the callbacks.  Unfortunately, this rearrangement
clobbered the value of the "count" local variable before the value
of rdp->qlen was adjusted, resulting in the value of rdp->qlen being
inaccurate.  This commit therefore introduces an index variable "i",
avoiding the inadvertent multiplexing.

CRs-fixed: 657837
Change-Id: I1d7eab79a8e4105b407be87be3300129c32172ae
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Git-commit: b41772abeb
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:18 +08:00
Paul E. McKenney
7727c23657 timer: Fix mod_timer_pinned() header comment
The mod_timer_pinned() header comment states that it prevents timers
from being migrated to a different CPU.  This is not the case; instead,
it ensures that the timer is posted to the current CPU, but does nothing
to prevent CPU-hotplug operations from migrating the timer.

This commit therefore brings the comment header into alignment with
reality.

CRs-fixed: 657837
Change-Id: I244c4c385cd1c47df216feda7b580b2876fab723
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Git-commit: 048a0e8f5e
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:18 +08:00
Paul E. McKenney
e3538e0325 rcu: Make rcu_barrier() less disruptive
The rcu_barrier() primitive interrupts each and every CPU, registering
a callback on every CPU.  Once all of these callbacks have been invoked,
rcu_barrier() knows that every callback that was registered before
the call to rcu_barrier() has also been invoked.

However, there is no point in registering a callback on a CPU that
currently has no callbacks, most especially if that CPU is in a
deep idle state.  This commit therefore makes rcu_barrier() avoid
interrupting CPUs that have no callbacks.  Doing this requires reworking
the handling of orphaned callbacks, otherwise callbacks could slip through
rcu_barrier()'s net by being orphaned from a CPU that rcu_barrier() had
not yet interrupted to a CPU that rcu_barrier() had already interrupted.
This reworking was needed anyway to take a first step towards weaning
RCU from the CPU_DYING notifier's use of stop_cpu().

CRs-fixed: 657837
Change-Id: Icb4392d9dc2cd25d9c9ea05b93ea9e2a99d24bcb
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Git-commit: b1420f1c8b
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:18 +08:00
Paul E. McKenney
10dd2ef4aa rcu: Precompute RCU_FAST_NO_HZ timer offsets
When a CPU is entering dyntick-idle mode, tick_nohz_stop_sched_tick()
calls rcu_needs_cpu() to see if RCU needs that CPU, and, if not, computes the
next wakeup time based on the timer wheels.  Only later, when actually
entering the idle loop, rcu_prepare_for_idle() will be invoked.  In some
cases, rcu_prepare_for_idle() will post timers to wake the CPU back up.
But all for naught: The next wakeup time for the CPU has already been
computed, and posting a timer afterwards does not force that wakeup
time to be recomputed.  This means that rcu_prepare_for_idle()'s timers have
no effect.

This is not a problem on a busy system because something else will wake
up the CPU soon enough.  However, on lightly loaded systems, the CPU
might stay asleep for a considerable length of time.  If that CPU has
a callback that the rest of the system is waiting on, the system might
run very slowly or (in theory) even hang.

This commit avoids this problem by having rcu_needs_cpu() give
tick_nohz_stop_sched_tick() an estimate of when RCU will need the CPU
to wake back up, which tick_nohz_stop_sched_tick() takes into account
when programming the CPU's wakeup time.  An alternative approach is
for rcu_prepare_for_idle() to use hrtimers instead of normal timers,
but timers are much more efficient than are hrtimers for frequently
and repeatedly posting and cancelling a given timer, which is exactly
what RCU_FAST_NO_HZ does.

CRs-fixed: 657837
Change-Id: I45c9163678240ca0b6fcac06b025d4cdb140907c
Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
Git-commit: aa9b16306e
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:18 +08:00
Paul E. McKenney
71ef5ad0f7 rcu: Move RCU_FAST_NO_HZ per-CPU variables to rcu_dynticks structure
The RCU_FAST_NO_HZ code relies on a number of per-CPU variables.
This works, but is hidden from someone scanning the data structures
in rcutree.h.  This commit therefore converts these per-CPU variables
to fields in the per-CPU rcu_dynticks structures.

CRs-fixed: 657837
Change-Id: I03e65c434b0156d4ab035421591aafb697ecd86b
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
Git-commit: 5955f7eecd
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:18 +08:00
Paul E. McKenney
33dc5bfbcc rcu: Update RCU_FAST_NO_HZ tracing for lazy callbacks
In the current code, a short dyntick-idle interval (where there is
at least one non-lazy callback on the CPU) and a long dyntick-idle
interval (where there are only lazy callbacks on the CPU) are traced
identically, which can be less than helpful.  This commit therefore
emits different event traces in these two cases.

CRs-fixed: 657837
Change-Id: I0392bf0b9ffe3e319bdf54e7ff77f86ce5a8f212
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
Git-commit: fd4b352687
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:18 +08:00
Paul E. McKenney
a674b5f9ba rcu: Explicitly initialize RCU_FAST_NO_HZ per-CPU variables
The current initialization of the RCU_FAST_NO_HZ per-CPU variables makes
needless and fragile assumptions about the initial value of things like
the jiffies counter.  This commit therefore explicitly initializes all of
them that are better started with a non-zero value.  It also adds some
comments describing the per-CPU state variables.

CRs-fixed: 657837
Change-Id: Ia82aae8f5441a73be7e427f5fb74ec107144fa6e
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Git-commit: 98248a0e24
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:18 +08:00
Paul E. McKenney
9920854d06 rcu: Make RCU_FAST_NO_HZ handle timer migration
The current RCU_FAST_NO_HZ assumes that timers do not migrate unless a
CPU goes offline, in which case it assumes that the CPU will have to come
out of dyntick-idle mode (cancelling the timer) in order to go offline.
This is important because when RCU_FAST_NO_HZ permits a CPU to enter
dyntick-idle mode despite having RCU callbacks pending, it posts a timer
on that CPU to force a wakeup on that CPU.  This wakeup ensures that the
CPU will eventually handle the end of the grace period, including invoking
its RCU callbacks.

However, Pascal Chapperon's test setup shows that the timer handler
rcu_idle_gp_timer_func() really does get invoked in some cases.  This is
problematic because this can cause the CPU that entered dyntick-idle
mode despite still having RCU callbacks pending to remain in
dyntick-idle mode indefinitely, which means that its RCU callbacks might
never be invoked.  This situation can result in grace-period delays or
even system hangs, which matches Pascal's observations of slow boot-up
and shutdown (https://lkml.org/lkml/2012/4/5/142).  See also the bugzilla:

	https://bugzilla.redhat.com/show_bug.cgi?id=806548

This commit therefore causes the "should never be invoked" timer handler
rcu_idle_gp_timer_func() to use smp_call_function_single() to wake up
the CPU for which the timer was intended, allowing that CPU to invoke
its RCU callbacks in a timely manner.

CRs-fixed: 657837
Change-Id: I39cc0c8bf36d2e3aa9c136e3cd399ab939daad2d
Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Git-commit: 21e52e1566
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:17 +08:00
Paul E. McKenney
aa39813d91 rcu: Make exit_rcu() more precise and consolidate
When running preemptible RCU, if a task exits in an RCU read-side
critical section having blocked within that same RCU read-side critical
section, the task must be removed from the list of tasks blocking a
grace period (perhaps the current grace period, perhaps the next grace
period, depending on timing).  The exit() path invokes exit_rcu() to
do this cleanup.

However, the current implementation of exit_rcu() needlessly does the
cleanup even if the task did not block within the current RCU read-side
critical section, which wastes time and needlessly increases the size
of the state space.  Fix this by only doing the cleanup if the current
task is actually on the list of tasks blocking some grace period.

While we are at it, consolidate the two identical exit_rcu() functions
into a single function.

CRs-fixed: 657837
Change-Id: I59504aedb12064fcd6ce03973d441b547749076a
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 9dd8fb16c3
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[rggupt@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:17 +08:00
Paul E. McKenney
334860debc rcu: Ensure that RCU_FAST_NO_HZ timers expire on correct CPU
Timers are subject to migration, which can lead to the following
system-hang scenario when CONFIG_RCU_FAST_NO_HZ=y:

1.	CPU 0 executes synchronize_rcu(), which posts an RCU callback.

2.	CPU 0 then goes idle.  It cannot immediately invoke the callback,
	but there is nothing RCU needs from it, so it enters dyntick-idle
	mode after posting a timer.

3.	The timer gets migrated to CPU 1.

4.	CPU 0 never wakes up, so the synchronize_rcu() never returns, so
	the system hangs.

This commit fixes this problem by using mod_timer_pinned(), as suggested
by Peter Zijlstra, to ensure that the timer is actually posted on the
running CPU.
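
(The one-call difference, sketched; "tp" stands for this CPU's RCU_FAST_NO_HZ
wakeup timer, whose actual name varies across this patch series.)

    /* mod_timer() lets the timer subsystem arm the timer wherever it likes;
     * mod_timer_pinned() arms it on the calling CPU, so the wakeup lands on
     * the CPU that actually holds the callbacks. */
    mod_timer_pinned(tp, jiffies + delay);   /* was: mod_timer(tp, jiffies + delay) */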

CRs-fixed: 657837
Change-Id: Id4b3d586dc08db9b9a80739c3a05163b3be69e72
Reported-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Git-commit: f511fc6246
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:17 +08:00
Paul E. McKenney
c0c7fc6063 rcu: Add warning for RCU_FAST_NO_HZ timer firing
RCU_FAST_NO_HZ uses a timer to limit the time that a CPU with callbacks
can remain in dyntick-idle mode.  This timer is cancelled when the CPU
exits idle, and therefore should never fire.  However, if the timer
were migrated to some other CPU for whatever reason (1) the timer could
actually fire and (2) firing on some other CPU would fail to wake up the
CPU with callbacks, possibly resulting in sluggishness or a system hang.

This commit therefore adds a WARN_ON_ONCE() to the timer handler in order
to detect this condition.

CRs-fixed: 657837
Change-Id: Ie667bd7ee157668b2f0fca7eaf67cb746be1d674
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Git-commit: 79b9a75fb7
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:17 +08:00
Paul E. McKenney
fd7133ac3c rcu: Make RCU_FAST_NO_HZ account for pauses out of idle
Both Steven Rostedt's new idle-capable trace macros and the RCU_NONIDLE()
macro can cause RCU to momentarily pause out of idle without the rest
of the system being involved.  This can cause rcu_prepare_for_idle()
to run through its state machine too quickly, which can in turn result
in needless scheduling-clock interrupts.

This commit therefore adds code to enable rcu_prepare_for_idle() to
distinguish between an initial entry to idle on the one hand (which needs
to advance the rcu_prepare_for_idle() state machine) and an idle reentry
due to idle-capable trace macros and RCU_NONIDLE() on the other hand
(which should avoid advancing the rcu_prepare_for_idle() state machine).
Additional state is maintained to allow the timer to be correctly reposted
when returning after a momentary pause out of idle, and even more state
is maintained to detect when new non-lazy callbacks have been enqueued
(which may require re-evaluation of the approach to idleness).

CRs-fixed: 657837
Change-Id: I7c5ef8183d42982876a50b500da080f15c61b848
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Git-commit: c57afe80db
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:17 +08:00
Paul E. McKenney
5cf55e98f7 rcu: Make RCU_FAST_NO_HZ use timer rather than hrtimer
The RCU_FAST_NO_HZ facility uses an hrtimer to wake up a CPU when
it is allowed to go into dyntick-idle mode, which is almost always
cancelled soon after.  This is not what hrtimers are good at, so
this commit switches to the timer wheel.

CRs-fixed: 657837
Change-Id: Ic528055544c929251f239c729c221cfcee539cfa
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Git-commit: 2ee3dc8066
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:17 +08:00
Paul E. McKenney
4445aae216 rcu: Add RCU_FAST_NO_HZ tracing for idle exit
Traces of rcu_prep_idle events can be confusing because
rcu_cleanup_after_idle() does no tracing.  This commit therefore adds
this tracing.

CRs-fixed: 657837
Change-Id: I8ce6251d06e25f3ff26fe016aae456a33c40a466
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Git-commit: 2fdbb31b66
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2016-10-29 23:12:17 +08:00
Antti P Miettinen
c0ba586010 rcu: Add a module parameter to force use of expedited RCU primitives
There have been some embedded applications that would benefit from
use of expedited grace-period primitives.  In some ways, this is
similar to synchronize_net() doing either a normal or an expedited
grace period depending on lock state, but with control outside of
the kernel.

This commit therefore adds rcu_expedited boot and sysfs parameters
that cause the kernel to substitute expedited primitives for the
normal grace-period primitives.
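
(In sketch form, the knob and its effect look like the following; the
rcu_expedited symbol comes from the description above, while the module_param
permissions and the synchronize_rcu() body are simplifications.)

    int rcu_expedited;
    module_param(rcu_expedited, int, 0644);

    void synchronize_rcu(void)
    {
            if (rcu_expedited)
                    synchronize_rcu_expedited();   /* forced fast path */
            else
                    wait_rcu_gp(call_rcu);         /* normal grace period */
    }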

[ paulmck: Add trace/event/rcu.h to kernel/srcu.c to avoid build error.
	   Get rid of infinite loop through contention path.]

CRs-fixed: 634363
Change-Id: I45addcb532fdaa47df3019ada3283e293ed40249
Signed-off-by: Antti P Miettinen <amiettinen@nvidia.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Git-commit: 3705b88db0
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[ joonwoop@codeaurora.org: resolved merge conflicts.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-10-29 23:12:17 +08:00
Jan Kara
3b1a823ca1 writeback: Fix occasional slow sync(1)
In the case when the system contains no dirty pages, wakeup_flusher_threads()
will submit WB_SYNC_NONE writeback for 0 pages, so wb_writeback() exits
immediately without doing anything. Thus sync(1) will write all the
dirty inodes from a WB_SYNC_ALL writeback pass, which is slow.

Fix the problem by using get_nr_dirty_pages() in
wakeup_flusher_threads() instead of calculating number of dirty pages
manually. That function also takes number of dirty inodes into account.
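
(Sketch of the change inside wakeup_flusher_threads(); the replaced expression
in the comment is an assumption about the pre-patch code.)

    if (!nr_pages)
            /* was roughly: global_page_state(NR_FILE_DIRTY) +
             *              global_page_state(NR_UNSTABLE_NFS) */
            nr_pages = get_nr_dirty_pages();   /* dirty pages plus dirty inodes */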

Change-Id: I458027ae08d9a5a93202a7b97ace1f8da7a18a07
CC: stable@vger.kernel.org
Reported-by: Paul Taysom <taysom@chromium.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2016-10-29 23:12:17 +08:00
Junxiao Bi
73af84d854 writeback: fix race that causes writeback hang
There is a race between marking an inode dirty and the writeback thread; see the
following scenario.  In this case, the writeback thread will not run even though
there is dirty_io.

__mark_inode_dirty()                                          bdi_writeback_workfn()
	...                                                       	...
	spin_lock(&inode->i_lock);
	...
	if (bdi_cap_writeback_dirty(bdi)) {
	    <<< assume wb has dirty_io, so wakeup_bdi is false.
	    <<< the following inode_dirty also have wakeup_bdi false.
	    if (!wb_has_dirty_io(&bdi->wb))
		    wakeup_bdi = true;
	}
	spin_unlock(&inode->i_lock);
	                                                            <<< assume last dirty_io is removed here.
	                                                            pages_written = wb_do_writeback(wb);
	                                                            ...
	                                                            <<< work_list empty and wb has no dirty_io,
	                                                            <<< delayed_work will not be queued.
	                                                            if (!list_empty(&bdi->work_list) ||
	                                                                (wb_has_dirty_io(wb) && dirty_writeback_interval))
	                                                                queue_delayed_work(bdi_wq, &wb->dwork,
	                                                                    msecs_to_jiffies(dirty_writeback_interval * 10));
	spin_lock(&bdi->wb.list_lock);
	inode->dirtied_when = jiffies;
	<<< new dirty_io is added.
	list_move(&inode->i_wb_list, &bdi->wb.b_dirty);
	spin_unlock(&bdi->wb.list_lock);

	<<< though there is dirty_io, but wakeup_bdi is false,
	<<< so writeback thread will not be waked up and
	<<< the new dirty_io will not be flushed.
	if (wakeup_bdi)
	    bdi_wakeup_thread_delayed(bdi);

Writeback will not run until new flush work is queued.  This may cause
a lot of dirty pages to stay in memory for a long time.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>

Change-Id: I973fcba5381881a003a035ffff48f64348660079
2016-10-29 23:12:17 +08:00
Pushkaraj Patil
5adb2211ff msm: vidc: return error in case of init failure
Return an error if video driver initialization
fails.

Change-Id: I54c516808587fb7d94c3d517f8c72c6c9aa9d91d
CRs-fixed: 542535
Signed-off-by: Pushkaraj Patil <ppatil@codeaurora.org>
2016-10-29 23:12:17 +08:00
Chintan Pandya
4d97ab65ad ksm: Provide support to use deferred timers for scanner thread
The KSM page-scanning thread is scheduled on a fixed timeout.
That wakes the CPU up from idle and hence may affect power
consumption. Provide optional support for using a deferred timer,
which suits low-power use cases.

To enable deferred timers,
$ echo 1 > /sys/kernel/mm/ksm/deferred_timer

Change-Id: I07fe199f97fe1f72f9a9e1b0b757a3ac533719e8
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
2016-10-29 23:12:17 +08:00
Stephen Boyd
068766520e ARM: sched_clock: Load cycle count after epoch stabilizes
There is a small race between when the cycle count is read from
the hardware and when the epoch stabilizes. Consider this
scenario:

 CPU0                           CPU1
 ----                           ----
 cyc = read_sched_clock()
 cyc_to_sched_clock()
                                 update_sched_clock()
                                  ...
                                  cd.epoch_cyc = cyc;
  epoch_cyc = cd.epoch_cyc;
  ...
  epoch_ns + cyc_to_ns((cyc - epoch_cyc)

The cyc on cpu0 was read before the epoch changed. But we
calculate the nanoseconds based on the new epoch by subtracting
the new epoch from the old cycle count. Since epoch is most likely
larger than the old cycle count we calculate a large number that
will be converted to nanoseconds and added to epoch_ns, causing
time to jump forward too much.

Fix this problem by reading the hardware after the epoch has
stabilized.

Change-Id: I995133b229b2c2fedd5091406d1dc366d8bfff7b
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Git-commit: 336ae1180d
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[sboyd: reworked for file movement kernel/time -> arm/kernel]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-10-29 23:12:17 +08:00
Joonsoo Kim
28ca825289 ARM: 7643/1: sched: correct update_sched_clock()
If we want to load epoch_cyc and epoch_ns atomically,
we should update epoch_cyc_copy first of all.
This notifies the reader that an update is in progress.

If we update epoch_cyc first, as in the current implementation,
there is a subtle error case.
Look at the example below.

<Initial Condition>
cyc = 9
ns = 900
cyc_copy = 9

== CASE 1 ==
<CPU A = reader>           <CPU B = updater>
                           write cyc = 10
read cyc = 10
read ns = 900
                           write ns = 1000
                           write cyc_copy = 10
read cyc_copy = 10

output = (10, 900)

== CASE 2 ==
<CPU A = reader>           <CPU B = updater>
read cyc = 9
                           write cyc = 10
                           write ns = 1000
read ns = 1000
read cyc_copy = 9
                           write cyc_copy = 10
output = (9, 1000)

If an atomic read is ensured, the output should be (9, 900) or (10, 1000).
But the outputs in the example cases are not.

So, change the update sequence in order to correct this problem.
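
(The resulting writer ordering, sketched from the description above; field
names follow the ARM sched_clock code of this era, barriers are illustrative.)

    /* publish epoch_cyc_copy first so a concurrent reader can detect that an
     * update is in progress, then the new ns, then epoch_cyc last */
    cd.epoch_cyc_copy = cyc;
    smp_wmb();
    cd.epoch_ns = ns;
    smp_wmb();
    cd.epoch_cyc = cyc;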

Change-Id: Ia9196dd50a519f516f70c3138233624b669ef96a
Cc: <stable@vger.kernel.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
CRs-Fixed: 497236
Git-commit: 7c4e9ced42
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-10-29 23:12:17 +08:00
Felipe Balbi 2
56a4076e7a ARM: 7565/1: sched: stop sched_clock() during suspend
The scheduler imposes a requirement on sched_clock():
the clock must be stopped during suspend. If we don't
do that, any RT thread will be rescheduled in the future,
which might cause all sorts of problems.

This became an issue on OMAP when we converted omap-i2c.c
to use threaded IRQs; it turned out that depending on how
much time we spent in suspend, the I2C IRQ thread would
end up being rescheduled so far in the future that I2C
transfers would time out and, because omap_hsmmc depends
on an I2C-connected device to detect if an MMC card is
inserted in the slot, our rootfs would just vanish.

arch/arm/kernel/sched_clock.c already had an optional
implementation (sched_clock_needs_suspend()) which would
handle the scheduler's requirement properly; what this patch
does is simply make that implementation non-optional.

Note that this has the side-effect that printk timings
won't reflect the actual time spent on suspend so other
methods to measure that will have to be used.

This has been tested with beagleboard XM (OMAP3630) and
pandaboard rev A3 (OMAP4430). Suspend to RAM is now working
after this patch.

Thanks to Kevin Hilman for helping out with debugging.

Change-Id: Ie2f9e3b22eb3d1f3806cf8c598f22e2fa1b8651f
Acked-by: Kevin Hilman <khilman@ti.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
CRs-Fixed: 497236
Git-commit: 6a4dae5e13
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-10-29 23:12:16 +08:00
Colin Cross
e037d74cff ARM: 7486/1: sched_clock: update epoch_cyc on resume
Many clocks that are used to provide sched_clock will reset during
suspend.  If read_sched_clock returns 0 after suspend, sched_clock will
appear to jump forward.  This patch resets cd.epoch_cyc to the current
value of read_sched_clock during resume, which causes sched_clock() just
after suspend to return the same value as sched_clock() just before
suspend.

In addition, during the window where epoch_ns has been updated before
suspend, but epoch_cyc has not been updated after suspend, it is unknown
whether the clock has reset or not, and sched_clock() could return a
bogus value.  Add a suspended flag, and return the pre-suspend epoch_ns
value during this period.

The new behavior is triggered by calling setup_sched_clock_needs_suspend
instead of setup_sched_clock.

Change-Id: I7441ef74dc6802c00eea61f3b8c0a25ac00a724d
Signed-off-by: Colin Cross <ccross@android.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
CRs-Fixed: 497236
Git-commit: 237ec6f2e5
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-10-29 23:12:16 +08:00
Zhao Wei Liew
c3ede8f87a flo: Enable F2FS
F2FS has been shown to provide all-around performance improvements.

Change-Id: Ife324169b292c6dfa2e70fc40306cf453289127e
Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>
2016-10-29 23:12:16 +08:00
Jaegeuk Kim
d65ad6ea17 f2fs: catch up to v4.4-rc1
The last patch is:

commit beaa57dd986d4f398728c060692fc2452895cfd8
Author: Chao Yu <chao2.yu@samsung.com>
Date:   Thu Oct 22 18:24:12 2015 +0800

    f2fs: fix to skip shrinking extent nodes

    In f2fs_shrink_extent_tree we should stop shrink flow if we have already
    shrunk enough nodes in extent cache.

Change-Id: I7bc76a98ce99412c59435f4573ace38fca604694
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-10-29 23:12:16 +08:00
Zhao Wei Liew
2599253f07 flo: Compress the kernel with XZ
The recovery image is too large when built with GCC 4.9.
Switch to XZ compression to reduce the image size.

Change-Id: I33e04f0daacc2bfac2212721921450f6a7927eb7
Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>
2016-10-29 23:12:16 +08:00
Roman Gushchin
2b1264df0c fuse: break infinite loop in fuse_fill_write_pages()
I got a report about an unkillable task eating CPU. Further
investigation showed that the problem is in the fuse_fill_write_pages()
function. If the iov's first segment has zero length, we get an infinite
loop, because we never reach the iov_iter_advance() call.

Fix this by calling iov_iter_advance() before repeating an attempt to
copy data from userspace.

A similar problem is described in 124d3b7041 ("fix writev regression:
pan hanging unkillable and un-straceable"). If a zero-length segment
is followed by a segment with an invalid address,
iov_iter_fault_in_readable() checks only the first segment (zero-length),
iov_iter_copy_from_user_atomic() skips it, fails at the second and
returns zero -> goto again without skipping the zero-length segment.

The patch calls iov_iter_advance() before the goto again: we'll skip the
zero-length segment on the second iteration and iov_iter_fault_in_readable()
will detect the invalid address.
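
(Sketched against the copy loop described above; heavily simplified, only the
ordering of the advance relative to the retry matters here.)

    again:
            /* ... lock the page and try to copy the current iov segment ... */
            tmp = iov_iter_copy_from_user_atomic(page, ii, offset, bytes);

            /* advance even when nothing was copied, so a zero-length segment
             * cannot keep us spinning here forever */
            iov_iter_advance(ii, tmp);
            if (!tmp) {
                    /* ... release the page, shrink bytes ... */
                    goto again;
            }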

Special thanks to Konstantin Khlebnikov, who helped a lot with the commit
description.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Maxim Patlasov <mpatlasov@parallels.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: ea9b9907b8 ("fuse: implement perform_write")
Cc: <stable@vger.kernel.org>

Change-Id: Id37193373294dd43191469389cfe68ca1736a54b
2016-10-29 23:12:16 +08:00
Deepak Verma
54c5b07af9 msm: vidc: Initialize kernel space stack variables
This change initializes kernel-space stack variables
that are passed between kernel space and user space
using ioctls.
Leaving these variables uninitialized may lead to
leakage of memory values from the kernel stack to
user space.

Change-Id: Icb195470545ee48b55671ac09798610178e833e1
CRs-fixed: 556771,563420
Signed-off-by: Deepak Verma <dverma@codeaurora.org>
[zhaoweiliew: Remove unported LTR and GET_PERF_LEVEL features]
Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>
2016-10-29 23:12:16 +08:00
Vishnuvardhan Prodduturi
e6857b53c3 msm: display: Limit dynamic fps feature only to MIPI video panels.
As the dynamic fps feature is supported only for MIPI video panels,
return -EINVAL if the sysfs node is written for panels other than
MIPI video. This change also removes world-writable permissions
from the sysfs node.

Change-Id: I72260e6032cb8a758b0457c36a808263a981260f
Signed-off-by: Vishnuvardhan Prodduturi <vproddut@codeaurora.org>
2016-10-29 23:12:16 +08:00
Maheshwar Ajja
42bd0ced12 msm: vidc: Fix possible memory corruption
This change fixes possible memory corruption
by increasing the local array sizes in the video
encoder.

Change-Id: If03cee2f428c3cef863178da70bc083159f203f5
Signed-off-by: Maheshwar Ajja <majja@codeaurora.org>
2016-10-29 23:12:16 +08:00
Sonny Rao
b016353abb mm: fix calculation of dirtyable memory
The system uses global_dirtyable_memory() to calculate the number of
dirtyable pages, i.e. pages that can be allocated to the page cache.  A bug
causes an underflow thus making the page count look like a big unsigned
number.  This in turn confuses the dirty writeback throttling to
aggressively write back pages as they become dirty (usually 1 page at a
time).  This generally only affects systems with highmem because the
underflowed count gets subtracted from the global count of dirtyable
memory.

The problem was introduced with v3.2-4896-gab8fabd

The fix is to ensure we don't get an underflowed total of either highmem or
global dirtyable memory.
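
(The shape of the fix, as a condensed sketch of the highmem helper in
mm/page-writeback.c.)

    static unsigned long highmem_dirtyable_memory(unsigned long total)
    {
            unsigned long x = 0;

            /* ... sum free and reclaimable pages over the highmem zones ... */

            /* never report more highmem than there is memory in total,
             * otherwise the caller's subtraction underflows */
            return min(x, total);
    }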

Change-Id: Ib0c4a8f99870edc5ded863b88a472f2836c690d2
Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Signed-off-by: Puneet Kumar <puneetster@chromium.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Damien Wyart <damien.wyart@free.fr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: c8b74c2f66
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
2016-10-29 23:12:16 +08:00
Srivatsa Vaddagiri
ff3568039f arm: Remove no-longer-required RCU_NONIDLE wrapper
Commit 21111be8 "ARM: Fix negative idle stats for offline cpu" moved the call
to cpu_die() to occur outside of the rcu_idle_enter()/rcu_idle_exit() section.
As a result, the RCU_NONIDLE() wrapper to complete() call in
arch/arm/kernel/process.c:cpu_die() is no longer required (and is
technically incorrect to have). Removing RCU_NONIDLE() wrapper also removes
this warning seen during CPU offline:

[Note: The message below has been edited to fit a 75-char-per-line limit.
Insignificant portions of the warning message have been removed from each line.]

------------[ cut here ]------------
WARNING: at kernel/rcutree.c:456 rcu_idle_exit_common+0x4c/0xe0()
Modules linked in:
(unwind_backtrace+0x0/0x120) from (warn_slowpath_common+0x4c/0x64)
(warn_slowpath_common+0x4c/0x64) from (warn_slowpath_null+0x18/0x1c)
(warn_slowpath_null+0x18/0x1c) from (rcu_idle_exit_common+0x4c/0xe0)
(rcu_idle_exit_common+0x4c/0xe0) from (rcu_idle_exit+0xa8/0xc0)
(rcu_idle_exit+0xa8/0xc0) from (cpu_die+0x24/0x5c)
(cpu_die+0x24/0x5c) from (cpu_idle+0xdc/0xf0)
(cpu_idle+0xdc/0xf0) from (0x8160)
---[ end trace 61bf21937a496a37 ]---

Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Change-Id: I721e6b20651674e6f6f584cf8d814af00b688c91
2016-10-29 23:12:16 +08:00
Taniya Das
7d5c5d87db ARM: Fix negative idle stats for offline cpu
We see negative idle stats because a cpu dies without
cleaning up idle-entry statistics. When a cpu is offline,
the most immediate thing you'd want to do is just call
into cpu_die() and not waste time calling notifier(IDLE_START).

CRs-Fixed: 414554
Change-Id: Iadc6a3ca39997e0ccf65d2a29b004e24b1b211a1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Taniya Das <tdas@codeaurora.org>
2016-10-29 23:12:16 +08:00
Prakash Kamliya
533a71107c msm: kgsl: Fix spinlock recursion in destroy pagetable
The pagetable list is protected by ptlock. A few functions
take the same ptlock while iterating over the pagetable list, and
kgsl_destroy_pagetable() also needs the same lock. This will
cause spinlock recursion if kgsl_destroy_pagetable() is
called while iterating over the list. Create two versions
of the same function: one with the lock and one without
the lock.
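
(The pattern, in sketch form; apart from kgsl_destroy_pagetable() the names
here, including the lock, are placeholders.)

    /* variant for callers that already hold the pagetable-list lock */
    static void pagetable_destroy_locked(struct kgsl_pagetable *pt)
    {
            /* ... actual teardown, caller holds ptlock ... */
    }

    void kgsl_destroy_pagetable(struct kgsl_pagetable *pt)
    {
            unsigned long flags;

            spin_lock_irqsave(&ptlock, flags);
            pagetable_destroy_locked(pt);
            spin_unlock_irqrestore(&ptlock, flags);
    }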

CRs-Fixed: 621172
Change-Id: I61440f99022fce8629a57bb5661e2eef9613187b
Signed-off-by: Prakash Kamliya <pkamliya@codeaurora.org>
2016-10-29 23:12:15 +08:00
Utsab Bose
a378127551 msm: dma: Moving queue_work() function within spinlock
Currently we add a DMA command to the staged list within a
spinlock and then add it to the workqueue using queue_work() after unlocking
the spinlock. With this, there is a chance of executing DMA commands out
of order in the concurrency case below.

Thread1                               Thread2
__msm_dmov_enqueue_cmd_ext
   spin_lock_irqsave(..)
   list_add_tail(..)
   spin_unlock_irqrestore(..)
       --PREEMPT--
                                      __msm_dmov_enqueue_cmd_ext
                                         spin_lock_irqsave(..)
                                         list_add_tail(..)
                                         spin_unlock_irqrestore(..)
                                         queue_work()
                                         ..
   queue_work()
So calling queue_work() within the spinlock will make sure that the
work added to the workqueue is processed in the same order as the
commands are added to staged_commands.
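
(In sketch form; the list, lock, work item, and workqueue names are
placeholders. queue_work() is safe to call with a spinlock held and interrupts
disabled, which is what makes this ordering fix possible.)

    spin_lock_irqsave(&msm_dmov_lock, flags);
    list_add_tail(&cmd->list, &staged_commands);
    queue_work(msm_dmov_wq, &dmov_work);    /* now ordered with the list add */
    spin_unlock_irqrestore(&msm_dmov_lock, flags);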

CRs-Fixed: 423190
Change-Id: I2ffd1327fb5f0cd1f06db7de9c026d1c4997fe4d
Acked-by: Gopi Krishna Nedanuri <gnedanur@qti.qualcomm.com>
Signed-off-by: Utsab Bose <ubose@codeaurora.org>
2016-10-29 23:12:15 +08:00
Hareesh Gundu
2ce8d472c1 msm: kgsl: Fix Z180 memory leak
Decrement the entry refcount, which was incremented
in kgsl_sharedmem_find_region.

CRs-Fixed: 635747
Change-Id: I621ba8f8e119a9ab8ba5455b28a565e3cae2f7cd
Signed-off-by: Hareesh Gundu <hareeshg@codeaurora.org>
2016-10-29 23:12:15 +08:00
Alok Chauhan
c3bac0cb1e msm: msm_bus: Fix the type error causing bandwidth overflow
On legacy chipsets, a long int was being used to store the
return value after calculating interleaved bandwidth. However,
NoCs support 64-bit ab/ib values. The problem occurs
if a client requests higher bandwidth and the difference in ab
values exceeds the range of a 32-bit integer: the value
overflows and turns negative, which leads to a wrong bandwidth calculation.

This patch fixes the integer overflow by correcting the argument
type used to store bandwidth.

CRs-Fixed: 537213
Change-Id: I8c6c79ba245a988c2c54ccaca3f3eaf5cb857ce5
Signed-off-by: Alok Chauhan <alokc@codeaurora.org>
2016-10-29 23:12:15 +08:00