Commit Graph

136 Commits

Author SHA1 Message Date
Kevin F. Haggerty 238a0fb5ad Merge tag 'v3.4.113' into lineage-16.0
This is the 3.4.113 stable release

Change-Id: I80791430656359c5447a675cbff4431362d18df0
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
2019-08-05 14:20:47 +02:00
Tom Marshall e864a35d26 kernel: Only expose su when daemon is running
It has been claimed that the PG implementation of 'su' has security
vulnerabilities even when disabled.  Unfortunately, the people that
find these vulnerabilities often like to keep them private so they
can profit from exploits while leaving users exposed to malicious
hackers.

In order to reduce the attack surface for vulnerabilities, it is
therefore necessary to make 'su' completely inaccessible when it
is not in use (except by the root and system users).

Change-Id: Ia7d50ba46c3d932c2b0ca5fc8e9ec69ec9045f85
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
2019-08-05 09:12:33 +02:00
Francescodario Cuzzocrea 85baa390bf misc: Import SM-G900H kernel source code
* Samsung Package Version: G800HXXU1CRJ1
* CAF Tag: LA.BF.1.1.3-00110-8x26.0
2019-08-02 15:14:10 +02:00
Srivatsa Vaddagiri 5c5d24f1a4 sched: Fix reference to stale task_struct in try_to_wake_up()
try_to_wake_up() currently drops p->pi_lock and later checks for need
to notify cpufreq governor on task migrations or wakeups. However the
woken task could exit between the time p->pi_lock is released and the
time the test for notification is run. As a result, the test for
notification could refer to an exited task. task_notify_on_migrate(p)
could thus lead to an invalid memory reference.

Fix this by running the test for notification with the task's pi_lock
held.

Change-Id: I1c7a337473d2d8e79342a015a179174ce00702e1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-11-25 12:13:58 +05:30
Steve Muckle 39414b4b9a sched: fix race between try_to_wake_up() and move_task()
Until a task's state has been seen as interruptible/uninterruptible
and it is no longer on_cpu, it is possible that the task may move
to another CPU (load balancing may cause this). Here is an example
where the race condition results in incorrect operation:

- cpu 0 calls put_prev_task on task A, task A's state is TASK_RUNNING
- cpu 0 runs task B, which attempts to wake up A
- cpu 0 begins try_to_wake_up(), recording src_cpu for task A as cpu 0
- cpu 1 then pulls task A (perhaps due to idle balance)
- cpu 1 runs task A, which then sleeps, becoming INTERRUPTIBLE
- cpu 0 continues in try_to_wake_up(), thinking task A's previous
  cpu is 0, when it is actually 1
- if select_task_rq returns cpu 0, task A will be woken up on cpu 0
  without properly updating its cpu to 0 in set_task_cpu()

CRs-Fixed: 665958
Change-Id: Icee004cb320bd8edfc772d9f74e670a9d4978a99
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-07-30 14:10:13 -07:00
Paul E. McKenney 2a3e318d6c Revert "rcu: Move PREEMPT_RCU preemption to switch_to() invocation"
This reverts commit 616c310e83b872024271c915c1b9ab505b9efad9.
(Move PREEMPT_RCU preemption to switch_to() invocation).
Testing by Sasha Levin <levinsasha928@gmail.com> showed that this
can result in deadlock due to invoking the scheduler when one of
the runqueue locks is held.  Because this commit was simply a
performance optimization, revert it.

CRs-fixed: 657837
Change-Id: Idc7a560cf2d1696d8cd7d24c99747cdca227bcc2
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Git-commit: cba6d0d64ee53772b285d0c0c288deefbeaf7775
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2014-07-21 21:46:54 +05:30
Paul E. McKenney 9ea92e946e rcu: Move PREEMPT_RCU preemption to switch_to() invocation
Currently, PREEMPT_RCU readers are enqueued upon entry to the scheduler.
This is inefficient because enqueuing is required only if there is a
context switch, and entry to the scheduler does not guarantee a context
switch.

The commit therefore moves the enqueuing to immediately precede the
call to switch_to() from the scheduler.

CRs-fixed: 657837
Change-Id: I7d824196483cafb82371b978f3d995a3ab1696a0
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 616c310e83b872024271c915c1b9ab505b9efad9
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ramesh Gupta Guntha <rggupt@codeaurora.org>
2014-07-21 21:28:26 +05:30
Linux Build Service Account 717d33e877 Merge "sched: Fix hotplug vs. set_cpus_allowed_ptr()" 2014-07-15 21:44:09 -07:00
Lai Jiangshan 5967c327e7 sched: Fix hotplug vs. set_cpus_allowed_ptr()
Lai found that:

  WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
  ...
  migration_cpu_stop+0x1d/0x22

was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.

This isn't true since 5fbd036b55 ("sched: Cleanup cpu_active madness").

So set active and online at the same time to avoid this particular
problem.

CRs-Fixed: 680496
Change-Id: I89ac9b6829acf200072975bc7d028a469167f083
Fixes: 5fbd036b55 ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael wang <wangyun@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: 24d52daafcd53f109893ec2faf668c8eeef4a382
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
2014-06-25 19:25:56 -07:00
Neil Zhang 702603d5e8 sched: Remove redundant update_runtime notifier
migration_call() will do all the things that update_runtime() does.
So let's remove it.

Furthermore, there is a potential risk that the current code will hit the
BUG_ON at line 689 of rt.c when doing cpu hotplug while there are realtime
threads running, because runtime is enabled twice while the rt_runtime
may have already changed.

Change-Id: If2d953316d93c6b7e32f94bd49f2c10e64de6ed8
Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1365685499-26515-1-git-send-email-zhangwm@marvell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: c5405a495e88d93cf9b4f4cc91507c7f4afcb901
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[mattw@codeaurora.org: resolved trivial header file context conflict]
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
2014-06-24 11:22:47 -07:00
Shawn Bohrer 192f07a56e sched/rt: Use root_domain of rt_rq not current processor
When the system has multiple domains do_sched_rt_period_timer()
can run on any CPU and may iterate over all rt_rq in
cpu_online_mask.  This means when balance_runtime() is run for a
given rt_rq that rt_rq may be in a different rd than the current
processor.  Thus if we use smp_processor_id() to get rd in
do_balance_runtime() we may borrow runtime from a rt_rq that is
not part of our rd.

This changes do_balance_runtime to get the rd from the passed in
rt_rq ensuring that we borrow runtime only from the correct rd
for the given rt_rq.

This fixes a BUG at kernel/sched/rt.c:687! in __disable_runtime
when we try to reclaim runtime lent to another rt_rq but the runtime has
been lent to a rt_rq in another rd.

Change-Id: Id8b9f88bdd783ef87bfa8d288a2fe7e86dd0b1e2
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Mike Galbraith <bitbucket@online.de>
Cc: peterz@infradead.org
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1358186131-29494-1-git-send-email-sbohrer@rgmadvisors.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: 89960feebaf4f9a53f93a0ce6888207e4a808799
Git-repo: git://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-03-18 19:01:28 -07:00
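
A minimal sketch of the change described above, as it would appear in do_balance_runtime(): the root_domain now comes from the rt_rq that was passed in rather than from whichever CPU the period timer happened to run on. The helper name rq_of_rt_rq() is an assumption based on the surrounding scheduler code, not a quote from the patch.

    static int do_balance_runtime(struct rt_rq *rt_rq)
    {
            /* before: rd was taken from the current processor, which may
             * belong to a different root domain than rt_rq:
             *     struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
             * after: rd is taken from the rq that owns the rt_rq passed in */
            struct root_domain *rd = rq_of_rt_rq(rt_rq)->rd;
            int more = 0;

            /* ... iterate rd->span and borrow runtime only from rt_rqs
             * within this root domain ... */
            return more;
    }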
Linux Build Service Account 9afa842433 Merge "sched,cgroup: Fix up task_groups list" 2014-02-01 22:37:45 -08:00
Syed Rameez Mustafa df0b728e93 sched: convert WARN_ON() to printk_sched() in try_to_wake_up_local()
try_to_wake_up_local() is called with the rq lock held. Printing to
console in this context can result in a deadlock if klogd needs to
be woken up. Print to the kernel log buffer via printk_sched()
instead which avoids the wakeup.

Change-Id: Ia07baea3cb7e0b158545207fdbbb866203256d3c
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-01-30 18:46:49 -08:00
Mike Galbraith bbea0c183e sched,cgroup: Fix up task_groups list
With multiple instances of task_groups, for_each_rt_rq() is a noop,
no task groups having been added to the rt.c list instance.  This
renders __enable/disable_runtime() and print_rt_stats() noop, the
user (non) visible effect being that rt task groups are missing in
/proc/sched_debug.

Change-Id: Ib337ebda0e752f838d787c03fe1b0a8f03ea0551
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: stable@kernel.org # v3.3+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1344308413.6846.7.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Git-commit: 35cf4e50b16331def6cfcbee11e49270b6db07f5
Git-repo: git://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-01-30 17:26:32 -08:00
Matt Wagantall bfc9623ffa sched/debug: Make sysrq prints of sched debug data optional
Calls to sysrq_sched_debug_show() can yield rather verbose output
which contributes to log spew and, under heavy load, may increase
the chances of a watchdog bark.

Make printing of this data optional with the introduction of a
new Kconfig, CONFIG_SYSRQ_SCHED_DEBUG.

Change-Id: I5f54d901d0dea403109f7ac33b8881d967a899ed
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
2013-12-10 22:18:07 -08:00
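
The guard this introduces looks roughly as sketched below; the call site in show_state_filter() is an assumption based on where sysrq_sched_debug_show() is normally invoked, not a quote of the actual hunk.

    /* kernel/sched/core.c, show_state_filter() -- illustrative sketch */
    #ifdef CONFIG_SYSRQ_SCHED_DEBUG
            sysrq_sched_debug_show();       /* verbose per-CPU scheduler dump */
    #endif

Builds that leave the new Kconfig option unset then skip the dump entirely, keeping sysrq output short under load.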
Linux Build Service Account 95d087c820 Merge "tracing/sched: add load balancer tracepoint" 2013-11-23 10:10:21 -08:00
Vincent Guittot f1db23f76d sched: Fix clear NOHZ_BALANCE_KICK
I have faced a sequence where the Idle Load Balance was sometimes not
triggered for a while on my platform, in the following scenario:

 CPU 0 and CPU 1 are running tasks and CPU 2 is idle

 CPU 1 kicks the Idle Load Balance
 CPU 1 selects CPU 2 as the new Idle Load Balancer
 CPU 1 sets NOHZ_BALANCE_KICK for CPU 2
 CPU 1 sends a reschedule IPI to CPU 2

 While CPU 2 wakes up, CPU 0 or CPU 1 migrates a waking up task A on CPU 2

 CPU 2 finally wakes up, runs task A and discards the Idle Load Balance
       task A quickly goes back to sleep (before a tick occurs on CPU 2)
 CPU 2 goes back to idle with NOHZ_BALANCE_KICK set

Whenever CPU 2 is next selected as the ILB, no reschedule IPI will be sent
because NOHZ_BALANCE_KICK is already set, and no Idle Load Balance will be
performed.

We must then wait for the sched softirq to be raised on CPU 2 by another
part of the kernel before NOHZ_BALANCE_KICK gets cleared.

The proposed solution clears NOHZ_BALANCE_KICK in scheduler_ipi() if
we can't raise the sched softirq for the Idle Load Balance.

Change since V1:

- move the clear of NOHZ_BALANCE_KICK in got_nohz_idle_kick if the ILB
  can't run on this CPU (as suggested by Peter)

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370419991-13870-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: 873b4c65b519fd769940eb281f77848227d4e5c1
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[smuckle@codeaurora.org: minor merge resolution for 3.4 in scheduler_ipi()]
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Change-Id: I3548612057cccc2ecc29429c129c44183083831f
2013-11-21 16:25:37 -08:00
Steve Muckle 04ca28640c tracing/sched: add load balancer tracepoint
When doing performance analysis it can be useful to see exactly
what is going on with the load balancer - when it runs and why
exactly it may not be redistributing load.

This additional tracepoint will show the idle context of the
load balance operation (idle, not idle, newly idle), various
values from the load balancing operation, the final result,
and the new balance interval.

Change-Id: I9e5c97ae3878bea44e60d189ff3cec2275f2c75e
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2013-11-21 12:42:25 -08:00
Steve Muckle 4c7b6d4477 sched: change WARN_ON_ONCE to WARN_ON in try_to_wake_up_local()
The WARN_ON_ONCE() calls at the beginning of try_to_wake_up_local()
were recently converted from BUG_ON() calls. If these hit, it indicates
something is wrong, and that may contribute to other system instability.
To eliminate the risk of an instance of one of these errors going
unnoticed because there was an earlier instance that occurred long ago,
change to WARN_ON(). If there is ever a flood of these, there are bigger
problems.

Change-Id: I392832e2b6ec24b3569b001b1af9ecd4ed6828e7
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2013-09-06 09:58:02 -07:00
Tejun Heo 33e57304cd sched: Convert BUG_ON()s in try_to_wake_up_local() to WARN_ON_ONCE()s
try_to_wake_up_local() should only be invoked to wake up another
task in the same runqueue and BUG_ON()s are used to enforce the
rule. Missing try_to_wake_up_local() can stall workqueue
execution but such stalls are likely to be finite either by
another work item being queued or the one blocked getting
unblocked.  There's no reason to trigger a BUG while holding the rq
lock and crash the whole system.

Convert BUG_ON()s in try_to_wake_up_local() to WARN_ON_ONCE()s.

Change-Id: I75fdaaf4dcaefcf3893e0404d98c8aa91f89934d
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130318192234.GD3042@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit: 383efcd00053ec40023010ce5034bd702e7ab373
CRs-Fixed: 519942
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2013-09-06 09:57:49 -07:00
Peter Boonstoppel 51e718d5c5 sched: Unthrottle rt runqueues in __disable_runtime()
migrate_tasks() uses _pick_next_task_rt() to get tasks from the
real-time runqueues to be migrated. When rt_rq is throttled
_pick_next_task_rt() won't return anything, in which case
migrate_tasks() can't move all threads over and gets stuck in an
infinite loop.

Instead unthrottle rt runqueues before migrating tasks.

Additionally: move unthrottle_offline_cfs_rqs() to rq_offline_fair()

Change-Id: If8a4a399f1a14b7f4789c1b205dcfadbde555214
Signed-off-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/5FBF8E85CA34454794F0F7ECBA79798F379D3648B7@HQMAIL04.nvidia.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: a4c96ae319b8047f62dedbe1eac79e321c185749
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2013-08-21 11:36:14 -07:00
Arun Bharadwaj 4dc47fb113 tracing/sched: Track per-cpu rt and non-rt cpu_load.
Add a new tracepoint trace_sched_enq_deq_task to track
per-cpu rt and non-rt cpu_load during task enqueue
and dequeue.

This is useful to visualize and compare the load on
different cpus and also to understand how balanced
the load is at any point in time.

Note: We only print cpu_load[0] because we only care about
the most recent load history for tracking load balancer
effectiveness.

Change-Id: I46f0bb84e81652099ed5edf8c2686c70c8b8330c
Signed-off-by: Arun Bharadwaj <abharadw@codeaurora.org>
2013-06-19 16:55:29 -07:00
Linux Build Service Account f7706a202b Merge "sched: Make sure to not re-read variables after validation" 2013-06-18 06:01:35 -07:00
Linux Build Service Account f43021b657 Merge "sched: re-calculate a cpu's next_balance point upon sched domain changes" 2013-06-15 01:16:56 -07:00
Peter Zijlstra fef154fa32 sched: Make sure to not re-read variables after validation
We could re-read rq->rt_avg after we validated it was smaller than
total, invalidating the check and resulting in an unintended negative.

Change-Id: I8543974aad539107768e9e513ca3a8c4cb79b2ff
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/1337688268.9698.29.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
CRs-Fixed: 497236
Git-commit: b654f7de41b0e3903ee2b51d3b8db77fe52ce728
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2013-06-12 10:46:35 -07:00
Srivatsa Vaddagiri 91b609bdd9 sched: re-calculate a cpu's next_balance point upon sched domain changes
Commit 55ddeb0f (sched: Reset rq->next_interval before going idle) reset
a cpu's rq->next_balance when pulled_task = 0, which will be true when
the cpu failed to pull any task, causing it to go idle. However, that patch
relied on next_balance being calculated as a result of traversing cpu's
sched domain hierarchy.

A cpu that is the only online cpu will however not be attached to any
sched domain hierarchy. When such a cpu calls into idle_balance(), we
will end up initializing next_balance to be 1sec away! Such a CPU will
defer load balance check for another 1sec, even though we may bring up
more cpus in the meantime requiring it to check for load imbalance more
frequently. This could then lead to increased scheduling latency for
some tasks.

This patch results in a cpu's next_balance being re-calculated when it is
attached to a new sched domain hierarchy.  This should let cpus run their
load balance checks at the times we expect them to!

Change-Id: I855cff8da5ca28d278596c3bb0163b839d4704bc
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2013-06-11 17:43:09 -07:00
Steve Muckle 00292ad78e sched: remove migration notification from RT class
Commit 88a7e37d26 (sched: provide per cpu-cgroup option to
notify on migrations) added a notifier call when a task is moved
to a different CPU. Unfortunately the two call sites in the RT
sched class where this occurs happens with a runqueue lock held.
This can result in a deadlock if the notifier call attempts to do
something like wake up a task.

Fortunately the benefit of 88a7e37d26 comes mainly from notifying
on migration of non-RT tasks, so we can simply ignore the movements
of RT tasks.

CRs-Fixed: 491370
Change-Id: I8849d826bf1eeaf85a6f6ad872acb475247c5926
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2013-05-24 09:23:45 -07:00
Linux Build Service Account a59922ed30 Merge "sched: provide per cpu-cgroup option to notify on migrations" 2013-05-15 12:43:52 -07:00
Srivatsa Vaddagiri 63a5fe4edc sched: fix reference to wrong cfs_rq
Commit 7db16c8c (sched: Fix SCHED_HRTICK bug leading to late preemption
of tasks) introduced a bug in the sched_slice() calculation by using the wrong
cfs_rq for tasks. rq->cfs was incorrectly used as a task's cfs_rq, rather
than the correct one to which the task belonged.

Fix the bug by using the correct cfs_rq for tasks.

Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2013-05-13 15:11:55 -07:00
Steve Muckle 88a7e37d26 sched: provide per cpu-cgroup option to notify on migrations
On systems where CPUs may run asynchronously, task migrations
between CPUs running at grossly different speeds can cause
problems.

This change provides a mechanism to notify a subsystem
in the kernel if a task in a particular cgroup migrates to a
different CPU. Other subsystems (such as cpufreq) may then
register for this notifier to take appropriate action when
such a task is migrated.

The cgroup attribute to set for this behavior is
"notify_on_migrate".

Change-Id: Ie1868249e53ef901b89c837fdc33b0ad0c0a4590
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2013-04-25 09:10:32 -07:00
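
A rough sketch of the mechanism, with the per-cgroup flag and the query helper named after the commit text and the task_notify_on_migrate() reference in the later try_to_wake_up() fix; the exact layout is an assumption.

    /* Per-cgroup knob exposed as the "notify_on_migrate" attribute. */
    struct task_group {
            /* ... existing fields ... */
            unsigned int notify_on_migrate;
    };

    /* Used on the wakeup/migration paths: subscribers (e.g. cpufreq) are
     * only notified for tasks whose group has the flag set. */
    static inline unsigned int task_notify_on_migrate(struct task_struct *p)
    {
            return task_group(p)->notify_on_migrate;
    }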
Linux Build Service Account 1dae1ea3af Merge "sched: Mark schedule_io_timeout() with EXPORT_SYMBOL" 2013-04-23 00:46:05 -07:00
Linux Build Service Account e6d116369d Merge "sched: Fix SCHED_HRTICK bug leading to late preemption of tasks" 2013-04-22 18:58:32 -07:00
Jordan Crouse a32dfbad17 sched: Mark schedule_io_timeout() with EXPORT_SYMBOL
Make schedule_io_timeout() visible to modules.

Change-Id: Ic0dedbad8a591a9a721f0d2e8f6c372ec75bc4b2
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
2013-04-18 22:29:37 -06:00
Srivatsa Vaddagiri 7db16c8cfe sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
(just when they would have exceeded their ideal_runtime). It makes use
of a per-cpu hrtimer resource, and hence arming of that hrtimer should
be based on the total SCHED_FAIR tasks a cpu has across its various cfs_rqs,
rather than being based on the number of tasks in a particular cfs_rq (as
implemented currently). As a result, with the current code, it is possible for
a running task (which is the sole task in its cfs_rq) to be preempted
much after its ideal_runtime has elapsed, resulting in increased latency
for tasks in other cfs_rqs on the same cpu.

Fix this by arming the sched hrtimer based on the total number of SCHED_FAIR
tasks a CPU has across its various cfs_rqs.

Change-Id: I1f23680a64872f8ce0f451ac4bcae28e8967918f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2013-04-18 11:42:22 -07:00
Srivatsa Vaddagiri 55ddeb0f00 sched: Reset rq->next_interval before going idle
next_balance, the point in jiffy time scale when a cpu will next load
balance, could have been calculated when the cpu was busy. A busy cpu
will apply its sched domain's busy_factor (usually > 1) in computing
next_balance for that sched domain, which causes the (busy) cpu to load
balance less frequently in its sched domains. However when the same cpu
is going idle, its next_balance needs to be reset without consideration
of busy_factor. Failure to do so means the nohz idle balancer will not be
triggered on that cpu for an unnecessarily long time (introducing additional
scheduling latencies for tasks). Fix the bug in the scheduler which aims to
reset next_balance before a cpu goes idle (as per the existing comment) but
is clearly not doing so.

Change-Id: I7e027a51686528c4092d770c7d33c874d38f5df4
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2013-03-07 14:58:17 -08:00
Srivatsa Vaddagiri 4ca1d04ea0 sched: fix rq->lock recursion
Enabling SCHED_HRTICK currently results in rq->lock recursion and a hard
hang at bootup.  Essentially try_to_wake_up() grabs rq->lock and tries
arming a hrtimer via hrtimer_restart(), which deep down tries waking up
ksoftirqd, which leads to a recursive call to try_to_wake_up() and thus an
attempt to take rq->lock recursively!!

This is fixed by having scheduler queue hrtimer via
__hrtimer_start_range_ns() which avoids waking up ksoftirqd.

Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Change-Id: I11a13be1d9db3a749614ccf3d4f5fb7bf6f18fa1
2012-12-03 11:37:28 -08:00
Steve Muckle 9d5b38dc00 sched: add sysctl for controlling task migrations on wake
The PF_WAKE_UP_IDLE per-task flag made it impossible to enable
the old behavior of SD_SHARE_PKG_RESOURCES, where every task
migrates to an idle CPU on wakeup.

The sched_wake_to_idle sysctl value, when made nonzero, will cause
all tasks to migrate to an idle CPU if one is available when the
task is woken up. This is regardless of how PF_WAKE_UP_IDLE is
configured for tasks in the system. Similar to PF_WAKE_UP_IDLE,
the SD_SHARE_PKG_RESOURCES scheduler domain flag must be enabled
for the sysctl value to have an effect.

Change-Id: I23bed846d26502c7aed600bfcf1c13053a7e5f61
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2012-11-09 14:47:28 -08:00
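
A sketch of the sysctl plumbing for such a knob; only the name comes from the commit, while the handler choice and placement are assumptions.

    unsigned int sysctl_sched_wake_to_idle;

    static struct ctl_table sched_wake_to_idle_table[] = {
            {
                    .procname       = "sched_wake_to_idle",
                    .data           = &sysctl_sched_wake_to_idle,
                    .maxlen         = sizeof(unsigned int),
                    .mode           = 0644,
                    .proc_handler   = proc_dointvec,
            },
            { }
    };

With something like that in place, writing 1 to /proc/sys/kernel/sched_wake_to_idle enables the wake-to-idle behavior for all tasks, independent of their per-task PF_WAKE_UP_IDLE settings.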
Jeff Ohlstein 0299fcaaad sched_avg: add run queue averaging
Add code to calculate the run queue depth of a cpu and iowait
depth of the cpu.

The scheduler calls in to sched_update_nr_prod whenever there
is a runqueue change. This function maintains the runqueue average
and the iowait of that cpu in that time interval.

Whoever wants to know the runqueue average is expected to call
sched_get_nr_running_avg periodically to get the accumulated
runqueue and iowait averages for all the cpus.

Change-Id: Id8cb2ecf0ed479f090a83ccb72dd59c53fa73e0c
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
2012-10-04 13:36:34 -07:00
Steve Muckle 500988016c sched: add PF_WAKE_UP_IDLE
Certain workloads may benefit from the SD_SHARE_PKG_RESOURCES behavior
of waking their tasks up on idle CPUs. However, the feature has too much of a
negative impact on other workloads to be applied globally. The
PF_WAKE_UP_IDLE flag tells the scheduler to wake up tasks that have this
flag set, or tasks woken by tasks with this flag set, on an idle CPU
if one is available.

Change-Id: I20b28faf35029f9395e9d9f5ddd57ce2de795039
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2012-09-10 14:10:53 -07:00
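
A sketch of how such a flag is typically consumed in the wakeup path; the helper name is an assumption, and the actual idle-CPU search in select_task_rq_fair() is not shown.

    /* Prefer an idle CPU at wakeup when either the waker or the wakee has
     * opted in via PF_WAKE_UP_IDLE. */
    static inline int wake_to_idle(struct task_struct *p)
    {
            return (current->flags & PF_WAKE_UP_IDLE) ||
                   (p->flags & PF_WAKE_UP_IDLE);
    }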
Steve Muckle f132c6cf77 Merge commit 'AU_LINUX_ANDROID_ICS.04.00.04.00.126' into msm-3.4
AU_LINUX_ANDROID_ICS.04.00.04.00.126 from msm-3.0.
First parent is from google/android-3.4.

* commit 'AU_LINUX_ANDROID_ICS.04.00.04.00.126': (8712 commits)
  PRNG: Device tree entry for qrng device.
  vidc:1080p: Set video core timeout value for Thumbnail mode
  msm: sps: improve the debugging support in SPS driver
  board-8064 msm: Overlap secure and non secure video firmware heaps.
  msm: clock: Add handoff ops for 7x30 and copper XO clocks
  msm_fb: display: Wait for external vsync before DTV IOMMU unmap
  msm: Fix circular dependency in debug UART settings
  msm: gdsc: Add GDSC regulator driver for msm-copper
  defconfig: Enable Mobicore Driver.
  mobicore: Add mobicore driver.
  mobicore: rename variable to lower case.
  mobicore: rename folder.
  mobicore: add makefiles
  mobicore: initial import of kernel driver
  ASoC: msm: Add SLIMBUS_2_RX CPU DAI
  board-8064-gpio: Update FUNC for EPM SPI CS
  msm_fb: display: Remove chicken bit config during video playback
  mmc: msm_sdcc: enable the sanitize capability
  msm-fb: display: lm2 writeback support on mpq platforms
  msm_fb: display: Disable LVDS phy & pll during panel off
  ...

Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2012-06-06 18:45:28 -07:00
Colin Cross 5500e4fab2 Merge commit 'v3.4' into android-3.4 2012-05-25 13:56:28 -07:00
Colin Cross 2eec7c9f3f sched/rt: fix SCHED_RR across cgroups
task_tick_rt has an optimization to only reschedule SCHED_RR tasks
if they were the only element on their rq.  However, with cgroups
a SCHED_RR task could be the only element on its per-cgroup rq but
still be competing with other SCHED_RR tasks in its parent's
cgroup.  In this case, the SCHED_RR task in the child cgroup would
never yield at the end of its timeslice.  If the child cgroup
rt_runtime_us was the same as the parent cgroup rt_runtime_us,
the task in the parent cgroup would starve completely.

Modify task_tick_rt to check that the task is the only task on its
rq, and that each of the scheduling entities of its ancestors
is also the only entity on its rq.

Change-Id: I4f5b118517f85db3570923eb2f5e4c933ece9247
Signed-off-by: Colin Cross <ccross@android.com>
2012-05-18 17:03:09 -07:00
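
The check described in the last paragraph ends up walking every ancestor scheduling entity rather than looking only at the task's own per-cgroup rq; a sketch of that loop in task_tick_rt() (not the verbatim diff):

    /* p is the running SCHED_RR task, rq its runqueue. Requeue and ask for
     * a reschedule unless p is alone at *every* level of its hierarchy. */
    struct sched_rt_entity *rt_se = &p->rt;

    for_each_sched_rt_entity(rt_se) {
            if (rt_se->run_list.prev != rt_se->run_list.next) {
                    requeue_task_rt(rq, p, 0);
                    set_tsk_need_resched(p);
                    return;
            }
    }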
Igor Mammedov 30b4e9eb78 sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption
If we have one cpu that failed to boot and the boot cpu gave up on
waiting for it, and then another cpu is being booted, the kernel
might crash with the following OOPS:

   BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
   IP: [<ffffffff812c3630>] __bitmap_weight+0x30/0x80
   Call Trace:
       [<ffffffff8108b9b6>] build_sched_domains+0x7b6/0xa50

The crash happens in init_sched_groups_power(), which expects
sched_groups to be a circular linked list. However, that is not
always true, since the sched_groups preallocated in __sdt_alloc are
initialized in build_sched_groups, which may exit early

        if (cpu != cpumask_first(sched_domain_span(sd)))
                return 0;

without initializing the sd->groups->next field.

Fix the bug by initializing the next field right after the sched_group
is allocated.

Also-Reported-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Cc: a.p.zijlstra@chello.nl
Cc: pjt@google.com
Cc: seto.hidetoshi@jp.fujitsu.com
Link: http://lkml.kernel.org/r/1336559908-32533-1-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 12:27:35 +02:00
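
A sketch of the allocation-time fix described above, in __sdt_alloc(); the surrounding code is abbreviated and only the added initialization is the point.

    sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
                      GFP_KERNEL, cpu_to_node(j));
    if (!sg)
            return -ENOMEM;

    /* keep the group list circular even if build_sched_groups()
     * bails out early without linking the groups */
    sg->next = sg;

    *per_cpu_ptr(sdd->sg, j) = sg;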
Colin Cross aadf030d84 Merge commit 'v3.4-rc5' into android-3.4 2012-05-01 15:47:09 -07:00
he, bo fb2cf2c660 sched: Fix OOPS when build_sched_domains() percpu allocation fails
Under extreme memory exhaustion, percpu allocation
might fail. We hit it when the system goes to suspend-to-ram,
causing a kworker panic:

 EIP: [<c124411a>] build_sched_domains+0x23a/0xad0
 Kernel panic - not syncing: Fatal exception
 Pid: 3026, comm: kworker/u:3
 3.0.8-137473-gf42fbef #1

 Call Trace:
  [<c18cc4f2>] panic+0x66/0x16c
  [...]
  [<c1244c37>] partition_sched_domains+0x287/0x4b0
  [<c12a77be>] cpuset_update_active_cpus+0x1fe/0x210
  [<c123712d>] cpuset_cpu_inactive+0x1d/0x30
  [...]

With this fix applied build_sched_domains() will return -ENOMEM and
the suspend attempt fails.

Signed-off-by: he, bo <bo.he@intel.com>
Reviewed-by: Zhang, Yanmin <yanmin.zhang@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1335355161.5892.17.camel@hebo
[ So, we fail to deallocate a CPU because we cannot allocate RAM :-/
  I don't like that kind of sad behavior but nevertheless it should
  not crash under high memory load. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-26 12:54:53 +02:00
Peter Zijlstra eb95308ee2 sched: Fix more load-balancing fallout
Commits 367456c756 ("sched: Ditch per cgroup task lists for
load-balancing") and 5d6523ebd ("sched: Fix load-balance wreckage")
left some more wreckage.

By setting loop_max unconditionally to ->nr_running load-balancing
could take a lot of time on very long runqueues (hackbench!). So keep
the sysctl as max limit of the amount of tasks we'll iterate.

Furthermore, the min load filter for migration completely fails with
cgroups since inequality in per-cpu state can easily lead to such
small loads :/

Furthermore the change to add new tasks to the tail of the queue
instead of the head seems to have some effect.. not quite sure I
understand why.

Combined, these fixes solve the huge hackbench regression reported by
Tim when hackbench is run in a cgroup.

Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1335365763.28150.267.camel@twins
[ got rid of the CONFIG_PREEMPT tuning and made small readability edits ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-26 12:54:52 +02:00
Colin Cross 4a12178d3d cgroup: Add generic cgroup subsystem permission checks
Rather than using explicit euid == 0 checks when trying to move
tasks into a cgroup via CFS, move permission checks into each
specific cgroup subsystem. If a subsystem does not specify an
'allow_attach' handler, then we fall back to doing our checks
the old way.

Use the 'allow_attach' handler for the 'cpu' cgroup to allow
non-root processes to add arbitrary processes to a 'cpu' cgroup
if it has the CAP_SYS_NICE capability set.

This version of the patch adds an 'allow_attach' handler instead
of reusing the 'can_attach' handler.  If the 'can_attach' handler
is reused, a new cgroup that implements 'can_attach' but not
the permission checks could end up with no permission checks
at all.

Change-Id: Icfa950aa9321d1ceba362061d32dc7dfa2c64f0c
Original-Author: San Mehat <san@google.com>
Signed-off-by: Colin Cross <ccross@android.com>
2012-04-09 13:53:11 -07:00
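
An 'allow_attach' handler for the 'cpu' subsystem can look roughly like the sketch below (function and field names are assumptions based on the commit text, not a verified tree):

    static int cpu_cgroup_allow_attach(struct cgroup *cgrp,
                                       struct cgroup_taskset *tset)
    {
            const struct cred *cred = current_cred(), *tcred;
            struct task_struct *task;

            cgroup_taskset_for_each(task, cgrp, tset) {
                    tcred = __task_cred(task);

                    /* CAP_SYS_NICE, or moving a task owned by the caller,
                     * is enough; euid == 0 is no longer required */
                    if (current != task && !capable(CAP_SYS_NICE) &&
                        cred->euid != tcred->uid && cred->euid != tcred->suid)
                            return -EACCES;
            }
            return 0;
    }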
Arve Hjønnevåg 07f174eba0 sched: Enable might_sleep before initializing drivers.
This allows detection of init bugs in built-in drivers.

Signed-off-by: Arve Hjønnevåg <arve@android.com>
2012-04-09 13:53:08 -07:00
Linus Torvalds f22e08a79f Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar.

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix incorrect usage of for_each_cpu_mask() in select_fallback_rq()
  sched: Fix __schedule_bug() output when called from an interrupt
  sched/arch: Introduce the finish_arch_post_lock_switch() scheduler callback
2012-03-31 13:35:31 -07:00
Srivatsa S. Bhat e3831edd59 sched: Fix incorrect usage of for_each_cpu_mask() in select_fallback_rq()
The function for_each_cpu_mask() expects a *pointer* to struct
cpumask as its second argument, whereas select_fallback_rq()
passes the value itself.

And moreover, for_each_cpu_mask() has been marked as obsolete
in include/linux/cpumask.h. So move to the more appropriate
for_each_cpu() variant.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Dave Jones <davej@redhat.com>
Cc: Liu Chuansheng <chuansheng.liu@intel.com>
Cc: vapier@gentoo.org
Cc: rusty@rustcorp.com.au
Link: http://lkml.kernel.org/r/4F75BED4.9050005@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-31 10:43:36 +02:00