commit 80e3d87b2c5582db0ab5e39610ce3707d97ba409 upstream.
This patch adds checks that prevent futile attempts to move rt tasks
to a CPU with active tasks of equal or higher priority.
This reduces run queue lock contention and improves the performance of
a well known OLTP benchmark by 0.7%.
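For illustration only, a minimal standalone sketch of the kind of test this
describes (simplified stand-in types, not the scheduler's actual code; in the
kernel a lower prio number means higher RT priority):
  #include <stdbool.h>
  #include <stdio.h>

  /* hypothetical stand-in for the destination runqueue's RT state */
  struct dst_rq_demo {
          int highest_prio;  /* prio of the highest-prio RT task active there */
  };

  /* Moving p is futile if the destination already runs a task of equal or
   * higher priority: p would not run any sooner after the migration. */
  static bool move_is_useful(int p_prio, const struct dst_rq_demo *dst)
  {
          return p_prio < dst->highest_prio;
  }

  int main(void)
  {
          struct dst_rq_demo dst = { .highest_prio = 50 };

          printf("%d\n", move_is_useful(40, &dst));  /* 1: p would preempt  */
          printf("%d\n", move_is_useful(50, &dst));  /* 0: futile, skip it  */
          return 0;
  }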
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Cc: Suruchi Kadu <suruchi.a.kadu@intel.com>
Cc: Doug Nelson <doug.nelson@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1421430374.2399.27.camel@schen9-desk2.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
In 76854c7e8f ("sched: Use
rt.nr_cpus_allowed to recover select_task_rq() cycles") an
optimization was added to select_task_rq_rt() that immediately
returns when p->nr_cpus_allowed == 1 at the beginning of the
function.
This makes the latter p->nr_cpus_allowed > 1 check redundant,
which can now be removed.
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <mgalbraith@suse.de>
Cc: tomk@rgmadvisors.com
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1380914693-24634-1-git-send-email-shawn.bohrer@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
To increase compiler portability there is <linux/compiler.h> which
provides convenience macros for various gcc constructs. Eg: __weak for
__attribute__((weak)). I've replaced all instances of gcc attributes
with the right macro in the kernel subsystem.
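As a small illustration (the #define below mirrors what the compiler headers
pulled in by <linux/compiler.h> provide; the function names are made up):
  /* roughly what the convenience macro expands to */
  #define __weak __attribute__((weak))

  /* Before the cleanup: raw gcc syntax scattered through the code. */
  void arch_hook_old(void) __attribute__((weak));

  /* After the cleanup: the portable wrapper from <linux/compiler.h>. */
  void __weak arch_hook_new(void) { }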
Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ Upstream commit 2e8e19226398db8265a8e675fcc0118b9e80c9e8 ]
With extremely short cfs_period_us setting on a parent task group with a large
number of children the for loop in sched_cfs_period_timer() can run until the
watchdog fires. There is no guarantee that the call to hrtimer_forward_now()
will ever return 0. The large number of children can make
do_sched_cfs_period_timer() take longer than the period.
NMI watchdog: Watchdog detected hard LOCKUP on cpu 24
RIP: 0010:tg_nop+0x0/0x10
<IRQ>
walk_tg_tree_from+0x29/0xb0
unthrottle_cfs_rq+0xe0/0x1a0
distribute_cfs_runtime+0xd3/0xf0
sched_cfs_period_timer+0xcb/0x160
? sched_cfs_slack_timer+0xd0/0xd0
__hrtimer_run_queues+0xfb/0x270
hrtimer_interrupt+0x122/0x270
smp_apic_timer_interrupt+0x6a/0x140
apic_timer_interrupt+0xf/0x20
</IRQ>
To prevent this, add protection to the loop that detects when the loop has run
too many times and scales the period and quota up, proportionally, so that the
timer can complete before the next period expires. This preserves the relative
runtime quota while preventing the hard lockup.
A warning is issued reporting this state and the new values.
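A standalone sketch of the proportional-scaling idea (the growth factor and
starting values below are illustrative assumptions, not the values used by the
actual patch):
  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          /* hypothetical starting values, in nanoseconds */
          uint64_t period = 2000000;           /* very short 2ms period */
          uint64_t quota  = 1000000;           /* 50% of the period     */

          /* Scale both up by the same factor so the quota/period ratio,
           * i.e. the group's relative runtime, is preserved. */
          const uint64_t num = 5, den = 4;     /* +25%, purely for illustration */
          period = period * num / den;
          quota  = quota  * num / den;

          printf("period=%llu quota=%llu ratio=%.2f\n",
                 (unsigned long long)period, (unsigned long long)quota,
                 (double)quota / (double)period);
          return 0;
  }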
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190319130005.25492-1-pauld@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 95913d97914f44db2b81271c2e2ebd4d2ac2df83 upstream.
So the problem this patch is trying to address is as follows:
CPU0                              CPU1
context_switch(A, B)
                                  ttwu(A)
                                    LOCK A->pi_lock
                                    A->on_cpu == 0
finish_task_switch(A)
  prev_state = A->state  <-.
  WMB                       |
  A->on_cpu = 0;            |
  UNLOCK rq0->lock          |
                            |     context_switch(C, A)
                            `--   A->state = TASK_DEAD
                                  prev_state == TASK_DEAD
                                    put_task_struct(A)
                                context_switch(A, C)
                                  finish_task_switch(A)
                                    A->state == TASK_DEAD
                                      put_task_struct(A)
The argument being that the WMB will allow the load of A->state on CPU0
to cross over and observe CPU1's store of A->state, which will then
result in a double-drop and use-after-free.
Now the comment states (and this was true once upon a long time ago)
that we need to observe A->state while holding rq->lock because that
will order us against the wakeup; however the wakeup will not in fact
acquire (that) rq->lock; it takes A->pi_lock these days.
We can obviously fix this by upgrading the WMB to an MB, but that is
expensive, so we'd rather avoid that.
The alternative this patch takes is: smp_store_release(&A->on_cpu, 0),
which avoids the MB on some archs, but not important ones like ARM.
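As a rough user-space analogy of the ordering being relied on (C11 atomics
standing in for the kernel's smp_store_release()/smp_load_acquire(); the
function and type names are made up, and in the kernel the two sides run on
different CPUs):
  #include <stdatomic.h>
  #include <stdio.h>

  struct task_demo {
          int state;          /* last written by the CPU the task ran on */
          atomic_int on_cpu;  /* 1 while the task is still running there */
  };

  static void finish_switch_demo(struct task_demo *prev)
  {
          /* Release store: everything written before this (including the
           * final prev->state update) is visible to anyone who later
           * observes on_cpu == 0 with an acquire load. */
          atomic_store_explicit(&prev->on_cpu, 0, memory_order_release);
  }

  static int waker_demo(struct task_demo *p)
  {
          /* Acquire load pairing with the release above, so the observed
           * state cannot be stale; a stale TASK_DEAD is what caused the
           * double put_task_struct() described above. */
          while (atomic_load_explicit(&p->on_cpu, memory_order_acquire))
                  ;
          return p->state;
  }

  int main(void)
  {
          struct task_demo t = { .state = 1, .on_cpu = 1 };

          finish_switch_demo(&t);
          printf("observed state: %d\n", waker_demo(&t));
          return 0;
  }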
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: manfred@colorfullife.com
Cc: will.deacon@arm.com
Fixes: e4a52bcb9a ("sched: Remove rq->lock from the first half of ttwu()")
Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
read_lock_irqsave(tasklist_lock) in print_rq() looks strange. We do
not need to disable irqs, and they are already disabled by the caller.
And afaics this lock buys nothing, we can rely on rcu_read_lock().
In this case it makes sense to also move rcu_read_lock/unlock from
the caller to print_rq().
Change-Id: Iadf0de148e27623af4535abc40c77c1dfd1f9c76
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140921193341.GA28628@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change thread_group_cputime() to use for_each_thread() instead of
buggy while_each_thread(). This also makes the pid_alive() check
unnecessary.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Frank Mayhar <fmayhar@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sanjay Rao <srao@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140813192000.GA19327@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 1e4dda08b4c39b3d8f4a3ee7269d49e0200c8af8)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Change-Id: I5aa603f3e31275e607f039b6f037ddc630755d95
commit 50e76632339d4655859523a39249dd95ee5e93e7 upstream.
Cpusets vs. suspend-resume is _completely_ broken. And it got noticed
because it now resulted in non-cpuset usage breaking too.
On suspend cpuset_cpu_inactive() doesn't call into
cpuset_update_active_cpus() because it doesn't want to move tasks about,
there is no need, all tasks are frozen and won't run again until after
we've resumed everything.
But this means that when we finally do call into
cpuset_update_active_cpus() after resuming the last frozen cpu in
cpuset_cpu_active(), the top_cpuset will not have any difference with
the cpu_active_mask and thus it will not in fact do _anything_.
So the cpuset configuration will not be restored. This was largely
hidden because we would unconditionally create identity domains and
mobile users would not in fact use cpusets much. And servers that do use
cpusets tend to not suspend-resume much.
An additional problem is that we'd not in fact wait for the cpuset work to
finish before resuming the tasks, allowing spurious migrations outside
of the specified domains.
Fix the rebuild by introducing cpuset_force_rebuild() and fix the
ordering with cpuset_wait_for_hotplug().
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: deb7aa308e ("cpuset: reorganize CPU / memory hotplug handling")
Link: http://lkml.kernel.org/r/20170907091338.orwxrqkbfkki3c24@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ia40ffcf49507af1d5493d7e534e9433ca346db02
commit 83929cce95251cc77e5659bf493bd424ae0e7a67 upstream.
Michael Kerrisk reported:
> Regarding the previous paragraph... My tests indicate
> that writing *any* value to the autogroup [nice priority level]
> file causes the task group to get a lower priority.
Because autogroup didn't call the then meaningless scale_load()...
Autogroup nice level adjustment has been broken ever since load
resolution was increased for 64-bit kernels. Use scale_load() to
scale group weight.
Michael Kerrisk tested this patch to fix the problem:
> Applied and tested against 4.9-rc6 on an Intel i7 (4 cores).
> Test setup:
>
> Terminal window 1: running 40 CPU burner jobs
> Terminal window 2: running 40 CPU burner jobs
> Terminal window 3: running 1 CPU burner job
>
> Demonstrated that:
> * Writing "0" to the autogroup file for TW1 now causes no change
> to the rate at which the processes on the terminals consume CPU.
> * Writing -20 to the autogroup file for TW1 caused those processes
> to get the lion's share of CPU while TW2 and TW3 get a tiny amount.
> * Writing -20 to the autogroup files for TW1 and TW3 allowed the
> process on TW3 to get as much CPU as it was getting when
> the autogroup nice values for both terminals were 0.
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-man <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479897217.4306.6.camel@gmx.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: s/sched_prio_to_weight/prio_to_weight/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7281c8dec8a87685cb54d503d8cceef5a0fc2fdd upstream.
> kernel/sched/core.c:6921 cpu_weight_nice_write_s64() warn: potential spectre issue 'sched_prio_to_weight'
Userspace controls @nice, so sanitize the value before using it to
index an array.
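A standalone sketch of the sanitizing step (simplified: the table values and
clamp below are illustrative, and the upstream fix additionally runs the index
through the kernel's array_index_nospec() helper so it cannot be used out of
bounds under speculation):
  #include <stdio.h>

  #define MIN_NICE (-20)
  #define MAX_NICE 19

  /* stand-in for the 40-entry sched_prio_to_weight[] table */
  static const unsigned int weight_demo[40] = {
          [0] = 88761, [20] = 1024, [39] = 15,  /* a few representative entries */
  };

  static unsigned int weight_of_nice(long nice)
  {
          /* Clamp the userspace-controlled value to its valid range before
           * using it as an array index; the kernel also passes the index
           * through array_index_nospec() to defeat speculation. */
          if (nice < MIN_NICE)
                  nice = MIN_NICE;
          if (nice > MAX_NICE)
                  nice = MAX_NICE;

          return weight_demo[nice - MIN_NICE];  /* index is now 0..39 */
  }

  int main(void)
  {
          printf("%u %u %u\n",
                 weight_of_nice(-1000), weight_of_nice(0), weight_of_nice(1000));
          return 0;
  }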
Change-Id: I0e59bc7ecbaf2367e59391c8955e2c246aeeb946
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: Vulnerable array lookup is in set_load_weight()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 73bb059f9b8a00c5e1bf2f7ca83138c05d05e600 upstream.
The point of sched_group_mask is to select those CPUs from
sched_group_cpus that can actually arrive at this balance domain.
The current code gets it wrong, as can be readily demonstrated with a
topology like:
node   0   1   2   3
  0:  10  20  30  20
  1:  20  10  20  30
  2:  30  20  10  20
  3:  20  30  20  10
Where (for example) domain 1 on CPU1 ends up with a mask that includes
CPU0:
[] CPU1 attaching sched-domain:
[] domain 0: span 0-2 level NUMA
[] groups: 1 (mask: 1), 2, 0
[] domain 1: span 0-3 level NUMA
[] groups: 0-2 (mask: 0-2) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)
This causes sched_balance_cpu() to compute the wrong CPU and
consequently should_we_balance() will terminate early resulting in
missed load-balance opportunities.
The fixed topology looks like:
[] CPU1 attaching sched-domain:
[] domain 0: span 0-2 level NUMA
[] groups: 1 (mask: 1), 2, 0
[] domain 1: span 0-3 level NUMA
[] groups: 0-2 (mask: 1) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)
(note: this relies on OVERLAP domains to always have children, this is
true because the regular topology domains are still here -- this is
before degenerate trimming)
Debugged-by: Lauro Ramos Venancio <lvenanci@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: e3589f6c81 ("sched: Allow for overlapping sched_domain spans")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16:
- Use span, not sg_span
- Adjust filename context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0372dd2736e02672ac6e189c31f7d8c02ad543cd upstream.
When building the overlapping groups, we very obviously should start
with the previous domain of _this_ @cpu, not CPU-0.
This can be readily demonstrated with a topology like:
node   0   1   2   3
  0:  10  20  30  20
  1:  20  10  20  30
  2:  30  20  10  20
  3:  20  30  20  10
Where (for example) CPU1 ends up generating the following nonsensical groups:
[] CPU1 attaching sched-domain:
[] domain 0: span 0-2 level NUMA
[] groups: 1 2 0
[] domain 1: span 0-3 level NUMA
[] groups: 1-3 (cpu_capacity = 3072) 0-1,3 (cpu_capacity = 3072)
Where the fact that domain 1 doesn't include a group with span 0-2 is
the obvious fail.
With patch this looks like:
[] CPU1 attaching sched-domain:
[] domain 0: span 0-2 level NUMA
[] groups: 1 0 2
[] domain 1: span 0-3 level NUMA
[] groups: 0-2 (cpu_capacity = 3072) 0,2-3 (cpu_capacity = 3072)
Debugged-by: Lauro Ramos Venancio <lvenanci@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: e3589f6c81 ("sched: Allow for overlapping sched_domain spans")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fb90a6e93c0684ab2629a42462400603aa829b9c upstream.
sysrq_sched_debug_show() can dump a lot of information. Don't print out
all that if we're just trying to get a list of blocked tasks (SysRq-W).
The information is still accessible with SysRq-T.
Change-Id: I3503b0850a80ed55291dd8264a3337d874737867
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vincent@axis.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit e33886b38cc82a9fc3b2d655dfc7f50467594138 upstream.
Add two helpers to make it easier to treat the refcount as boolean.
[js] do not involve WARN_ON_ONCE as it causes build failures
Suggested-by: Jason Baron <jasonbaron0@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
[wt: only backported for use in next fix ;
s/static_key_count(key)/atomic_read(&key->enabled)/]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit ecf7d01c229d11a44609c0067889372c91fb4f36 upstream.
Oleg noticed that it's possible to falsely observe p->on_cpu == 0 such
that we'll prematurely continue with the wakeup and effectively run p on
two CPUs at the same time.
Even though the overlap is very limited (the task is in the middle of
being scheduled out), it could still result in corruption of the
scheduler data structures.
CPU0                                  CPU1
set_current_state(...)
  <preempt_schedule>
    context_switch(X, Y)
      prepare_lock_switch(Y)
        Y->on_cpu = 1;
      finish_lock_switch(X)
        store_release(X->on_cpu, 0);
                                      try_to_wake_up(X)
                                        LOCK(p->pi_lock);
                                        t = X->on_cpu; // 0
  context_switch(Y, X)
    prepare_lock_switch(X)
      X->on_cpu = 1;
    finish_lock_switch(Y)
      store_release(Y->on_cpu, 0);
  </preempt_schedule>
schedule();
  deactivate_task(X);
  X->on_rq = 0;
                                        if (X->on_rq) // false
                                        if (t) while (X->on_cpu)
                                          cpu_relax();
  context_switch(X, ..)
    finish_lock_switch(X)
      store_release(X->on_cpu, 0);
Avoid the load of X->on_cpu being hoisted over the X->on_rq load.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 135e8c9250dd5c8c9aae5984fde6f230d0cbfeaf upstream.
The origin of the issue I've seen is related to
a missing memory barrier between the check for task->state and
the check for task->on_rq.
The task being woken up is already awake from a schedule()
and is doing the following:
  do {
    schedule();
    set_current_state(TASK_(UN)INTERRUPTIBLE);
  } while (!cond);
The waker, actually gets stuck doing the following in
try_to_wake_up():
  while (p->on_cpu)
    cpu_relax();
Analysis:
The instance I've seen involves the following race:
CPU1                                  CPU2
while () {
  if (cond)
    break;
  do {
    schedule();
    set_current_state(TASK_UN..)
  } while (!cond);
                                      wakeup_routine()
                                        spin_lock_irqsave(wait_lock)
  raw_spin_lock_irqsave(wait_lock)      wake_up_process()
}                                       try_to_wake_up()
set_current_state(TASK_RUNNING);        ..
list_del(&waiter.list);
CPU2 wakes up CPU1, but before it can get the wait_lock and set
current state to TASK_RUNNING the following occurs:
CPU3
wakeup_routine()
raw_spin_lock_irqsave(wait_lock)
if (!list_empty)
  wake_up_process()
  try_to_wake_up()
    raw_spin_lock_irqsave(p->pi_lock)
    ..
    if (p->on_rq && ttwu_wakeup())
    ..
    while (p->on_cpu)
      cpu_relax()
    ..
CPU3 tries to wake up the task on CPU1 again since it finds
it on the wait_queue, CPU1 is spinning on wait_lock, but immediately
after CPU2, CPU3 got it.
CPU3 checks the state of p on CPU1, it is TASK_UNINTERRUPTIBLE and
the task is spinning on the wait_lock. Interestingly since p->on_rq
is checked under pi_lock, I've noticed that try_to_wake_up() finds
p->on_rq to be 0. This was the most confusing bit of the analysis,
but p->on_rq is changed under the runqueue lock, rq_lock, so the p->on_rq
check is not reliable without this fix IMHO. The race is visible
(based on the analysis) only when ttwu_queue() does a remote wakeup
via ttwu_queue_remote(), in which case the p->on_rq change is not
done under the pi_lock.
The result is that after a while the entire system locks up on
the raw_spin_lock_irqsave(wait_lock) and the holder spins infinitely.
Reproduction of the issue:
The issue can be reproduced after a long run on my system with 80
threads, by tweaking available memory to a very low value and running the
stress-ng mmapfork memory test. It usually takes a long time to
reproduce. I am trying to work on a test case that can reproduce
the issue faster, but that's work in progress. I am still testing the
changes on my system in a loop and the tests seem OK thus far.
Big thanks to Benjamin and Nick for helping debug this as well.
Ben helped catch the missing barrier, Nick caught every missing
bit in my theory.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[ Updated comment to clarify matching barriers. Many
architectures do not have a full barrier in switch_to()
so that cannot be relied upon. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <nicholas.piggin@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e02cce7b-d9ca-1ad0-7a61-ea97c7582b37@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
It has been claimed that the PG implementation of 'su' has security
vulnerabilities even when disabled. Unfortunately, the people that
find these vulnerabilities often like to keep them private so they
can profit from exploits while leaving users exposed to malicious
hackers.
In order to reduce the attack surface for vulnerabilities, it is
therefore necessary to make 'su' completely inaccessible when it
is not in use (except by the root and system users).
Change-Id: I79716c72f74d0b7af34ec3a8054896c6559a181d
"int" type is used to hold the time difference between the successive
updates to nr_run in sched_update_nr_prod(). This can result in
overflow, if the function is called ~2.15 sec after it was called
before. The most probable scenarios are when CPU is idle and
hotplugged. But as we update the last_time of all possible CPUs in
sched_get_nr_running_avg() periodically from a deferrable timer context
(core_ctl module), this overflow is observed only when the system is
completely idle for long time. When this overflow happens we hit
a BUG_ON() in sched_get_nr_running_avg().
Use "u64" type instead of "int" for holding the time difference and
add additional BUG_ON() to catch the instances where sched_clock()
returns a backward value.
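A tiny standalone demonstration of why an "int" difference breaks at roughly
2.15 seconds when the times are in nanoseconds (values are illustrative):
  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          uint64_t last_time = 0;
          uint64_t now = 3000000000ULL;            /* 3 seconds, in nanoseconds */

          int bad_diff = (int)(now - last_time);   /* exceeds INT_MAX (~2.147s) */
          uint64_t good_diff = now - last_time;

          printf("int diff : %d\n", bad_diff);     /* negative / garbage */
          printf("u64 diff : %llu\n", (unsigned long long)good_diff);
          return 0;
  }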
Change-Id: I284abb5889ceb8cf9cc689c79ed69422a0e74986
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
commit e9532e69b8d1d1284e8ecf8d2586de34aec61244 upstream.
On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
value over CPU down and up. So after the CPU comes up again the delta
calculation in steal_account_process_tick() wreckages itself due to the
unsigned math:
u64 steal = paravirt_steal_clock(smp_processor_id());
steal -= this_rq()->prev_steal_time;
So if steal is smaller than rq->prev_steal_time we end up with an insane large
value which then gets added to rq->prev_steal_time, resulting in a permanent
wreckage of the accounting. As a consequence the per CPU stats in /proc/stat
become stale.
Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.
None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the /proc/stat visible wreckage. The
prev_time values are still wrong.
Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: commit 095c0aa83e "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa48380851 "sched: Remove irq time from available CPU power"
Fixes: commit e6e6685acc "KVM guest: Steal time accounting"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Declare war on uninterruptible sleep. Add a tracepoint which
walks the kernel stack and dumps the first non-scheduler function
called before the scheduler is invoked.
Change-Id: I19e965d5206329360a92cbfe2afcc8c30f65c229
Signed-off-by: Riley Andrews <riandrews@google.com>
commit 119d6f6a3be8b424b200dcee56e74484d5445f7e upstream.
Because wakeups can (fundamentally) be late, a task might not be in
the expected state. Therefore testing against a task's state is racy,
and can yield false positives.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: oleg@redhat.com
Fixes: 9067ac85d5 ("wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task")
Link: http://lkml.kernel.org/r/1448933660-23082-1-git-send-email-sasha.levin@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While the feature TTWU_QUEUE has the advantage of reducing cache
bouncing of runqueue locks, it has the side effect that runqueue
statistics are not updated until the remote CPU has a chance to
enqueue the task. Since there is no upper bound on the amount of
time it can take the remote CPU to enqueue the task, several
sequential wakeups can result in suboptimal task placement based
on the stale statistics. Turn off the feature as the cost of
sub-optimal placement is much higher than the cost of cache bouncing
spinlocks for msm based systems.
Change-Id: I0b85c0225237b2bc44f54934769f5e3750c0f3d6
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
cpu_power has been added to keep track of the amount of power each task is
consuming. cpu_power is updated whenever stime and utime are updated for
a task. Power is computed by taking into account the frequency at which
the current core was running and the current for the CPU actively running
at that frequency.
Bug: 21498425
Change-Id: Ic535941e7b339aab5cae9081a34049daeb44b248
Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
Git-commit: 94877641f6b6ea17aa335729f548eb5647db3e3e
Git-repo: https://android.googlesource.com/kernel/msm/
Signed-off-by: Nirmal Abraham <nabrah@codeaurora.org>
[clingutla@codeaurora.org:This fixes the undefined reference
to acct_update_power() when building for ARCH=um in
include/linux/cpufreq.h]
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Avaneesh Kumar Dwivedi <akdwived@codeaurora.org>
At present, the HMP scheduler uses sched_clock to set up the window boundary
to be aligned with the timer interrupt, to ensure the timer interrupt fires
after window rollover. However this alignment won't last long since the timer
interrupt rearms the next timer based on time measured by ktime, which isn't
coupled with sched_clock.
Convert sched_clock to ktime to avoid a wallclock discrepancy between the
scheduler and the timer, so that the scheduler's window boundary is always
aligned with the timer.
CRs-fixed: 933330
Change-Id: I4108819a4382f725b3ce6075eb46aab0cf670b7e
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[pkondeti@codeaurora.org: resolved trival merge conflicts]
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Export sched_setscheduler_nocheck() so that external kernel modules
can use it.
Change-Id: Ib50f537f5aef50c365ba63fb8ffce05bc1c7c431
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Encourage IDLE and NEWLY_IDLE load balance by ignoring cache hotness, and
discourage active load balance by increasing the busy balancing failure
threshold needed to initiate the active load balancer, in order to reduce
scheduler latency and avoid unnecessary active migration within the same domain.
Change-Id: I22f6aba11932ccbb82a436c0532589c46f9148ed
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[pkondeti@codeaurora.org: resolved minor merge conflicts]
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Provide userspace interface for tasks to be grouped together as
"related" threads. For example, all threads involved in updating
display buffer could be tagged as related.
The scheduler will attempt to provide special treatment for a group of
related threads, such as:
1) Colocation of related threads in same "preferred" cluster
2) Aggregation of demand towards determination of cluster frequency
This patch extends scheduler to provide best-effort colocation support
for a group of related threads.
Change-Id: Ic2cd769faf5da4d03a8f3cb0ada6224d0101a5f5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
A cluster is set of CPUs sharing some power controls and an L2 cache.
This patch builds a list of clusters at bootup which are sorted by
their max_power_cost. Organizing CPUs in terms of clusters helps
optimize cpu selection logic in select_best_cpu() quite a bit. A
subsequent patch modifies select_best_cpu() to make use of clusters.
Change-Id: I3fecc542b36db014afc0375a1ea4c4802e2f4dba
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
It's possible select_best_cpu() gets called before the first cpufreq
notifier call. In such a scenario select_best_cpu() can hang forever by
not clearing search_cpus.
Initialize the frequency domain cpumask with the rq's CPU to avoid such a
scenario.
CRs-fixed: 931349
Change-Id: If8d31c5477efe61ad7c6b336ba9e27ca6f556b63
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
cpu_hardirq_time and cpu_softirq_time are protected with
seqlock on 32bit systems. There is a potential deadlock
with this seqlock and rq->lock.
CPU 1                               CPU0
==========================          ========================
--> acquire CPU0 rq->lock           --> __irq_enter()
----> task enqueue/dequeue          ----> irqtime_account_irq()
------> update_rq_clock()           ------> irq_time_write_begin()
--------> irq_time_read()           --------> sched_account_irqtime()
(waiting for the seqlock            (waiting for the CPU0 rq->lock)
 held in irq_time_write_begin())
Fix this issue by dropping the seqlock before calling
sched_account_irqtime()
Change-Id: I29a33876e372f99435a57cc11eada9c8cfd59a3f
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
If a task is affined to CPUs that are offline, select_best_cpu
and best_small_task_cpu will attempt to select a cpu from an
empty mask, resulting in corruption/crashes. Fix this by
returning the current CPU if search_cpu(s) is empty.
Change-Id: Ib56a8a2513b0ae7ec8414b5ed3134e88b24dd2ac
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
The maximum capacity CPU is a fallback option for small tasks
and selected only when all the allowed lower capacity CPUs
are overloaded. When the first such fallback CPU is encountered
during the search, its cluster cpumask is computed and removed
from the search.
cpumask_and(&fb_search_cpu, &search_cpu, &rq->freq_domain_cpumask);
cpumask_andnot(&search_cpu, &search_cpu, &rq->freq_domain_cpumask);
The iterator CPU is cleared from the search_cpu mask before this,
so it is not set in the fb_search_cpu mask. Later, this mask is
used to construct a search mask for iterating over the lower capacity
CPUs to find the busy CPU.
cpumask_and(&search_cpu, tsk_cpus_allowed(p), cpu_online_mask);
cpumask_andnot(&search_cpu, &search_cpu, &fb_search_cpu);
Due to the above mentioned bug, the search_cpu mask now contains a maximum
capacity CPU, which may get selected instead of a lower capacity CPU
that could accommodate this small task under the spill threshold.
Change-Id: Id330af66542907e24dfe60a6424b195aa8623cea
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
This reverts commit 14fd2e5918 ("sched: Use only partial wait time as
task demand") as it causes performance regression.
Change-Id: Iaddfce9c98bff328f50d746c9a86a0c8c34aa0b9
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[pkondeti@codeaurora.org: Resolved minor conflict]
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
At present, when IRQ handler spans multiple scheduler windows, HMP
scheduler resets the IRQ CPU's prev_runnable_sum with its current max
capacity under the assumption that there is no other possible contribution
to the CPU's prev_runnable_sum. This isn't correct as another CPU can
migrate tasks to the IRQ CPU.
Furthermore, such incorrectness can trigger a BUG_ON() if the migrated task's
prev_window is larger than the migrating CPU's current capacity, as in the
following scenario.
1. ISR on the power efficient CPU has been running for multiple windows.
2. A task which has prev_window higher than IRQ CPU's current capacity
migrated to the IRQ CPU.
3. Servicing of the IRQ is done and the IRQ CPU resets its prev_runnable_sum =
CPU's current capacity.
4. Before window rollover, the task on the IRQ CPU migrates to another CPU
and fixes up the source and destination CPUs' busy time.
5. BUG_ON(src_rq->prev_runnable_sum < 0) triggers as p->ravg.prev_window
is larger than src_rq->prev_runnable_sum.
Fix such incorrectness by preserving prev_runnable_sum when ISR spans
multiple scheduler windows. There is no need to reset it.
CRs-fixed: 828055
Change-Id: I1f95ece026493e49d3810f9c940ec5f698cc0b81
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Inline the relatively small and frequently used function scale_load_to_cpu().
CRs-fixed: 849655
Change-Id: Id5f60595c394959d78e6da4cc4c18c338fec285b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Function best_small_task_cpu() is biased towards mostly idle CPUs and shallow
C-state CPUs. Thus the need to find the least busy or the least
power cost fallback CPU arises quite rarely. At present, however,
the function always finds those two CPUs, unnecessarily most of the time.
Optimize the function by amending it to look for the least busy CPU and
the least power cost fallback CPU only when they are actually needed. This
change is solely an optimization and doesn't make functional changes.
CRs-fixed: 849655
Change-Id: I5eca11436e85b448142a7a7644f422c71eb25e8e
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Function best_small_task_cpu() looks for a mostly idle CPU and returns it
as the best CPU for a given small task. At present, however, it cannot
break out of the CPU search loop when it finds a mostly idle CPU; it
continues iterating because it needs to find and return the given task's
previous CPU as the best CPU, to avoid unnecessary task migration when
the previous CPU is mostly idle.
Optimize best_small_task_cpu() to iterate over the search CPUs starting
from the given task's CPU so it can break out of the loop as soon as a
mostly idle CPU is found. This optimization saves a few hundred ns spent
in the function and doesn't make any functional change.
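A standalone sketch of the changed search order (hypothetical predicate and CPU
count, not the actual scheduler code): the scan starts at the task's own CPU,
so a mostly idle hit, including the task's previous CPU itself, ends the loop
immediately.
  #include <stdbool.h>
  #include <stdio.h>

  #define NR_CPUS_DEMO 8

  /* stand-in for the real "mostly idle" test */
  static bool mostly_idle_demo(int cpu)
  {
          return cpu == 5 || cpu == 2;
  }

  static int best_small_task_cpu_demo(int task_cpu)
  {
          /* Scan CPUs starting from task_cpu and wrapping around, so the
           * loop can break on the first mostly idle CPU it sees. */
          for (int i = 0; i < NR_CPUS_DEMO; i++) {
                  int cpu = (task_cpu + i) % NR_CPUS_DEMO;

                  if (mostly_idle_demo(cpu))
                          return cpu;
          }
          return task_cpu;  /* nothing mostly idle: stay where we are */
  }

  int main(void)
  {
          printf("prev cpu 5 -> %d\n", best_small_task_cpu_demo(5)); /* 5: no migration   */
          printf("prev cpu 0 -> %d\n", best_small_task_cpu_demo(0)); /* 2: first idle hit */
          return 0;
  }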
CRs-fixed: 849655
Change-Id: I8c540963487f4102dac4d54e9f98e24a4a92a7b3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
select_best_cpu() is agnostic of the hardware topology. This means that
certain functions such as task_will_fit() and skip_cpu() are run
unnecessarily for every CPU in a cluster whereas they need to run only
once per cluster. Reduce the execution time of select_best_cpu() by
ensuring these functions run only once per cluster. The frequency domain
mask is used to identify CPUs that fall in the same cluster.
CRs-fixed: 849655
Change-Id: Id24208710a0fc6321e24d9a773f00be9312b75de
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: added continue after clearing search_cpus.
fixed indentations with space. fixed skip_cpu() to return true when rq ==
task_rq.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
select_best_cpu() is a crucial wakeup routine that determines the
time taken by the scheduler to wake up a task. Optimize this routine
to get higher performance. The following changes have been made as
part of the optimization listed in order of how they built on top of
one another:
* Several routines called by select_best_cpu() recalculate task load
and CPU load even though these are already known quantities. For
example mostly_idle_cpu_sync() calculates CPU load; task_will_fit()
calculates task load before spill_threshold_crossed() recalculates
both. Remove these redundant calculations by moving the task load
and CPU load computations to the select_best_cpu() 'for' loop and
passing them to any functions that need the information.
* Rewrite best_small_task_cpu() to avoid the existing two pass
approach. The two pass approach was only in place to find the
minimum power cluster for small task placement. This information
can easily be established by looking at runqueue capacities. The
cluster that does not have the highest capacity constitutes the minimum power
cluster. A special CPU mask, called mpc_mask, is required to safeguard
against undue side effects on SMP systems. Also terminate the function
early if the previous CPU is found to be mostly_idle.
* Reorganize code to ensure that no unnecessary computations or
variable assignments are done. For example there is no need to
compute CPU load if that information does not end up getting used
in any iteration of the 'for' loop.
* The tick logic for EA migrations unnecessarily checks for the power
of all CPUs only for skip_cpu() to throw away the result later.
Ensure that for EA we only check CPUs within the same cluster
and avoid running select_best_cpu() whenever possible.
CRs-fixed: 849655
Change-Id: I4e722912fcf3fe4e365a826d4d92a4dd45c05ef3
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed cpufreq_notifier_policy() to set mpc_mask.
added a comment about prerequisite of lower_power_cpu_available().
s/struct rq * rq/struct rq *rq/.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>