Keep time calculations in 64-bit throughout. If there are long intervals
between idle calculations, the deltas can exceed 32 bits; truncating them
here causes incorrect load percentage calculations and selection of the
wrong frequencies.
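To illustrate the failure mode, here is a small userspace sketch that computes a load percentage from busy and wall-clock deltas in microseconds. The helper names are hypothetical, not the governor's functions. A 32-bit microsecond delta wraps after about 71.6 minutes (2^32 us), and after that the truncated calculation returns nonsense.

```c
#include <assert.h>
#include <stdint.h>

/* Load percentage with every intermediate value kept in 64 bits. */
static unsigned int load_pct_u64(uint64_t busy_us, uint64_t wall_us)
{
	assert(wall_us != 0);
	return (unsigned int)(busy_us * 100 / wall_us);
}

/* The same calculation after both deltas are truncated to 32 bits. */
static unsigned int load_pct_u32(uint64_t busy_us, uint64_t wall_us)
{
	uint32_t busy = (uint32_t)busy_us;	/* high bits silently dropped */
	uint32_t wall = (uint32_t)wall_us;

	if (wall == 0)
		return 0;
	return (unsigned int)((uint64_t)busy * 100 / wall);
}
```

With wall = 2^32 + 1000 us and exactly half of it busy, the 64-bit version reports 50% while the truncated version reports far more than 100%.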
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Alex Naidis <alex.naidis@linux.com>
Git-commit: 865c05d076acdc1e943b0e04f96721546a2e74ca
Git-repo: https://android.googlesource.com/kernel/common.git
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
Set the use_sched_load tunable early in its store function so that the
correct 64-bit jiffy value is passed to the scheduler.
Change-Id: I46ed73441c9d242f15e5759360d0cea4a9dd23d0
Signed-off-by: Hanumath Prasad <hpprasad@codeaurora.org>
There is a race window, as explained below, when the governor tries to change
the CPU frequency while another thread (say, thermal mitigation) tries to
change the policy limits simultaneously.
speedchange task (ThreadA)               Thread B (say Thermal)
cpufreq_interactive_speedchange_task()
 |
__cpufreq_driver_target()
 |
set_cpu_freq()
 |                                       cpufreq_update_policy()
 |                                        |
 |                                       modified policy_max
 |                                        |
 |                                       check policy->cur against
 |                                       new policy limits, return
 |                                       without calling
 |                                       __cpufreq_driver_target as
 |                                       policy->cur (which is not
 |                                       updated by ThreadA) is still
 |                                       within the new policy limits.
 |
sent CPUFREQ_POSTCHANGE notification
 |
updated policy->cur, which happens to be higher than policy->max
This results in the current frequency being higher than policy->max,
violating the policy limits. This has a thermal impact and in turn causes
high power consumption. Fix this by always calling __cpufreq_driver_target()
with the current frequency and leaving it to __cpufreq_driver_target() to
guarantee there is no race condition when multiple threads are changing
frequencies.
Change-Id: I9136e9245677e8fc90a628d3099aca8d63d3677c
Signed-off-by: Hanumath Prasad <hpprasad@codeaurora.org>
The above hispeed delay and min sample time delays are used to
distinguish between sporadic load changes and steady-state load
changes. The governor tries to make sure the frequency changes only
when the load change is a steady-state load change.
However, when the load change is for predictable reasons like
migration, the delays only negatively affect performance and power.
Once a significant load is migrated into a CPU, it's fairly reasonable
to assume it's going to continue contributing that additional load.
Similarly once a significant load is migrated away from a CPU, it's
fairly reasonable to assume the load will be gone forever. Future
migrations can bring back a load or take it away, but the
notifications that come along with it will allow us to quickly correct
for it. For this reason, when the load change is due to a
notification, do not delay frequency changes.
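A minimal sketch of the decision, assuming hypothetical field names modelled on the two delays described above (times in microseconds): when an evaluation is triggered by a notification, both the above-hispeed delay check and the min_sample_time check are bypassed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs to one governor evaluation; times in microseconds. */
struct eval_ctx {
	uint64_t now;
	uint64_t hispeed_validate_time;	/* last time hispeed was validated */
	uint64_t floor_validate_time;	/* last time the current floor was validated */
	uint64_t above_hispeed_delay;
	uint64_t min_sample_time;
	bool from_notif;		/* triggered by a load change notification */
};

/* May the governor raise frequency above hispeed_freq right now? */
static bool may_raise_above_hispeed(const struct eval_ctx *c)
{
	if (c->from_notif)
		return true;	/* predictable load change: do not delay */
	assert(c->now >= c->hispeed_validate_time);
	return c->now - c->hispeed_validate_time >= c->above_hispeed_delay;
}

/* May the governor drop below the current floor right now? */
static bool may_drop_freq(const struct eval_ctx *c)
{
	if (c->from_notif)
		return true;	/* predictable load change: do not delay */
	assert(c->now >= c->floor_validate_time);
	return c->now - c->floor_validate_time >= c->min_sample_time;
}
```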
Change-Id: I19ad294b599e30654fbbeb0c56e8b50b0e19198f
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
When the governor uses regular busy time tracking, cpu_load will
never exceed 100, because busy time never exceeds elapsed time in
any one sampling window. The only exception is when frequency is
reduced in the middle of a window (e.g. due to thermal throttling). In
this case, cpu_load is likely irrelevant, since the frequency the
governor has been voting for is already higher than what the target
can run at.
However, on a heterogeneous CPU system with scheduler input enabled
to track the load of migrated tasks, cpu_load could also exceed 100
when a task migrates from a more capable CPU to a slower CPU. When this
happens, the governor already knows the exact frequency required to
handle this load. There is no need to progressively ramp up frequency
to assess the load's real demand, and it's not desirable to starve such
a migrating task by forcing it through the ramp-up process on the
slower CPU.
Directly jump beyond hispeed_freq and ignore above_hispeed_delay if
cpu_load exceeds 100.
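The sketch below is a simplified model of target selection, not the governor's actual frequency-selection code: it assumes a single target load and hypothetical struct and field names (frequencies in kHz, loads in percent). It shows the normal hispeed ramp with the above_hispeed_delay check, and the early return when cpu_load exceeds 100.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified tunables; frequencies in kHz, loads in %. */
struct gov_tunables {
	unsigned int go_hispeed_load;
	unsigned int target_load;	/* a single target load for simplicity */
	unsigned int hispeed_freq;
	unsigned int max_freq;
};

static unsigned int pick_target(const struct gov_tunables *t,
				unsigned int cur, unsigned int cpu_load,
				bool above_hispeed_delay_elapsed)
{
	unsigned int demand, target;

	assert(t->target_load != 0);
	/* Frequency at which this load would sit at target_load. */
	demand = (unsigned int)((unsigned long long)cur * cpu_load /
				t->target_load);
	if (demand > t->max_freq)
		demand = t->max_freq;

	/* cpu_load > 100 (e.g. load migrated from a faster CPU): the required
	 * frequency is already known, so skip the hispeed ramp and delay. */
	if (cpu_load > 100)
		return demand;

	if (cpu_load >= t->go_hispeed_load)
		target = (cur < t->hispeed_freq || demand < t->hispeed_freq) ?
			 t->hispeed_freq : demand;
	else
		target = demand;

	/* Normal path: above hispeed_freq, ramp further only after the delay. */
	if (cur >= t->hispeed_freq && target > cur &&
	    !above_hispeed_delay_elapsed)
		target = cur;

	return target;
}
```

With go_hispeed_load and target_load at 90, hispeed_freq at 1.2 GHz and a 1.0 GHz current frequency, a 95% load only ramps to hispeed_freq, while a 150% load goes straight to about 1.67 GHz.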
Change-Id: Ib87057e4f00732fad943ab595a33e3059494ef15
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
When a CPU is running at policy->min, the slack timer is not scheduled.
If policy->min is reduced later, the current implementation doesn't
reschedule the slack timer and thus could leave the CPU at a higher
frequency indefinitely as long as the CPU is idle. This behavior is
undesirable from a power perspective.
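A minimal sketch of the fix, with hypothetical names: the slack timer decision is recomputed whenever policy->min changes, instead of being left as it was when the CPU sat at the old minimum. Treating a negative slack value as "slack timer disabled" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; frequencies in kHz. */
struct cpu_state {
	unsigned int target_freq;
	bool slack_armed;
};

/* The slack timer only matters if the CPU could still go lower. */
static bool slack_timer_needed(unsigned int target_freq,
			       unsigned int policy_min, int timer_slack_us)
{
	return timer_slack_us >= 0 && target_freq > policy_min;
}

/* On a policy->min change, re-evaluate instead of keeping the old decision. */
static void on_policy_min_change(struct cpu_state *s, unsigned int new_min,
				 int timer_slack_us)
{
	s->slack_armed = slack_timer_needed(s->target_freq, new_min,
					    timer_slack_us);
}
```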
Change-Id: I40bfd7c93ad3fd06e3837dc48befdc07f29c78c8
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
min_sample_time needs to be cluster-based to match
above_hispeed_delay. If each CPU keeps making local decisions,
min_sample_time may not be correctly enforced at the cluster level,
which results in undesired frequency drops.
Change-Id: I3eb24971b5d57260426f4932f37bb9ec170c3e42
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
It is not correct to boost all the CPUs when the boost tunables are
changed. There is also no need to boost CPUs that are already
boosted.
Signed-off-by: Lianwei Wang <a22439@motorola.com>
Git-commit: 805863ee7ef93d05cd654e069f5d7f00a0f4b257
Git-repo: https://android.googlesource.com/kernel/common.git
[imaund@codeaurora.org: Resolved context conflicts]
Signed-off-by: Ian Maund <imaund@codeaurora.org>
Currently the interactive governor timer doesn't re-arm itself when it
selects policy->max as the new frequency. The timer is armed again only
when the CPU first hits idle at max frequency. This mechanism doesn't
have any noticeable performance benefit, since CPUs running at max
frequency without going idle show high loads, which prevents the
governor from lowering their frequency.
This change rearms the timer even at max frequency, which removes
the need to handle idle starts. This simplifies the code and also
makes the governor timer windows more regular, so that the
notifications sent by the governor are uniformly spaced. The max
freq hysteresis start timestamp is refreshed every time
policy->max is selected as the new frequency, to prevent stepping
down from max frequency earlier than intended.
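A sketch of the refreshed hysteresis, under assumed names and units (microseconds, kHz): each time max is selected, the start timestamp moves forward, so a step-down is only allowed once max_freq_hysteresis has elapsed since max was last chosen.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-policy state; times in microseconds, freqs in kHz. */
struct hyst_state {
	uint64_t max_freq_hyst_start_time;
};

/* Called on every timer evaluation (the timer is rearmed even at max). */
static unsigned int apply_max_freq_hysteresis(struct hyst_state *s,
					      uint64_t now,
					      unsigned int new_freq,
					      unsigned int cur_freq,
					      unsigned int policy_max,
					      uint64_t max_freq_hysteresis)
{
	if (new_freq >= policy_max) {
		/* Refresh the start time every time max is selected. */
		s->max_freq_hyst_start_time = now;
		return policy_max;
	}
	/* Hold max until the hysteresis has passed since max was last picked. */
	if (cur_freq == policy_max &&
	    now - s->max_freq_hyst_start_time < max_freq_hysteresis)
		return policy_max;
	return new_freq;
}
```

In the test sequence, max is selected again at t=40100, so a lower request at t=80100 is still held; without that refresh the 50 ms hysteresis counted from t=100 would already have allowed the drop.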
Change-Id: I9c137113b703f2064f1e668628db91de94cc0887
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
The compiler warns that the hvt variable in
cpufreq_interactive_speedchange_task() may be used uninitialized.
Initialize it to ~0ULL.
Note that in reality, this won't happen because when governor_enabled
is true, these two conditions are guaranteed:
1) policy->cpus won't be empty, and
2) target_freq won't be 0.
Otherwise, it indicates more serious issues in the system.
With these two conditions, hvt will be overwritten by at least
one CPU's local_hvttime.
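The loop below sketches why hvt is always overwritten, assuming a hypothetical per-CPU vote array: the cluster value is taken from the CPU(s) voting for the highest frequency, and any non-empty set of CPUs replaces the ~0ULL starting value. How ties are broken (here, the earliest timestamp wins) is an assumption, not something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU vote; field names are illustrative only. */
struct cpu_vote {
	unsigned int target_freq;	/* kHz */
	uint64_t loc_hvt_us;		/* local hispeed validate time */
};

/* Cluster hispeed validate time, taken from the CPUs voting for the
 * highest frequency. Only an empty set returns the ~0ULL initializer. */
static uint64_t cluster_hvt(const struct cpu_vote *v, size_t n)
{
	unsigned int max_freq = 0;
	uint64_t hvt = ~0ULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (v[i].target_freq > max_freq) {
			max_freq = v[i].target_freq;
			hvt = v[i].loc_hvt_us;
		} else if (v[i].target_freq == max_freq &&
			   v[i].loc_hvt_us < hvt) {
			hvt = v[i].loc_hvt_us;	/* tie: keep earliest (assumption) */
		}
	}
	return hvt;
}
```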
Change-Id: I4378393ed811674f25d54852c296ee5ff407e7e3
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Current implementation of cpufreq_interactive_enable_sched_input()
returns early if use_sched_input is already enabled. This breaks
refcounting for migration notification registration. It could also
result in failure of registering migration notification after
hotplugging the entire cluster and/or suspend/resume.
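A minimal refcounting sketch with hypothetical names: enabling never returns early, so every enable is counted, and the migration notifier is registered on the first user and unregistered on the last.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted registration, not the governor's real code. */
struct sched_input {
	unsigned int users;	/* policies that enabled scheduler input */
	bool notif_registered;	/* migration notifier currently registered */
};

static void enable_sched_input(struct sched_input *s)
{
	/* No early return when already enabled: every caller is counted. */
	if (s->users++ == 0)
		s->notif_registered = true;	/* register on first user */
}

static void disable_sched_input(struct sched_input *s)
{
	assert(s->users > 0);
	if (--s->users == 0)
		s->notif_registered = false;	/* unregister on last user */
}
```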
Change-Id: I079b2c70b182f696cd8a883f5c8e3a37b5c6d21d
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
down_read_trylock() is not always non-blocking if the same thread has
called down_write() before it.
CPU1                              CPU2
                                  down_read()
down_write()
 __down_write_nested()
  schedule()
   __down_read_trylock()
                                  up_read()
                                   acquires sem->wait_lock
                                   __rwsem_wake_one_writer()
    tries to lock sem->wait_lock
Now CPU2 is waiting for CPU1's schedule() to complete while holding
sem->wait_lock, and CPU1 needs sem->wait_lock to continue.
This problem only happens after cpufreq_interactive introduced the load
change notification, which can be called within schedule().
Add a separate flag to ignore the notification if the current thread is
in the middle of down_write(). This avoids attempting to take
sem->wait_lock. The additional flag doesn't have any side effects,
because down_read_trylock() would have failed anyway.
Change-Id: Iff97cac36c170cf6d03f36de695141289c3d6930
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Report CPU load to modules subscribed to the cpufreq govinfo notification
chain every time the governor timer expires to evaluate load.
Change-Id: I0b35947b1924c179649aafa0b7b93d974164af1a
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
Change 0cd554ae58
(cpufreq: interactive: Exercise hispeed settings at a policy level)
introduced policy level hispeed settings. The change is correct in
general, but it sets hispeed_validate_time after the actual frequency
change has completed. Waking up the speedchange task and setting the
frequency takes a non-trivial amount of time. This period is not
accounted for in above_hispeed_delay, resulting in additional delays
when ramping up frequency. Frequency switch latency varies a lot
depending on the starting and ending frequencies, and thus it cannot be
easily compensated for by the user setting above_hispeed_delay.
Record a local hispeed_validated_time in every CPU's timer function.
Cluster hispeed_validated_time is the local hispeed_validated_time of
CPUs voting for the highest frequency.
Change-Id: Id8ae547fe3a70f8710f60b6e2125954111b7a2b6
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Disable sample window alignment by default to match default behavior
of upstream interactive governor.
Change-Id: Ibbf4bdd4dd423f97d3a9dd5442eba78b378e66e2
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
If a heavy task migrates between otherwise idle CPUs in a policy during
every sample window, the above hispeed delay window for the CPUs would get
restarted for every sample window. Due to the continuous restart of above
hispeed delay window, none of the CPUs would ever pick a target frequency
higher than hispeed frequency. This causes the policy's frequency to be
stuck at hispeed freq even if the load justifies a higher frequency.
To fix this, the above high speed delay window is restarted only when the
policy frequency changes. This ensures that tasks migrating between CPUs in
a policy are handled correctly.
Also, the hispeed load/frequency heuristic is only necessary when the
information is insufficient to determine if the load on the CPU needs at
least hispeed frequency. When the policy frequency is already at or above
hispeed frequency, if the CPU load% based on policy frequency is not above
hispeed load, then the information is clearly sufficient to determine that
the load on the CPU does not need hispeed frequency.
Therefore, compute CPU load% (which is used only to compare against hispeed
load) based on policy frequency instead of CPU target frequency.
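A sketch of the load calculation, assuming a hypothetical speedadj value equal to (busy time x frequency) / wall time for the window, in kHz. The same window yields very different load percentages depending on which frequency it is divided by.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: speedadj_khz is (busy time * frequency) / wall time for
 * the window. Load relative to freq_khz is speedadj * 100 / freq_khz. */
static unsigned int cpu_load_pct(unsigned long long speedadj_khz,
				 unsigned int freq_khz)
{
	assert(freq_khz != 0);
	return (unsigned int)(speedadj_khz * 100 / freq_khz);
}

/* Compare the load, relative to the given frequency, against hispeed load. */
static bool exceeds_hispeed_load(unsigned long long speedadj_khz,
				 unsigned int freq_khz,
				 unsigned int go_hispeed_load)
{
	return cpu_load_pct(speedadj_khz, freq_khz) >= go_hispeed_load;
}
```

For a CPU that is 60% busy at a policy frequency of 1.5 GHz (speedadj = 900000 kHz) but whose own vote was 800 MHz, the load is 60% relative to the policy frequency and 112% relative to its target, so only the latter would trip a go_hispeed_load of 99.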
Change-Id: I1749d663949e34753ecb5c426a16563796f8b0b2
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
Previously, the load change callback had a limitation: it could not
attempt to wake up a task. Therefore the best it could do was schedule
the timer at the current jiffy, and the timer function would only be
executed at the next timer tick. This could take up to 10ms.
Now that this limitation is removed, re-evaluate load immediately upon
receiving this callback.
Change-Id: Iab3de4705b9aae96054655b1541e32fb040f7e60
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Make sampling window alignment optional when scheduler inputs
are not enabled.
Change-Id: If69c111a3efe219cdd1e38c1f46f03404789c0bb
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Previously known as sampling down factor, max_freq_hysteresis
extends the period that the interactive governor will stay at policy->max.
This feature accommodates short idle periods in an otherwise very
intensive workload.
When the feature is enabled, it ensures that once a CPU goes to max
frequency, it doesn't reduce the frequency for max_freq_hysteresis
microseconds from the time it first goes to idle.
Change-Id: Ia54985cb554f63f8c22d0b554a0a0f2ed2be038f
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Interactive governor does not have enough information about the tasks
on a CPU to make a more informed decision on the frequency the CPUs
should run at. To address this problem, modify interactive governor
to get load information from scheduler. In addition, it can get
notification from scheduler on significant load change to reevaluate
CPU frequency immediately.
Add two sysfs files to control the behavior of load evaluation:
use_sched_load:
When enabled, governor uses load information from scheduler
instead of busy/idle time from past window.
use_migration_notif:
Whenever a task migrates, scheduler might send a notification
so that governor can re-evaluate load and scale frequency.
The governor will ignore this notification unless both
use_sched_load and use_migration_notif are true for
the policy group.
Change-Id: Iaf66e424c6166ec15480db027002b3a3b357d79c
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Replace mod_timer_pinned() with del_timer() and add_timer_on().
mod_timer_pinned() always adds timer onto current CPU. Interactive
governor expects each CPU's timers to be running on the same CPU.
If cpufreq_interactive_timer_resched() is called from another CPU,
the timer will be armed on the wrong CPU.
Replacing mod_timer_pinned() with del_timer() and add_timer_on()
guarantees timers are still run on the right CPU even if another
CPU reschedules the timer. This would provide more flexibility
for future changes.
Change-Id: I3a10be37632afc0ea4e0cc9c86323b9783b216b1
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Currently, tunables are only saved to per_cpu field when
CPUFREQ_GOV_POLICY_EXIT event happens. Save tunables the moment they
are created so that per_cpu cached_tunables field always matches
the tunables in use. This is useful for modifying tunable values
across clusters.
Change-Id: I9e30d5e93d6fde1282b5450458d8a605d568a0f5
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Timers are scheduled in units of jiffies. Round up timer_rate so that
it matches the actual sampling period.
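A userspace sketch of the rounding, assuming HZ=100 purely for the example (real kernels are configured with other tick rates too). In the kernel, the same round trip could be written with usecs_to_jiffies(), which rounds up, followed by jiffies_to_usecs().

```c
#include <assert.h>

#define EXAMPLE_HZ		100u	/* assumed tick rate for this example */
#define EXAMPLE_USEC_PER_SEC	1000000u

/* Round a period in microseconds up to whole jiffies and convert back, so
 * the reported timer_rate equals the period the timer actually uses. */
static unsigned int round_rate_to_jiffies(unsigned int rate_us)
{
	const unsigned int usec_per_jiffy = EXAMPLE_USEC_PER_SEC / EXAMPLE_HZ;
	unsigned int jiffies = (rate_us + usec_per_jiffy - 1) / usec_per_jiffy;

	return jiffies * usec_per_jiffy;
}
```

At HZ=100, a requested 25000 us rate becomes 30000 us, which is the period the timer really runs at.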
Change-Id: I47e666f835752528331f50b1e76784e6d67f8bcf
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
When a CPU has been busy for a long time, the last evaluated jiffy will
be far behind, because the timer would have been canceled. We don't want
to schedule a timer to fire in the past, as load will always be 100%.
Reset the last evaluated jiffy so that the timer will be scheduled for
the next window.
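A minimal sketch, with hypothetical names and times in jiffies, of resetting a stale last-evaluation timestamp before computing the next expiry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: last_eval is the jiffy of the last load evaluation and
 * rate is the sampling period in jiffies. Returns the next expiry. */
static uint64_t next_timer_expiry(uint64_t *last_eval, uint64_t now,
				  uint64_t rate)
{
	/* After a long busy stretch the timer may have been cancelled, leaving
	 * last_eval far behind. Firing at last_eval + rate would be in the
	 * past and would always report 100% load, so restart from now. */
	if (*last_eval + rate < now)
		*last_eval = now;
	return *last_eval + rate;
}
```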
Change-Id: Ie25e65eab1f16acdeda267987ca605d653f1f32a
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
It's more advantageous to evaluate all CPUs at the same time so that
the interactive governor gets a complete picture of the load on
each CPU at a specific time. It could also reduce the number of speed
changes made if there are many CPUs controlled by the same policy. In
addition, waking up all CPUs at the same time would allow the cluster
to go into a deeper sleep state when it's idle.
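Evaluating all CPUs at the same time requires their timers to expire on the same jiffy. One way to get that, sketched here as an assumption rather than the governor's exact code, is to round each expiry up to the next multiple of the sampling period.

```c
#include <assert.h>
#include <stdint.h>

/* Round the next expiry up to the next multiple of rate (in jiffies) so
 * that every CPU sharing the same rate fires on the same jiffy. */
static uint64_t aligned_expiry(uint64_t now, uint64_t rate)
{
	assert(rate > 0);
	return (now / rate + 1) * rate;
}
```

Two CPUs that rearm at jiffies 1001 and 1003 with a 4-jiffy rate both get an expiry of 1004.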
Change-Id: I6915050c5339ef1af106eb906ebe4b7c618061e2
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Interactive governor already has a per_cpu field cpuinfo to keep track
of per_cpu data. Move cached_tunables into cpuinfo.
Change-Id: I77fda0cda76b56ff949456a95f96d129d877aa7b
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Remove sampling_down_factor feature.
This commit reverts d094d23694
(cpufreq: interactive: Add a sampling_down_factor for max frequencies)
and subsequent modifications related to sampling down factor.
Change-Id: Ib7ec0a918bd3e85a3425dbdeefcd2f2aecffe69c
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Sync freq feature is not valid for an HMP system with clusters.
This commit reverts commit f3d1980b4d
(cpufreq: interactive: sync freq feature for interactive governor)
Change-Id: I78cb91a94b1a022f8daed045f5aae69f1c00783d
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Commit f8b276565c
(cpufreq: Sync on thread migration optimizations) is no longer needed
for targets with synchronous CPUs.
Part of that commit has already been reverted in
a913b3afca
(cpufreq: interactive: Revert timer start modification)
This commit reverts the remaining changes.
Change-Id: I7eadeb7e48cfbef8fec74eb1b0e221eb65482f52
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
This reverts commit ff6af80775.
Commit ff6af807 tries to avoid a corner case where frequency is stuck in
hispeed_freq for one additional window. For example, if timer_rate is
20ms, and go_hispeed_delay is 40ms, frequency might be stuck at
hispeed_freq for 60ms due to imprecision in jiffies. Same problem can be
easily solved by making go_hispeed_delay 1ms smaller instead of changing
the code.
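A worked sketch of the corner case, using made-up evaluation timestamps that land slightly short of each 20 ms boundary (an assumption standing in for jiffy imprecision). The hispeed check holds the frequency while the elapsed time is below go_hispeed_delay.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the first evaluation time (us) at which the governor may go
 * above hispeed_freq, i.e. elapsed >= delay; 0 if none qualifies. */
static uint64_t first_eval_above_hispeed(const uint64_t *evals, size_t n,
					 uint64_t hispeed_validate_time,
					 uint64_t go_hispeed_delay)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (evals[i] - hispeed_validate_time >= go_hispeed_delay)
			return evals[i];
	return 0;
}
```

With evaluations at 19990, 39980 and 59970 us after reaching hispeed_freq, a 40 ms delay first permits the third evaluation (about 60 ms), while a 39 ms delay permits the second (about 40 ms).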
Change-Id: Idab7c29ed28374df219210e444454068864d144d
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
When tunables are not available for events other than
CPUFREQ_GOV_POLICY_INIT in cpufreq_governor_interactive(), trigger a
panic instead of throwing a warning.
When the original warning happens, some race condition must have
occurred, and governor will be in a bad state even if it might still
run for a while. Panic directly so that it's easier to catch the
first race event.
Change-Id: I2dc1185cabfe72a63739452731fe242924d2cf45
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
To avoid multiple frees of an allocated tunables struct during
module_exit(), the pointer to the allocated tunables should be stored in
only one of the per-CPU cached_tunables pointers.
So, in the case of per-policy governor configuration, store the cached
values in the pointer of the first CPU in a policy. In the case of one
governor across all policies, store it in the CPU0 pointer.
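A sketch of the ownership rule with hypothetical names: the pointer lives only in the slot of the lowest-numbered CPU of the policy (or CPU0 for a single global instance), so a cleanup loop over all per-CPU slots frees each struct exactly once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define EXAMPLE_NR_CPUS 8

/* Hypothetical stand-in for the per-CPU cached_tunables pointers. */
static void *cached_tunables[EXAMPLE_NR_CPUS];

/* Which CPU's slot owns the tunables pointer. */
static unsigned int tunables_owner_cpu(const unsigned int *cpus, size_t n,
				       int per_policy)
{
	unsigned int first;
	size_t i;

	assert(n > 0);
	if (!per_policy)
		return 0;	/* one governor instance for all policies: CPU0 */
	first = cpus[0];
	for (i = 1; i < n; i++)
		if (cpus[i] < first)
			first = cpus[i];	/* lowest CPU id in the policy */
	return first;
}

/* Exit path: free every non-NULL slot; returns how many were freed. */
static int free_cached_tunables(void)
{
	int freed = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < EXAMPLE_NR_CPUS; cpu++) {
		if (cached_tunables[cpu]) {
			free(cached_tunables[cpu]);
			cached_tunables[cpu] = NULL;
			freed++;
		}
	}
	return freed;
}
```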
Change-Id: Id4334246491519ac91ab725a8758b2748f743bb0
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
The cpufreq_interactive_timer gets cancelled and rescheduled
whenever the cpufreq_policy is changed. When the cpufreq policy is
changed at a rate faster than the sampling_rate of the interactive
governor, the governor fails to change the target frequency
for a long duration. This patch removes the need to cancel the
timers when policy->min is changed.
Change-Id: Ibd98d151e1c73b8bd969484583ff98ee9f1135ef
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
Git-commit: 9b97d655a558607c5d46ef1f21365d695f8d1ee2
Git-Repo: https://android.googlesource.com/kernel/common.git
[junjiew@codeaurora.org: resolve merge conflicts]
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
commit f8b276565c
(cpufreq: Sync on thread migration optimizations)
introduced a change to cpufreq_interactive_timer_start() in order
to reschedule the timer differently based on whether min or max
is changed. A better way is to reschedule the timer only when
necessary.
Revert timer start modification in preparation for the final fix.
Change-Id: I13f3b75a6eee03ac6380c24db899806a9bfbc96a
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
common_tunables are only used in cpufreq_interactive. Make it
static.
Change-Id: Iec8ee12af2728c8878d001dc1cf3613be529dc67
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Userspace might change tunable values for a governor. Currently, if
all CPUs in a policy go offline, governor frees its tunable. This
wipes out all userspace modifications. Kernel drivers can call
cpu_up/down() directly and thus userspace won't have a chance to
restore the tunables.
Permanently save tunable struct in a per_cpu field so that we
preserve tunable values across hotplug, suspend/resume and governor
switch.
Change-Id: I126b8278c8e75c8eadb3e2ddfe97fcc72cddfa23
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
sysfs ops for target_loads and above_hispeed_delay can be called before
initializing tunables at CPUFREQ_GOV_POLICY_INIT. Create sysfs entries after
initialization.
Change-Id: I50356198d7629731c0d32a3066d61fe8354e0001
Signed-off-by: Minsung Kim <ms925.kim@samsung.com>
Git-commit: 0ac276ebfca1d405153f4a3476aa1f7f66bbbec8
Git-Repo: https://android.googlesource.com/kernel/common.git
[junjiew@codeaurora.org: Resolve merge conflicts]
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
gcc warns:
cpufreq_interactive.c:745:6: warning: operation on 'ret' may be undefined [-Wsequence-point]
It was introduced by commit cf0fad49d17cb8273ce555dd5b7afab67d7923bf.
Since sprintf(...) just returns 1 (one character) in this case, ret should not change.
Simply discarding the result of sprintf(...) gives the result that
the committer of cf0fad49d17cb8273ce555dd5b7afab67d7923bf intended.
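The warned-about statement itself is not quoted above; from the description, it folded both a decrement of ret and the sprintf() result into ret in one expression. The sketch below shows only the fixed pattern, using a hypothetical helper that formats a list in the style of target_loads (values joined by " " and ":" separators): the last separator is overwritten with a newline, and the return value of that sprintf() is discarded.

```c
#include <assert.h>
#include <stdio.h>

/* Format vals as "v0 v1:v2 ...", then replace the trailing separator with
 * '\n'. One character is overwritten with one character, so the total
 * length stays ret and the last sprintf() result is simply discarded. */
static int show_list(char *buf, const unsigned int *vals, int n)
{
	int ret = 0;
	int i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		ret += sprintf(buf + ret, "%u%s", vals[i], (i & 1) ? ":" : " ");

	(void)sprintf(buf + ret - 1, "\n");	/* ret intentionally unchanged */
	return ret;
}
```

For the values 85, 1500000 and 90 this produces "85 1500000:90\n" and returns 14.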
Change-Id: Ifed1cef6d6a31c3ed23dad03a567b3b9eddf3a57
Signed-off-by: Chih-Wei Huang <cwhuang@android-x86.org>
Git-commit: 0715d10b9e6c04327adb189e13ecd6a3c2df48ce
Git-Repo: https://android.googlesource.com/kernel/common.git
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
If we have a multi-package system, where we have multiple instances of struct
policy (one per package), currently we can't have multiple instances of the
same governor, i.e. we can't have multiple instances of the Interactive
governor for multiple packages.
This is a bottleneck for a multicluster system, where we want different
packages to use the Interactive governor, but with different tunables.
This patch uses the infrastructure provided by earlier patches pushed in
Mainline in v3.10-rc1/rc2 and implements per-policy instances of the
Interactive governor.
Change-Id: I70436d4a5a45c6cb6edf37f3e46d0b9fbc930982
[toddpoynor@google.com: merge with later code, minor changes]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Git-commit: 65f53ba0525cf92f397b22aea94ee637542a6757
Git-repo: https://android.googlesource.com/kernel/common/
[junjiew@codeaurora.org: resolved many conflicts to keep our previous
modifications to interactive governor]
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
This moves the definition of cpufreq_gov_interactive towards the bottom of the
file, so that we don't have to add a prototype of cpufreq_governor_interactive()
at the beginning of the file.
Change-Id: I04bd1004954eb36502c5cd7e35d3d7274cddaf95
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Git-commit: e50d640cacfe61e69f1b4fa92fa0174c688e919c
Git-Repo: https://android.googlesource.com/kernel/common.git
[junjiew@codeaurora.org: Resolve merge conflicts]
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>