810 Commits

Author SHA1 Message Date
Harald Gustafsson 77a8fdf49f sched/deadline: Add period support for SCHED_DEADLINE tasks
Make it possible to specify a period (different from or equal to the
deadline) for -deadline tasks. Relative deadlines (D_i) are used on
task arrivals to generate new scheduling (absolute) deadlines as "d =
t + D_i", and periods (P_i) to postpone the scheduling deadlines as "d
= d + P_i" when the budget is zero.

This is in general useful to model (and schedule) tasks that have slow
activation rates (long periods), but have to be scheduled soon once
activated (short deadlines).
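
The two update rules above can be sketched in plain C as follows. This is
a minimal userspace illustration, not the kernel code: the struct and
function names (dl_params, dl_state, dl_arrive, dl_replenish) are
hypothetical, time is an abstract unsigned counter, and refilling the
budget by one runtime on each postponement is an assumption about the
CBS behaviour.

```c
#include <stdint.h>

/* Hypothetical, simplified per-task parameters (not the kernel's structs). */
struct dl_params {
    uint64_t runtime;   /* budget granted per instance */
    uint64_t deadline;  /* relative deadline D_i */
    uint64_t period;    /* period P_i, may be longer than D_i */
};

/* Hypothetical run-time state of one -deadline task. */
struct dl_state {
    uint64_t abs_deadline;  /* current scheduling (absolute) deadline d */
    int64_t budget;         /* remaining runtime, may drop to or below zero */
};

/* Task arrival at time t: d = t + D_i, with a full budget. */
static void dl_arrive(struct dl_state *s, const struct dl_params *p, uint64_t t)
{
    s->abs_deadline = t + p->deadline;
    s->budget = (int64_t)p->runtime;
}

/* Budget exhausted: postpone d = d + P_i until the budget is positive again. */
static void dl_replenish(struct dl_state *s, const struct dl_params *p)
{
    while (s->budget <= 0) {
        s->abs_deadline += p->period;
        s->budget += (int64_t)p->runtime;  /* assumed refill amount */
    }
}
```

With runtime 2, deadline 5 and period 20, a task arriving at t = 100 gets
d = 105; once its budget is used up, d moves to 125 rather than 110,
which is the "long period, short deadline" case described above.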

Signed-off-by: Harald Gustafsson <harald.gustafsson@ericsson.com>
Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-7-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-Commit: 755378a47192a3d1f7c3a8ca6c15c1cf76de0af2
Git-Repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-05-19 19:15:59 -07:00
Dario Faggioli 5861add1b6 sched/deadline: Add SCHED_DEADLINE avg_update accounting
Make the core scheduler and load balancer aware of the load
produced by -deadline tasks, by updating the moving average
like for sched_rt.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-6-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-Commit: 239be4a982154ea0c979fca5846349bb68973aed
Git-Repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-05-19 19:15:58 -07:00
Juri Lelli 0316441630 sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic
Introduces data structures relevant for implementing dynamic
migration of -deadline tasks and the logic for checking if
runqueues are overloaded with -deadline tasks and for choosing
where a task should migrate when that is the case.

Also adds dynamic migrations to SCHED_DEADLINE, so that tasks can
be moved among CPUs when necessary. It is also possible to bind a
task to a (set of) CPU(s), thus restricting its ability to migrate,
or forbidding migrations altogether.

The very same approach used in sched_rt is utilised:
 - -deadline tasks are kept in CPU-specific runqueues,
 - -deadline tasks are migrated among runqueues to achieve the
   following:
    * on an M-CPU system the M earliest deadline ready tasks
      are always running;
    * affinity/cpusets settings of all the -deadline tasks are
      always respected.

Therefore, this very special form of "load balancing" is done with
an active method, i.e., the scheduler pushes or pulls tasks between
runqueues when they are woken up and/or (de)scheduled.
IOW, every time a preemption occurs, the descheduled task might be sent
to some other CPU (depending on its deadline) to continue executing
(push). On the other hand, every time a CPU becomes idle, it might pull
the second earliest deadline ready task from some other CPU.

To enforce this, a pull operation is always attempted before taking any
scheduling decision (pre_schedule()), as well as a push one after each
scheduling decision (post_schedule()). In addition, when a task arrives
or wakes up, the best CPU where to resume it is selected taking into
account its affinity mask, the system topology, but also its deadline.
E.g., from the scheduling point of view, the best CPU where to wake
up (and also where to push) a task is the one which is running the task
with the latest deadline among the M executing ones.

In order to facilitate these decisions, per-runqueue "caching" of the
deadlines of the currently running and of the first ready task is used.
Queued but not running tasks are also parked in another rb-tree to
speed-up pushes.
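
As an illustration of the CPU selection rule described above (wake up or
push towards the CPU running the latest-deadline task), here is a minimal
userspace sketch. It is not the kernel implementation: find_later_cpu()
and its array/bitmask arguments are hypothetical, it handles at most 32
CPUs, and it ignores system topology, idle CPUs and clock wraparound.

```c
#include <stdint.h>

/*
 * Return the allowed CPU whose currently running task has the latest
 * absolute deadline, provided that deadline is later than task_deadline.
 * Return -1 if no allowed CPU qualifies.
 *
 * curr_deadline[cpu] stands in for the per-runqueue cached deadline of
 * the running task; allowed_mask has bit N set if CPU N is in the task's
 * affinity mask. ncpus must not exceed 32.
 */
static int find_later_cpu(const uint64_t curr_deadline[], int ncpus,
                          uint32_t allowed_mask, uint64_t task_deadline)
{
    int best = -1;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (!(allowed_mask & (UINT32_C(1) << cpu)))
            continue;   /* affinity forbids this CPU */
        if (curr_deadline[cpu] <= task_deadline)
            continue;   /* running task there is at least as urgent */
        if (best < 0 || curr_deadline[cpu] > curr_deadline[best])
            best = cpu; /* latest deadline seen so far */
    }
    return best;
}
```

Reading only the cached deadlines keeps the search to one pass over the
candidate CPUs, which is the purpose of the per-runqueue caching
mentioned above.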

Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-5-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[rameezmustafa@codeaurora.org: Port to msm-3.10]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Git-Commit: 1baca4ce16b8cc7d4f50be1f7914799af30a2861
Git-Repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
2015-05-19 19:15:58 -07:00
Dario Faggioli 362f964efa sched/deadline: Add SCHED_DEADLINE structures & implementation
Introduces the data structures, constants and symbols needed for
SCHED_DEADLINE implementation.

The core data structures of SCHED_DEADLINE are defined, along with their
initializers. Hooks for checking whether a task belongs to the new policy
are also added where they are needed.

Adds a scheduling class, in sched/dl.c and a new policy called
SCHED_DEADLINE. It is an implementation of the Earliest Deadline
First (EDF) scheduling algorithm, augmented with a mechanism (called
Constant Bandwidth Server, CBS) that makes it possible to isolate
the behaviour of tasks from each other.

The typical -deadline task will be made up of a computation phase
(instance) which is activated in a periodic or sporadic fashion. The
expected (maximum) duration of such computation is called the task's
runtime; the time interval by which each instance needs to be completed
is called the task's relative deadline. The task's absolute deadline
is dynamically calculated as the time instant a task (or rather, an
instance) activates plus the relative deadline.

The EDF algorithm selects the task with the smallest absolute
deadline as the one to be executed first, while the CBS ensures that
each task runs for at most its runtime in every time interval of
(relative) deadline length, avoiding any interference between
different tasks (bandwidth isolation).
Thanks to this feature, tasks that do not strictly comply with
the computational model sketched above can also effectively use the
new policy.
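
The interplay of EDF selection and CBS budget enforcement can be sketched
in a few lines of userspace C. This is only an illustration under stated
assumptions: struct dl_task, edf_pick() and cbs_charge() are hypothetical
names, a linear scan replaces the rb-tree used in practice, and throttling
is modelled as a simple flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a -deadline task. */
struct dl_task {
    uint64_t abs_deadline;  /* absolute deadline of the current instance */
    int64_t budget;         /* remaining runtime */
    int throttled;          /* set once the budget is exhausted */
};

/* EDF: pick the runnable task with the smallest absolute deadline. */
static struct dl_task *edf_pick(struct dl_task *tasks, size_t n)
{
    struct dl_task *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].throttled)
            continue;
        if (!best || tasks[i].abs_deadline < best->abs_deadline)
            best = &tasks[i];
    }
    return best;
}

/* CBS: charge executed time and stop the task when its budget runs out. */
static void cbs_charge(struct dl_task *t, uint64_t delta)
{
    t->budget -= (int64_t)delta;
    if (t->budget <= 0)
        t->throttled = 1;  /* cannot interfere with other tasks any more */
}
```

A task that overruns its declared runtime is simply taken out of the EDF
selection, so the remaining tasks still see the bandwidth they were
promised.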

To summarize, this patch:
 - introduces the data structures, constants and symbols needed;
 - implements the core logic of the scheduling algorithm in the new
   scheduling class file;
 - provides all the glue code between the new scheduling class and
   the core scheduler and refines the interactions between sched/dl
   and the other existing scheduling classes.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
Signed-off-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-4-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[rameezmustafa@codeaurora.org: Port to msm-3.10]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Git-Commit: aab03e05e8f7e26f51dee792beddcb5cca9215a5
Git-Repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
2015-05-19 19:15:57 -07:00
Frederic Weisbecker dbb0220237 sched: Use an accessor to read the rq clock
Read the runqueue clock through an accessor. This
prepares for adding a debugging infrastructure to
detect missing or redundant calls to update_rq_clock()
between a scheduler's entry and exit point.
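
The accessor pattern can be illustrated with a small userspace sketch.
The names rq_sketch and rq_clock_sketch() are hypothetical, and the read
counter is only an assumed example of what a single access point makes
possible; it is not the debugging infrastructure the patch prepares for.

```c
#include <stdint.h>

/* Hypothetical stand-in for a runqueue; only the clock is modelled. */
struct rq_sketch {
    uint64_t clock;      /* runqueue clock */
    unsigned int reads;  /* assumed debug counter, for illustration only */
};

/* All readers go through this accessor instead of touching the field. */
static inline uint64_t rq_clock_sketch(struct rq_sketch *rq)
{
    rq->reads++;  /* the one place where checks could later be hooked in */
    return rq->clock;
}
```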

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1365724262-20142-6-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[rameezmustafa@codeaurora.org: Port to msm-3.10]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Git-Commit: 78becc27097585c6aec7043834cadde950ae79f2
Git-Repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
2015-05-19 19:15:56 -07:00
Dario Faggioli b970ef27c6 sched: Add sched_class->task_dead() method
Add a new function to the scheduling class interface. It is called
at the end of a context switch, if the prev task is in TASK_DEAD state.

It will be useful for scheduling classes, such as SCHED_DEADLINE,
that want to be notified when one of their tasks dies, e.g. to
perform some cleanup actions.
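
The shape of such an optional per-class hook can be sketched in userspace
C. Everything here is hypothetical and simplified (sched_class_sketch,
task_sketch, finish_switch_sketch() and the example classes); it only
shows a callback that is invoked for dead tasks when the class provides
one.

```c
#include <stddef.h>

struct task_sketch;

/* Hypothetical scheduling class interface with an optional hook. */
struct sched_class_sketch {
    void (*task_dead)(struct task_sketch *p);  /* may be NULL */
};

/* Hypothetical task: its class, a dead flag, and a cleanup marker. */
struct task_sketch {
    const struct sched_class_sketch *sched_class;
    int dead;
    int cleaned;
};

/* Example hook: a class that wants to clean up after its dead tasks. */
static void dl_task_dead_sketch(struct task_sketch *p)
{
    p->cleaned = 1;
}

static const struct sched_class_sketch dl_class_sketch = {
    .task_dead = dl_task_dead_sketch,
};

/* A class that does not care about task death leaves the hook unset. */
static const struct sched_class_sketch fair_class_sketch = {
    .task_dead = NULL,
};

/* End of a context switch: notify the class if prev is dead. */
static void finish_switch_sketch(struct task_sketch *prev)
{
    if (prev->dead && prev->sched_class->task_dead)
        prev->sched_class->task_dead(prev);
}
```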

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Reviewed-by: Paul Turner <pjt@google.com>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Cc: bruce.ashfield@windriver.com
Cc: claudio@evidence.eu.com
Cc: darren@dvhart.com
Cc: dhaval.giani@gmail.com
Cc: fchecconi@gmail.com
Cc: fweisbec@gmail.com
Cc: harald.gustafsson@ericsson.com
Cc: hgu1972@gmail.com
Cc: insop.song@gmail.com
Cc: jkacur@redhat.com
Cc: johan.eker@ericsson.com
Cc: liming.wang@windriver.com
Cc: luca.abeni@unitn.it
Cc: michael@amarulasolutions.com
Cc: nicola.manica@disi.unitn.it
Cc: oleg@redhat.com
Cc: paulmck@linux.vnet.ibm.com
Cc: p.faure@akatech.ch
Cc: rostedt@goodmis.org
Cc: tommaso.cucinotta@sssup.it
Cc: vincent.guittot@linaro.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-2-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[rameezmustafa@codeaurora.org: Port to msm-3.10]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Git-Commit: e6c390f2dfd04c165ce45b0032f73fba85b1f282
Git-Repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
2015-05-19 19:15:55 -07:00
Syed Rameez Mustafa 3846054d39 Revert "sched: Fix up scheduler syscall LTP fails"
This reverts commit ca454d54c4.
The above commit was an upstream cherry-pick ported to msm-3.10
in the absence of the deadline scheduler. Revert this in anticipation
of introducing the deadline scheduler. The original patch will be
brought in as part of the deadline scheduler port.

Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-05-19 19:15:54 -07:00
Peter Zijlstra cad4316a91 sched: Move wait.c into kernel/sched/
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-5q5yqvdaen0rmapwloeaotx3@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[rameezmustafa@codeaurora.org: Port to msm-3.10]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Git-Commit: 7a6354e241d8fbc145836ac24e47630f12754536
Git-Repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
2015-05-19 19:15:46 -07:00
Srivatsa Vaddagiri acb4675380 sched: report loads greater than 100% only during load alert notifications
The busy time of CPUs is adjusted during task migrations. This can
result in a load greater than 100% being reported to the governor,
which causes direct jumps to higher frequencies during intra-cluster
migrations. Hence clip the load to 100% when it is reported at
the end of the window. The load is not clipped for load alert
notifications, which allows ramping up the frequency faster for
inter-cluster migrations and heavy task wakeup scenarios.
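
The clipping rule can be written down as a short C sketch. The function
name report_load_sketch() and its parameters are hypothetical; the load
is expressed as busy time within a window, so 100% corresponds to
busy_time == window_size.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the load to hand to the governor.
 * End-of-window reports are clipped to 100% (the window size);
 * load alert notifications pass the unclipped value through.
 */
static uint64_t report_load_sketch(uint64_t busy_time, uint64_t window_size,
                                   bool load_alert)
{
    if (!load_alert && busy_time > window_size)
        return window_size;
    return busy_time;
}
```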

Change-Id: I7347260aa476287ecfc706d4dd0877f4b75a1089
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2015-05-18 10:18:40 -07:00
Linux Build Service Account adbca96680 Merge "sched: auto adjust the upmigrate and downmigrate thresholds" 2015-05-18 03:51:50 -07:00
Pavankumar Kondeti 298aa60519 sched: auto adjust the upmigrate and downmigrate thresholds
The load scale factor of a CPU gets boosted when its max freq
is restricted. A task load at the same frequency is scaled higher
than normal under this scenario. This results in tasks migrating
early to the better capacity CPUs and their residency over there
also gets increased as their inflated load would be relatively
higher than the downmigrate threshold.

Auto adjust the upmigrate and downmigrate thresholds by a factor
equal to rq->max_possible_freq/rq->max_freq of a lower capacity CPU.
If the adjusted upmigrate threshold exceeds the window size, it is
clipped to the window size. If the adjusted downmigrate threshold
decreases the difference between the upmigrate and downmigrate, it is
clipped to a value such that the difference between the modified
thresholds is the same as between the original ones.
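
One possible reading of this adjustment, as a userspace C sketch. The
names (thresholds_sketch, adjust_thresholds_sketch) are hypothetical, and
the sketch assumes that thresholds are in the same time units as the
window, that down <= up <= window, and that max_freq <= max_possible_freq.
It interprets the downmigrate clipping as "keep the original gap between
the two thresholds".

```c
#include <stdint.h>

struct thresholds_sketch {
    uint64_t up;    /* upmigrate threshold */
    uint64_t down;  /* downmigrate threshold */
};

static struct thresholds_sketch
adjust_thresholds_sketch(uint64_t up, uint64_t down, uint64_t window,
                         uint64_t max_possible_freq, uint64_t max_freq)
{
    struct thresholds_sketch t;
    uint64_t gap = up - down;  /* original distance between thresholds */

    /* Scale both thresholds by max_possible_freq / max_freq. */
    t.up = up * max_possible_freq / max_freq;
    t.down = down * max_possible_freq / max_freq;

    /* The upmigrate threshold cannot exceed the window size. */
    if (t.up > window)
        t.up = window;

    /* Do not let the gap shrink below the original one. */
    if (t.down > t.up || t.up - t.down < gap)
        t.down = t.up - gap;

    return t;
}
```

For example, with up = 80, down = 60, a window of 100 and a frequency
ratio of 2, the scaled values would be 160 and 120; the sketch clips them
to 100 and 80, preserving the original gap of 20.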

Change-Id: Ifa70ee5d4ca5fe02789093c7f070c77629907f04
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2015-05-15 07:00:34 +05:30
Pavankumar Kondeti 82c540dbe7 sched: don't inherit initial task load from the parent
A child task is not supposed to inherit the initial task load
attribute from the parent. Reset the child's init_load_pct attribute
during fork.

Change-Id: I458b121f10f996fda364e97b51aaaf6c345c1dbb
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2015-05-13 09:54:20 +05:30
Linux Build Service Account f3fc67be00 Merge "sched: don't inflate the task load when the CPU max freq is restricted" 2015-05-05 05:12:54 -07:00
Linux Build Service Account ad9f3336f8 Merge "sched/fair: Add irq load awareness to the tick CPU selection logic" 2015-05-01 20:28:25 -07:00
Pavankumar Kondeti 826651e7c0 sched: don't inflate the task load when the CPU max freq is restricted
When the CPU max freq is restricted and the CPU is running at the
max freq, the task load is inflated by max_possible_freq/max_freq
factor. This results in tasks migrating early to the better capacity
CPUs, which makes things worse if the frequency restriction is due
to a thermal condition.

Change-Id: Ie0ea405d7005764a6fb852914e88cf97102c138a
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2015-04-29 16:32:52 +05:30
Linux Build Service Account 44c5e84af0 Merge "sched: disable IRQs in update_min_max_capacity" 2015-04-28 22:50:00 -07:00
Steve Muckle f66245b279 sched: disable IRQs in update_min_max_capacity
IRQs must be disabled while locking runqueues since an
interrupt may cause a runqueue lock to be acquired.

CRs-fixed: 828598
Change-Id: Id66f2e25ed067fc4af028482db8c3abd3d10c20f
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2015-04-27 15:26:32 -07:00
Ian Maund 8b08aa9e75 This is the 3.10.67 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUyuGRAAoJEDjbvchgkmk+7EwQALYPOeh+AManQFB1MQvFuOgZ
 /4ulpjhGXw/RPTKHMeyHo8vRfUhMOx8UPF62uql+g1l9b/Zt2bs6qXu4QcxRRsQc
 trSTUpi+U14y1hkgqOVOcFYP2ZaTjNEBQgLJ4eGn46CliLqme+rfoyRYm2GXzcR4
 6cbSAr3mufdFIpi9/8Dn62Gv0aws5lIv3qkHJXznyuux3tisPT5y6Ux2KJoivPn/
 SqADtRpwo+7lTjl15fE++9AqNsGMorV6toT2OO/7nXP+824psInKLmREAT2qC99b
 BG61vcYdxOuHtzmwrvCf1jSRjxhvZT0j2xhBr/vCKcxy08AT0vDv68zrV1r6TIuu
 U7/CKXtFBY95cjfnkTLJuswBSuIA/+sQHV6DaddH0V8fcZ6rQMLrblQ9ZcFFFkmT
 2SG6lmlXqZvcEKYGMnL/Dcow1rkRhB5stiGgTkYxjiRSRpzAHISRJ/GGpsT+rRqK
 HpBs5p9JshvRl7RWKwAu+DNGaEK1X/WYxc4/jw6dZFWX7lEWSMIPlr9zXgZCZ39y
 V6lV1VVlT9/CSs1swKHUyhHHehlFsnIlQ6Fkiycr/KkuqBLs92Hyb7WhpVa819yX
 osXdxSm6J54skiOLKYpBWHpnY09Tc+p28VEfMpErTExgp2oE8F34K7kdhoQPQb97
 2mHiXNa+J4CLUNQ+sRmw
 =HDBo
 -----END PGP SIGNATURE-----

Merge commit 'v3.10.67' into msm-3.10

This merge brings us up to date with upstream kernel.org tag v3.10.67.
It also contains changes to allow forbidden warnings introduced in
the commit 'core, nfqueue, openvswitch: Orphan frags in skb_zerocopy
and handle errors'. Once upstream has corrected these warnings, the
changes to scripts/gcc-wrapper.py, in this commit, can be reverted.

* commit 'v3.10.67' (915 commits)
  Linux 3.10.67
  md/raid5: fetch_block must fetch all the blocks handle_stripe_dirtying wants.
  ext4: fix warning in ext4_da_update_reserve_space()
  quota: provide interface for readding allocated space into reserved space
  crypto: add missing crypto module aliases
  crypto: include crypto- module prefix in template
  crypto: prefix module autoloading with "crypto-"
  drbd: merge_bvec_fn: properly remap bvm->bi_bdev
  Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
  ipvs: uninitialized data with IP_VS_IPV6
  KEYS: close race between key lookup and freeing
  sata_dwc_460ex: fix resource leak on error path
  x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs
  x86, tls: Interpret an all-zero struct user_desc as "no segment"
  x86, tls, ldt: Stop checking lm in LDT_empty
  x86/tsc: Change Fast TSC calibration failed from error to info
  x86, hyperv: Mark the Hyper-V clocksource as being continuous
  clocksource: exynos_mct: Fix bitmask regression for exynos4_mct_write
  can: dev: fix crtlmode_supported check
  bus: mvebu-mbus: fix support of MBus window 13
  ARM: dts: imx25: Fix PWM "per" clocks
  time: adjtimex: Validate the ADJ_FREQUENCY values
  time: settimeofday: Validate the values of tv from user
  dm cache: share cache-metadata object across inactive and active DM tables
  ipr: wait for aborted command responses
  drm/i915: Fix mutex->owner inspection race under DEBUG_MUTEXES
  scripts/recordmcount.pl: There is no -m32 gcc option on Super-H anymore
  ALSA: usb-audio: Add mic volume fix quirk for Logitech Webcam C210
  libata: prevent HSM state change race between ISR and PIO
  pinctrl: Fix two deadlocks
  gpio: sysfs: fix gpio device-attribute leak
  gpio: sysfs: fix gpio-chip device-attribute leak
  Linux 3.10.66
  s390/3215: fix tty output containing tabs
  s390/3215: fix hanging console issue
  fsnotify: next_i is freed during fsnotify_unmount_inodes.
  netfilter: ipset: small potential read beyond the end of buffer
  mmc: sdhci: Fix sleep in atomic after inserting SD card
  LOCKD: Fix a race when initialising nlmsvc_timeout
  x86, um: actually mark system call tables readonly
  um: Skip futex_atomic_cmpxchg_inatomic() test
  decompress_bunzip2: off by one in get_next_block()
  ARM: shmobile: sh73a0 legacy: Set .control_parent for all irqpin instances
  ARM: omap5/dra7xx: Fix frequency typos
  ARM: clk-imx6q: fix video divider for rev T0 1.0
  ARM: imx6q: drop unnecessary semicolon
  ARM: dts: imx25: Fix the SPI1 clocks
  Input: I8042 - add Acer Aspire 7738 to the nomux list
  Input: i8042 - reset keyboard to fix Elantech touchpad detection
  can: kvaser_usb: Don't send a RESET_CHIP for non-existing channels
  can: kvaser_usb: Reset all URB tx contexts upon channel close
  can: kvaser_usb: Don't free packets when tight on URBs
  USB: keyspan: fix null-deref at probe
  USB: cp210x: add IDs for CEL USB sticks and MeshWorks devices
  USB: cp210x: fix ID for production CEL MeshConnect USB Stick
  usb: dwc3: gadget: Stop TRB preparation after limit is reached
  usb: dwc3: gadget: Fix TRB preparation during SG
  OHCI: add a quirk for ULi M5237 blocking on reset
  gpiolib: of: Correct error handling in of_get_named_gpiod_flags
  NFSv4.1: Fix client id trunking on Linux
  ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing
  vfio-pci: Fix the check on pci device type in vfio_pci_probe()
  uvcvideo: Fix destruction order in uvc_delete()
  smiapp: Take mutex during PLL update in sensor initialisation
  af9005: fix kernel panic on init if compiled without IR
  smiapp-pll: Correct clock debug prints
  video/logo: prevent use of logos after they have been freed
  storvsc: ring buffer failures may result in I/O freeze
  iscsi-target: Fail connection on short sendmsg writes
  hp_accel: Add support for HP ZBook 15
  cfg80211: Fix 160 MHz channels with 80+80 and 160 MHz drivers
  ARC: [nsimosci] move peripherals to match model to FPGA
  drm/i915: Force the CS stall for invalidate flushes
  drm/i915: Invalidate media caches on gen7
  drm/radeon: properly filter DP1.2 4k modes on non-DP1.2 hw
  drm/radeon: check the right ring in radeon_evict_flags()
  drm/vmwgfx: Fix fence event code
  enic: fix rx skb checksum
  alx: fix alx_poll()
  tcp: Do not apply TSO segment limit to non-TSO packets
  tg3: tg3_disable_ints using uninitialized mailbox value to disable interrupts
  netlink: Don't reorder loads/stores before marking mmap netlink frame as available
  netlink: Always copy on mmap TX.
  Linux 3.10.65
  mm: Don't count the stack guard page towards RLIMIT_STACK
  mm: propagate error from stack expansion even for guard page
  mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
  perf session: Do not fail on processing out of order event
  perf: Fix events installation during moving group
  perf/x86/intel/uncore: Make sure only uncore events are collected
  Btrfs: don't delay inode ref updates during log replay
  ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/375/38x/XP
  scripts/kernel-doc: don't eat struct members with __aligned
  nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
  nfsd4: fix xdr4 inclusion of escaped char
  fs: nfsd: Fix signedness bug in compare_blob
  serial: samsung: wait for transfer completion before clock disable
  writeback: fix a subtle race condition in I_DIRTY clearing
  cdc-acm: memory leak in error case
  genhd: check for int overflow in disk_expand_part_tbl()
  USB: cdc-acm: check for valid interfaces
  ALSA: hda - Fix wrong gpio_dir & gpio_mask hint setups for IDT/STAC codecs
  ALSA: hda - using uninitialized data
  ALSA: usb-audio: extend KEF X300A FU 10 tweak to Arcam rPAC
  driver core: Fix unbalanced device reference in drivers_probe
  x86, vdso: Use asm volatile in __getcpu
  x86_64, vdso: Fix the vdso address randomization algorithm
  HID: Add a new id 0x501a for Genius MousePen i608X
  HID: add battery quirk for USB_DEVICE_ID_APPLE_ALU_WIRELESS_2011_ISO keyboard
  HID: roccat: potential out of bounds in pyra_sysfs_write_settings()
  HID: i2c-hid: prevent buffer overflow in early IRQ
  HID: i2c-hid: fix race condition reading reports
  iommu/vt-d: Fix an off-by-one bug in __domain_mapping()
  UBI: Fix double free after do_sync_erase()
  UBI: Fix invalid vfree()
  pstore-ram: Allow optional mapping with pgprot_noncached
  pstore-ram: Fix hangs by using write-combine mappings
  PCI: Restore detection of read-only BARs
  ASoC: dwc: Ensure FIFOs are flushed to prevent channel swap
  ASoC: max98090: Fix ill-defined sidetone route
  ASoC: sigmadsp: Refuse to load firmware files with a non-supported version
  ath5k: fix hardware queue index assignment
  swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single
  can: peak_usb: fix memset() usage
  can: peak_usb: fix cleanup sequence order in case of error during init
  ath9k: fix BE/BK queue order
  ath9k_hw: fix hardware queue allocation
  ocfs2: fix journal commit deadlock
  Linux 3.10.64
  Btrfs: fix fs corruption on transaction abort if device supports discard
  Btrfs: do not move em to modified list when unpinning
  eCryptfs: Remove buggy and unnecessary write in file name decode routine
  eCryptfs: Force RO mount when encrypted view is enabled
  udf: Verify symlink size before loading it
  exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
  ncpfs: return proper error from NCP_IOC_SETROOT ioctl
  crypto: af_alg - fix backlog handling
  userns: Unbreak the unprivileged remount tests
  userns: Allow setting gid_maps without privilege when setgroups is disabled
  userns: Add a knob to disable setgroups on a per user namespace basis
  userns: Rename id_map_mutex to userns_state_mutex
  userns: Only allow the creator of the userns unprivileged mappings
  userns: Check euid no fsuid when establishing an unprivileged uid mapping
  userns: Don't allow unprivileged creation of gid mappings
  userns: Don't allow setgroups until a gid mapping has been setablished
  userns: Document what the invariant required for safe unprivileged mappings.
  groups: Consolidate the setgroups permission checks
  umount: Disallow unprivileged mount force
  mnt: Update unprivileged remount test
  mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
  mac80211: free management frame keys when removing station
  mac80211: fix multicast LED blinking and counter
  KEYS: Fix stale key registration at error path
  isofs: Fix unchecked printing of ER records
  x86/tls: Don't validate lm in set_thread_area() after all
  dm space map metadata: fix sm_bootstrap_get_nr_blocks()
  dm bufio: fix memleak when using a dm_buffer's inline bio
  nfs41: fix nfs4_proc_layoutget error handling
  megaraid_sas: corrected return of wait_event from abort frame path
  mmc: block: add newline to sysfs display of force_ro
  mfd: tc6393xb: Fail ohci suspend if full state restore is required
  md/bitmap: always wait for writes on unplug.
  x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit
  x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  x86/tls: Disallow unusual TLS segments
  x86/tls: Validate TLS entries to protect espfix
  isofs: Fix infinite looping over CE entries
  Linux 3.10.63
  ALSA: usb-audio: Don't resubmit pending URBs at MIDI error recovery
  powerpc: 32 bit getcpu VDSO function uses 64 bit instructions
  ARM: sched_clock: Load cycle count after epoch stabilizes
  igb: bring link up when PHY is powered up
  ext2: Fix oops in ext2_get_block() called from ext2_quota_write()
  nEPT: Nested INVEPT
  net: sctp: use MAX_HEADER for headroom reserve in output path
  net: mvneta: fix Tx interrupt delay
  rtnetlink: release net refcnt on error in do_setlink()
  net/mlx4_core: Limit count field to 24 bits in qp_alloc_res
  tg3: fix ring init when there are more TX than RX channels
  ipv6: gre: fix wrong skb->protocol in WCCP
  sata_fsl: fix error handling of irq_of_parse_and_map
  ahci: disable MSI on SAMSUNG 0xa800 SSD
  AHCI: Add DeviceIDs for Sunrise Point-LP SATA controller
  media: smiapp: Only some selection targets are settable
  drm/i915: Unlock panel even when LVDS is disabled
  drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6
  i2c: davinci: generate STP always when NACK is received
  i2c: omap: fix i207 errata handling
  i2c: omap: fix NACK and Arbitration Lost irq handling
  xen-netfront: Remove BUGs on paged skb data which crosses a page boundary
  mm: fix swapoff hang after page migration and fork
  mm: frontswap: invalidate expired data on a dup-store failure
  Linux 3.10.62
  nfsd: Fix ACL null pointer deref
  powerpc/powernv: Honor the generic "no_64bit_msi" flag
  bnx2fc: do not add shared skbs to the fcoe_rx_list
  nfsd4: fix leak of inode reference on delegation failure
  nfsd: Fix slot wake up race in the nfsv4.1 callback code
  rt2x00: do not align payload on modern H/W
  can: dev: avoid calling kfree_skb() from interrupt context
  spi: dw: Fix dynamic speed change.
  iser-target: Handle DEVICE_REMOVAL event on network portal listener correctly
  target: Don't call TFO->write_pending if data_length == 0
  srp-target: Retry when QP creation fails with ENOMEM
  Input: xpad - use proper endpoint type
  ARM: 8222/1: mvebu: enable strex backoff delay
  ARM: 8216/1: xscale: correct auxiliary register in suspend/resume
  ALSA: usb-audio: Add ctrl message delay quirk for Marantz/Denon devices
  can: esd_usb2: fix memory leak on disconnect
  USB: xhci: don't start a halted endpoint before its new dequeue is set
  usb-quirks: Add reset-resume quirk for MS Wireless Laser Mouse 6000
  usb: serial: ftdi_sio: add PIDs for Matrix Orbital products
  USB: serial: cp210x: add IDs for CEL MeshConnect USB Stick
  USB: keyspan: fix tty line-status reporting
  USB: keyspan: fix overrun-error reporting
  USB: ssu100: fix overrun-error reporting
  iio: Fix IIO_EVENT_CODE_EXTRACT_DIR bit mask
  powerpc/pseries: Fix endiannes issue in RTAS call from xmon
  powerpc/pseries: Honor the generic "no_64bit_msi" flag
  of/base: Fix PowerPC address parsing hack
  ASoC: wm_adsp: Avoid attempt to free buffers that might still be in use
  ASoC: sgtl5000: Fix SMALL_POP bit definition
  PCI/MSI: Add device flag indicating that 64-bit MSIs don't work
  ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
  pptp: fix stack info leak in pptp_getname()
  qmi_wwan: Add support for HP lt4112 LTE/HSPA+ Gobi 4G Modem
  ieee802154: fix error handling in ieee802154fake_probe()
  ipv4: Fix incorrect error code when adding an unreachable route
  inetdevice: fixed signed integer overflow
  sparc64: Fix constraints on swab helpers.
  uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME
  x86, mm: Set NX across entire PMD at boot
  x86: Require exact match for 'noxsave' command line option
  x86_64, traps: Rework bad_iret
  x86_64, traps: Stop using IST for #SS
  x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
  MIPS: Loongson: Make platform serial setup always built-in.
  MIPS: oprofile: Fix backtrace on 64-bit kernel
  Linux 3.10.61
  mm: memcg: handle non-error OOM situations more gracefully
  mm: memcg: do not trap chargers with full callstack on OOM
  mm: memcg: rework and document OOM waiting and wakeup
  mm: memcg: enable memcg OOM killer only for user faults
  x86: finish user fault error path with fatal signal
  arch: mm: pass userspace fault flag to generic fault handler
  arch: mm: do not invoke OOM killer on kernel fault OOM
  arch: mm: remove obsolete init OOM protection
  mm: invoke oom-killer from remaining unconverted page fault handlers
  net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
  net: sctp: fix panic on duplicate ASCONF chunks
  net: sctp: fix remote memory pressure from excessive queueing
  KVM: x86: Don't report guest userspace emulation error to userspace
  SCSI: hpsa: fix a race in cmd_free/scsi_done
  net/mlx4_en: Fix BlueFlame race
  ARM: Correct BUG() assembly to ensure it is endian-agnostic
  perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
  mei: bus: fix possible boundaries violation
  perf: Handle compat ioctl
  MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches
  dell-wmi: Fix access out of memory
  ARM: probes: fix instruction fetch order with <asm/opcodes.h>
  br: fix use of ->rx_handler_data in code executed on non-rx_handler path
  netfilter: nf_nat: fix oops on netns removal
  netfilter: xt_bpf: add mising opaque struct sk_filter definition
  netfilter: nf_log: release skbuff on nlmsg put failure
  netfilter: nfnetlink_log: fix maximum packet length logged to userspace
  netfilter: nf_log: account for size of NLMSG_DONE attribute
  ipc: always handle a new value of auto_msgmni
  clocksource: Remove "weak" from clocksource_default_clock() declaration
  kgdb: Remove "weak" from kgdb_arch_pc() declaration
  media: ttusb-dec: buffer overflow in ioctl
  NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
  nfs: Fix use of uninitialized variable in nfs_getattr()
  NFS: Don't try to reclaim delegation open state if recovery failed
  NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
  Input: alps - allow up to 2 invalid packets without resetting device
  Input: alps - ignore potential bare packets when device is out of sync
  dm raid: ensure superblock's size matches device's logical block size
  dm btree: fix a recursion depth bug in btree walking code
  block: Fix computation of merged request priority
  parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls
  scsi: only re-lock door after EH on devices that were reset
  nfs: fix pnfs direct write memory leak
  firewire: cdev: prevent kernel stack leaking into ioctl arguments
  arm64: __clear_user: handle exceptions on strb
  ARM: 8198/1: make kuser helpers depend on MMU
  drm/radeon: add missing crtc unlock when setting up the MC
  mac80211: fix use-after-free in defragmentation
  macvtap: Fix csum_start when VLAN tags are present
  iwlwifi: configure the LTR
  libceph: do not crash on large auth tickets
  xtensa: re-wire umount syscall to sys_oldumount
  ALSA: usb-audio: Fix memory leak in FTU quirk
  ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks
  ahci: Add Device IDs for Intel Sunrise Point PCH
  audit: keep inode pinned
  x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
  sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks
  sparc64: Do irq_{enter,exit}() around generic_smp_call_function*().
  sparc64: Fix crashes in schizo_pcierr_intr_other().
  sunvdc: don't call VD_OP_GET_VTOC
  vio: fix reuse of vio_dring slot
  sunvdc: limit each sg segment to a page
  sunvdc: compute vdisk geometry from capacity
  sunvdc: add cdrom and v1.1 protocol support
  net: sctp: fix memory leak in auth key management
  net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
  gre6: Move the setting of dev->iflink into the ndo_init functions.
  ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.
  Linux 3.10.60
  libceph: ceph-msgr workqueue needs a resque worker
  Btrfs: fix kfree on list_head in btrfs_lookup_csums_range error cleanup
  of: Fix overflow bug in string property parsing functions
  sysfs: driver core: Fix glue dir race condition by gdp_mutex
  i2c: at91: don't account as iowait
  acer-wmi: Add acpi_backlight=video quirk for the Acer KAV80
  rbd: Fix error recovery in rbd_obj_read_sync()
  drm/radeon: remove invalid pci id
  usb: gadget: udc: core: fix kernel oops with soft-connect
  usb: gadget: function: acm: make f_acm pass USB20CV Chapter9
  usb: dwc3: gadget: fix set_halt() bug with pending transfers
  crypto: algif - avoid excessive use of socket buffer in skcipher
  mm: Remove false WARN_ON from pagecache_isize_extended()
  x86, apic: Handle a bad TSC more gracefully
  posix-timers: Fix stack info leak in timer_create()
  mac80211: fix typo in starting baserate for rts_cts_rate_idx
  PM / Sleep: fix recovery during resuming from hibernation
  tty: Fix high cpu load if tty is unreleaseable
  quota: Properly return errors from dquot_writeback_dquots()
  ext3: Don't check quota format when there are no quota files
  nfsd4: fix crash on unknown operation number
  cpc925_edac: Report UE events properly
  e7xxx_edac: Report CE events properly
  i3200_edac: Report CE events properly
  i82860_edac: Report CE events properly
  scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
  lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
  cgroup/kmemleak: add kmemleak_free() for cgroup deallocations.
  usb: Do not allow usb_alloc_streams on unconfigured devices
  USB: opticon: fix non-atomic allocation in write path
  usb-storage: handle a skipped data phase
  spi: pxa2xx: toggle clocks on suspend if not disabled by runtime PM
  spi: pl022: Fix incorrect dma_unmap_sg
  usb: dwc3: gadget: Properly initialize LINK TRB
  wireless: rt2x00: add new rt2800usb device
  USB: option: add Haier CE81B CDMA modem
  usb: option: add support for Telit LE910
  USB: cdc-acm: only raise DTR on transitions from B0
  USB: cdc-acm: add device id for GW Instek AFG-2225
  usb: serial: ftdi_sio: add "bricked" FTDI device PID
  usb: serial: ftdi_sio: add Awinda Station and Dongle products
  USB: serial: cp210x: add Silicon Labs 358x VID and PID
  serial: Fix divide-by-zero fault in uart_get_divisor()
  staging:iio:ade7758: Remove "raw" from channel name
  staging:iio:ade7758: Fix check if channels are enabled in prenable
  staging:iio:ade7758: Fix NULL pointer deref when enabling buffer
  staging:iio:ad5933: Drop "raw" from channel names
  staging:iio:ad5933: Fix NULL pointer deref when enabling buffer
  OOM, PM: OOM killed task shouldn't escape PM suspend
  freezer: Do not freeze tasks killed by OOM killer
  ext4: fix oops when loading block bitmap failed
  cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy
  ext4: fix overflow when updating superblock backups after resize
  ext4: check s_chksum_driver when looking for bg csum presence
  ext4: fix reservation overflow in ext4_da_write_begin
  ext4: add ext4_iget_normal() which is to be used for dir tree lookups
  ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT
  ext4: don't check quota format when there are no quota files
  ext4: check EA value offset when loading
  jbd2: free bh when descriptor block checksum fails
  MIPS: tlbex: Properly fix HUGE TLB Refill exception handler
  target: Fix APTPL metadata handling for dynamic MappedLUNs
  target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE
  qla_target: don't delete changed nacls
  ARC: Update order of registers in KGDB to match GDB 7.5
  ARC: [nsimosci] Allow "headless" models to boot
  KVM: x86: Emulator fixes for eip canonical checks on near branches
  KVM: x86: Fix wrong masking on relative jump/call
  kvm: x86: don't kill guest on unknown exit reason
  KVM: x86: Check non-canonical addresses upon WRMSR
  KVM: x86: Improve thread safety in pit
  KVM: x86: Prevent host from panicking on shared MSR writes.
  kvm: fix excessive pages un-pinning in kvm_iommu_map error path.
  media: tda7432: Fix setting TDA7432_MUTE bit for TDA7432_RF register
  media: ds3000: fix LNB supply voltage on Tevii S480 on initialization
  media: em28xx-v4l: give back all active video buffers to the vb2 core properly on streaming stop
  media: v4l2-common: fix overflow in v4l_bound_align_image()
  drm/nouveau/bios: memset dcb struct to zero before parsing
  drm/tilcdc: Fix the error path in tilcdc_load()
  drm/ast: Fix HW cursor image
  Input: i8042 - quirks for Fujitsu Lifebook A544 and Lifebook AH544
  Input: i8042 - add noloop quirk for Asus X750LN
  framebuffer: fix border color
  modules, lock around setting of MODULE_STATE_UNFORMED
  dm log userspace: fix memory leak in dm_ulog_tfr_init failure path
  block: fix alignment_offset math that assumes io_min is a power-of-2
  drbd: compute the end before rb_insert_augmented()
  dm bufio: update last_accessed when relinking a buffer
  virtio_pci: fix virtio spec compliance on restore
  selinux: fix inode security list corruption
  pstore: Fix duplicate {console,ftrace}-efi entries
  mfd: rtsx_pcr: Fix MSI enable error handling
  mnt: Prevent pivot_root from creating a loop in the mount tree
  UBI: add missing kmem_cache_free() in process_pool_aeb error path
  random: add and use memzero_explicit() for clearing data
  crypto: more robust crypto_memneq
  fix misuses of f_count() in ppp and netlink
  kill wbuf_queued/wbuf_dwork_lock
  ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode
  evm: check xattr value length and type in evm_inode_setxattr()
  x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
  x86_64, entry: Fix out of bounds read on sysenter
  x86_64, entry: Filter RFLAGS.NT on entry from userspace
  x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
  x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
  x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
  x86: Reject x32 executables if x32 ABI not supported
  vfs: fix data corruption when blocksize < pagesize for mmaped data
  UBIFS: fix free log space calculation
  UBIFS: fix a race condition
  UBIFS: remove mst_mutex
  fs: Fix theoretical division by 0 in super_cache_scan().
  fs: make cont_expand_zero interruptible
  mmc: rtsx_pci_sdmmc: fix incorrect last byte in R2 response
  libata-sff: Fix controllers with no ctl port
  pata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE Controller
  Revert "percpu: free percpu allocation info for uniprocessor system"
  lockd: Try to reconnect if statd has moved
  drivers/net: macvtap and tun depend on INET
  ipv4: dst_entry leak in ip_send_unicast_reply()
  ax88179_178a: fix bonding failure
  ipv4: fix nexthop attlen check in fib_nh_match
  tracing/syscalls: Ignore numbers outside NR_syscalls' range
  Linux 3.10.59
  ecryptfs: avoid to access NULL pointer when write metadata in xattr
  ARM: at91/PMC: don't forget to write PMC_PCDR register to disable clocks
  ALSA: usb-audio: Add support for Steinberg UR22 USB interface
  ALSA: emu10k1: Fix deadlock in synth voice lookup
  ALSA: pcm: use the same dma mmap codepath both for arm and arm64
  arm64: compat: fix compat types affecting struct compat_elf_prpsinfo
  spi: dw-mid: terminate ongoing transfers at exit
  kernel: add support for gcc 5
  fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
  mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  Bluetooth: Fix issue with USB suspend in btusb driver
  Bluetooth: Fix HCI H5 corrupted ack value
  rt2800: correct BBP1_TX_POWER_CTRL mask
  PCI: Generate uppercase hex for modalias interface class
  PCI: Increase IBM ipr SAS Crocodile BARs to at least system page size
  iwlwifi: Add missing PCI IDs for the 7260 series
  NFSv4.1: Fix an NFSv4.1 state renewal regression
  NFSv4: fix open/lock state recovery error handling
  NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
  lzo: check for length overrun in variable length encoding.
  Revert "lzo: properly check for overruns"
  Documentation: lzo: document part of the encoding
  m68k: Disable/restore interrupts in hwreg_present()/hwreg_write()
  Drivers: hv: vmbus: Fix a bug in vmbus_open()
  Drivers: hv: vmbus: Cleanup vmbus_establish_gpadl()
  Drivers: hv: vmbus: Cleanup vmbus_teardown_gpadl()
  Drivers: hv: vmbus: Cleanup vmbus_post_msg()
  firmware_class: make sure fw requests contain a name
  qla2xxx: Use correct offset to req-q-out for reserve calculation
  mptfusion: enable no_write_same for vmware scsi disks
  be2iscsi: check ip buffer before copying
  regmap: fix NULL pointer dereference in _regmap_write/read
  regmap: debugfs: fix possbile NULL pointer dereference
  spi: dw-mid: check that DMA was inited before exit
  spi: dw-mid: respect 8 bit mode
  x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
  kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
  KVM: s390: unintended fallthrough for external call
  kvm: x86: fix stale mmio cache bug
  fs: Add a missing permission check to do_umount
  Btrfs: fix race in WAIT_SYNC ioctl
  Btrfs: fix build_backref_tree issue with multiple shared blocks
  Btrfs: try not to ENOSPC on log replay
  Linux 3.10.58
  USB: cp210x: add support for Seluxit USB dongle
  USB: serial: cp210x: added Ketra N1 wireless interface support
  USB: Add device quirk for ASUS T100 Base Station keyboard
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  tcp: fixing TLP's FIN recovery
  sctp: handle association restarts when the socket is closed.
  ip6_gre: fix flowi6_proto value in xmit path
  hyperv: Fix a bug in netvsc_start_xmit()
  tg3: Allow for recieve of full-size 8021AD frames
  tg3: Work around HW/FW limitations with vlan encapsulated frames
  l2tp: fix race while getting PMTU on PPP pseudo-wire
  openvswitch: fix panic with multiple vlan headers
  packet: handle too big packets for PACKET_V3
  tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()
  sit: Fix ipip6_tunnel_lookup device matching criteria
  myri10ge: check for DMA mapping errors
  Linux 3.10.57
  cpufreq: ondemand: Change the calculation of target frequency
  cpufreq: Fix wrong time unit conversion
  nl80211: clear skb cb before passing to netlink
  drbd: fix regression 'out of mem, failed to invoke fence-peer helper'
  jiffies: Fix timeval conversion to jiffies
  md/raid5: disable 'DISCARD' by default due to safety concerns.
  media: vb2: fix VBI/poll regression
  mm: numa: Do not mark PTEs pte_numa when splitting huge pages
  mm, thp: move invariant bug check out of loop in __split_huge_page_map
  ring-buffer: Fix infinite spin in reading buffer
  init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu
  perf: fix perf bug in fork()
  udf: Avoid infinite loop when processing indirect ICBs
  Linux 3.10.56
  vm_is_stack: use for_each_thread() rather then buggy while_each_thread()
  oom_kill: add rcu_read_lock() into find_lock_task_mm()
  oom_kill: has_intersects_mems_allowed() needs rcu_read_lock()
  oom_kill: change oom_kill.c to use for_each_thread()
  introduce for_each_thread() to replace the buggy while_each_thread()
  kernel/fork.c:copy_process(): unify CLONE_THREAD-or-thread_group_leader code
  arm: multi_v7_defconfig: Enable Zynq UART driver
  ext2: Fix fs corruption in ext2_get_xip_mem()
  serial: 8250_dma: check the result of TX buffer mapping
  ARM: 7748/1: oabi: handle faults when loading swi instruction from userspace
  netfilter: nf_conntrack: avoid large timeout for mid-stream pickup
  PM / sleep: Use valid_state() for platform-dependent sleep states only
  PM / sleep: Add state field to pm_states[] entries
  ipvs: fix ipv6 hook registration for local replies
  ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding
  ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack
  md/raid1: fix_read_error should act on all non-faulty devices.
  media: cx18: fix kernel oops with tda8290 tuner
  Fix nasty 32-bit overflow bug in buffer i/o code.
  perf kmem: Make it work again on non NUMA machines
  perf: Fix a race condition in perf_remove_from_context()
  alarmtimer: Lock k_itimer during timer callback
  alarmtimer: Do not signal SIGEV_NONE timers
  parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds
  powerpc/perf: Fix ABIv2 kernel backtraces
  sched: Fix unreleased llc_shared_mask bit during CPU hotplug
  ocfs2/dlm: do not get resource spinlock if lockres is new
  nilfs2: fix data loss with mmap()
  fs/notify: don't show f_handle if exportfs_encode_inode_fh failed
  fsnotify/fdinfo: use named constants instead of hardcoded values
  kcmp: fix standard comparison bug
  Revert "mac80211: disable uAPSD if all ACs are under ACM"
  usb: dwc3: core: fix ordering for PHY suspend
  usb: dwc3: core: fix order of PM runtime calls
  usb: host: xhci: fix compliance mode workaround
  genhd: fix leftover might_sleep() in blk_free_devt()
  lockd: fix rpcbind crash on lockd startup failure
  rtlwifi: rtl8192cu: Add new ID
  percpu: perform tlb flush after pcpu_map_pages() failure
  percpu: fix pcpu_alloc_pages() failure path
  percpu: free percpu allocation info for uniprocessor system
  ata_piix: Add Device IDs for Intel 9 Series PCH
  Input: i8042 - add nomux quirk for Avatar AVIU-145A6
  Input: i8042 - add Fujitsu U574 to no_timeout dmi table
  Input: atkbd - do not try 'deactivate' keyboard on any LG laptops
  Input: elantech - fix detection of touchpad on ASUS s301l
  Input: synaptics - add support for ForcePads
  Input: serport - add compat handling for SPIOCSTYPE ioctl
  dm crypt: fix access beyond the end of allocated space
  block: Fix dev_t minor allocation lifetime
  workqueue: apply __WQ_ORDERED to create_singlethread_workqueue()
  Revert "iwlwifi: dvm: don't enable CTS to self"
  SCSI: libiscsi: fix potential buffer overrun in __iscsi_conn_send_pdu
  NFC: microread: Potential overflows in microread_target_discovered()
  iscsi-target: Fix memory corruption in iscsit_logout_post_handler_diffcid
  iscsi-target: avoid NULL pointer in iscsi_copy_param_list failure
  Target/iser: Don't put isert_conn inside disconnected handler
  Target/iser: Get isert_conn reference once got to connected_handler
  iio:inkern: fix overwritten -EPROBE_DEFER in of_iio_channel_get_by_name
  iio:magnetometer: bugfix magnetometers gain values
  iio: adc: ad_sigma_delta: Fix indio_dev->trig assignment
  iio: st_sensors: Fix indio_dev->trig assignment
  iio: meter: ade7758: Fix indio_dev->trig assignment
  iio: inv_mpu6050: Fix indio_dev->trig assignment
  iio: gyro: itg3200: Fix indio_dev->trig assignment
  iio:trigger: modify return value for iio_trigger_get
  CIFS: Fix SMB2 readdir error handling
  CIFS: Fix directory rename error
  ASoC: davinci-mcasp: Correct rx format unit configuration
  shmem: fix nlink for rename overwrite directory
  x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
  KVM: x86: handle idiv overflow at kvm_write_tsc
  regmap: Fix handling of volatile registers for format_write() chips
  ACPICA: Update to GPIO region handler interface.
  MIPS: mcount: Adjust stack pointer for static trace in MIPS32
  MIPS: ZBOOT: add missing <linux/string.h> include
  ARM: 8165/1: alignment: don't break misaligned NEON load/store
  ARM: 7897/1: kexec: Use the right ISA for relocate_new_kernel
  ARM: 8133/1: use irq_set_affinity with force=false when migrating irqs
  ARM: 8128/1: abort: don't clear the exclusive monitors
  NFSv4: Fix another bug in the close/open_downgrade code
  NFSv4: nfs4_state_manager() vs. nfs_server_remove_lists()
  usb:hub set hub->change_bits when over-current happens
  usb: dwc3: omap: fix ordering for runtime pm calls
  USB: EHCI: unlink QHs even after the controller has stopped
  USB: storage: Add quirks for Entrega/Xircom USB to SCSI converters
  USB: storage: Add quirk for Ariston Technologies iConnect USB to SCSI adapter
  USB: storage: Add quirk for Adaptec USBConnect 2000 USB-to-SCSI Adapter
  storage: Add single-LUN quirk for Jaz USB Adapter
  usb: hub: take hub->hdev reference when processing from eventlist
  xhci: fix oops when xhci resumes from hibernate with hw lpm capable devices
  xhci: Fix null pointer dereference if xhci initialization fails
  USB: zte_ev: fix removed PIDs
  USB: ftdi_sio: add support for NOVITUS Bono E thermal printer
  USB: sierra: add 1199:68AA device ID
  USB: sierra: avoid CDC class functions on "68A3" devices
  USB: zte_ev: remove duplicate Qualcom PID
  USB: zte_ev: remove duplicate Gobi PID
  Revert "USB: option,zte_ev: move most ZTE CDMA devices to zte_ev"
  USB: option: add VIA Telecom CDS7 chipset device id
  USB: option: reduce interrupt-urb logging verbosity
  USB: serial: fix potential heap buffer overflow
  USB: sisusb: add device id for Magic Control USB video
  USB: serial: fix potential stack buffer overflow
  USB: serial: pl2303: add device id for ztek device
  xtensa: fix a6 and a7 handling in fast_syscall_xtensa
  xtensa: fix TLBTEMP_BASE_2 region handling in fast_second_level_miss
  xtensa: fix access to THREAD_RA/THREAD_SP/THREAD_DS
  xtensa: fix address checks in dma_{alloc,free}_coherent
  xtensa: replace IOCTL code definitions with constants
  drm/radeon: add connector quirk for fujitsu board
  drm/vmwgfx: Fix a potential infinite spin waiting for fifo idle
  drm/ast: AST2000 cannot be detected correctly
  drm/i915: Wait for vblank before enabling the TV encoder
  drm/i915: Remove bogus __init annotation from DMI callbacks
  HID: logitech-dj: prevent false errors to be shown
  HID: magicmouse: sanity check report size in raw_event() callback
  HID: picolcd: sanity check report size in raw_event() callback
  cfq-iosched: Fix wrong children_weight calculation
  ALSA: pcm: fix fifo_size frame calculation
  ALSA: hda - Fix invalid pin powermap without jack detection
  ALSA: hda - Fix COEF setups for ALC1150 codec
  ALSA: core: fix buffer overflow in snd_info_get_line()
  arm64: ptrace: fix compat hardware watchpoint reporting
  trace: Fix epoll hang when we race with new entries
  i2c: at91: Fix a race condition during signal handling in at91_do_twi_xfer.
  i2c: at91: add bound checking on SMBus block length bytes
  arm64: flush TLS registers during exec
  ibmveth: Fix endian issues with rx_no_buffer statistic
  ahci: add pcid for Marvel 0x9182 controller
  ahci: Add Device IDs for Intel 9 Series PCH
  pata_scc: propagate return value of scc_wait_after_reset
  drm/i915: read HEAD register back in init_ring_common() to enforce ordering
  drm/radeon: load the lm63 driver for an lm64 thermal chip.
  drm/ttm: Choose a pool to shrink correctly in ttm_dma_pool_shrink_scan().
  drm/ttm: Fix possible division by 0 in ttm_dma_pool_shrink_scan().
  drm/tilcdc: fix double kfree
  drm/tilcdc: fix release order on exit
  drm/tilcdc: panel: fix leak when unloading the module
  drm/tilcdc: tfp410: fix dangling sysfs connector node
  drm/tilcdc: slave: fix dangling sysfs connector node
  drm/tilcdc: panel: fix dangling sysfs connector node
  carl9170: fix sending URBs with wrong type when using full-speed
  Linux 3.10.55
  libceph: gracefully handle large reply messages from the mon
  libceph: rename ceph_msg::front_max to front_alloc_len
  tpm: Provide a generic means to override the chip returned timeouts
  vfs: fix bad hashing of dentries
  dcache.c: get rid of pointless macros
  IB/srp: Fix deadlock between host removal and multipathd
  blkcg: don't call into policy draining if root_blkg is already gone
  mtd: nand: omap: Fix 1-bit Hamming code scheme, omap_calculate_ecc()
  mtd/ftl: fix the double free of the buffers allocated in build_maps()
  CIFS: Fix wrong restart readdir for SMB1
  CIFS: Fix wrong filename length for SMB2
  CIFS: Fix wrong directory attributes after rename
  CIFS: Possible null ptr deref in SMB2_tcon
  CIFS: Fix async reading on reconnects
  CIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2
  libceph: do not hard code max auth ticket len
  libceph: add process_one_ticket() helper
  libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly
  md/raid1,raid10: always abort recover on write error.
  xfs: don't zero partial page cache pages during O_DIRECT writes
  xfs: don't zero partial page cache pages during O_DIRECT writes
  xfs: don't dirty buffers beyond EOF
  xfs: quotacheck leaves dquot buffers without verifiers
  RDMA/iwcm: Use a default listen backlog if needed
  md/raid10: Fix memory leak when raid10 reshape completes.
  md/raid10: fix memory leak when reshaping a RAID10.
  md/raid6: avoid data corruption during recovery of double-degraded RAID6
  Bluetooth: Avoid use of session socket after the session gets freed
  Bluetooth: never linger on process exit
  mnt: Add tests for unprivileged remount cases that have found to be faulty
  mnt: Change the default remount atime from relatime to the existing value
  mnt: Correct permission checks in do_remount
  mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
  mnt: Only change user settable mount flags in remount
  ring-buffer: Up rb_iter_peek() loop count to 3
  ring-buffer: Always reset iterator to reader page
  ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock
  ACPI: Run fixed event device notifications in process context
  ACPICA: Utilities: Fix memory leak in acpi_ut_copy_iobject_to_iobject
  bfa: Fix undefined bit shift on big-endian architectures with 32-bit DMA address
  ASoC: pxa-ssp: drop SNDRV_PCM_FMTBIT_S24_LE
  ASoC: max98090: Fix missing free_irq
  ASoC: samsung: Correct I2S DAI suspend/resume ops
  ASoC: wm_adsp: Add missing MODULE_LICENSE
  ASoC: pcm: fix dpcm_path_put in dpcm runtime update
  openrisc: Rework signal handling
  MIPS: Fix accessing to per-cpu data when flushing the cache
  MIPS: OCTEON: make get_system_type() thread-safe
  MIPS: asm: thread_info: Add _TIF_SECCOMP flag
  MIPS: Cleanup flags in syscall flags handlers.
  MIPS: asm/reg.h: Make 32- and 64-bit definitions available at the same time
  MIPS: Remove BUG_ON(!is_fpu_owner()) in do_ade()
  MIPS: tlbex: Fix a missing statement for HUGETLB
  MIPS: Prevent user from setting FCSR cause bits
  MIPS: GIC: Prevent array overrun
  drivers: scsi: storvsc: Correctly handle TEST_UNIT_READY failure
  Drivers: scsi: storvsc: Implement a eh_timed_out handler
  powerpc/pseries: Failure on removing device node
  powerpc/mm: Use read barrier when creating real_pte
  powerpc/mm/numa: Fix break placement
  regulator: arizona-ldo1: remove bypass functionality
  mfd: omap-usb-host: Fix improper mask use.
  kernel/smp.c:on_each_cpu_cond(): fix warning in fallback path
  CAPABILITIES: remove undefined caps from all processes
  tpm: missing tpm_chip_put in tpm_get_random()
  firmware: Do not use WARN_ON(!spin_is_locked())
  spi: omap2-mcspi: Configure hardware when slave driver changes mode
  spi: orion: fix incorrect handling of cell-index DT property
  iommu/amd: Fix cleanup_domain for mass device removal
  media: media-device: Remove duplicated memset() in media_enum_entities()
  media: au0828: Only alt setting logic when needed
  media: xc4000: Fix get_frequency()
  media: xc5000: Fix get_frequency()
  Linux 3.10.54
  USB: fix build error with CONFIG_PM_RUNTIME disabled
  NFSv4: Fix problems with close in the presence of a delegation
  NFSv3: Fix another acl regression
  svcrdma: Select NFSv4.1 backchannel transport based on forward channel
  NFSD: Decrease nfsd_users in nfsd_startup_generic fail
  usb: hub: Prevent hub autosuspend if usbcore.autosuspend is -1
  USB: whiteheat: Added bounds checking for bulk command response
  USB: ftdi_sio: Added PID for new ekey device
  USB: ftdi_sio: add Basic Micro ATOM Nano USB2Serial PID
  ARM: OMAP2+: hwmod: Rearm wake-up interrupts for DT when MUSB is idled
  usb: xhci: amd chipset also needs short TX quirk
  xhci: Treat not finding the event_seg on COMP_STOP the same as COMP_STOP_INVAL
  Staging: speakup: Update __speakup_paste_selection() tty (ab)usage to match vt
  jbd2: fix infinite loop when recovering corrupt journal blocks
  mei: nfc: fix memory leak in error path
  mei: reset client state on queued connect request
  Btrfs: fix csum tree corruption, duplicate and outdated checksums
  hpsa: fix bad -ENOMEM return value in hpsa_big_passthru_ioctl
  x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub
  x86_64/vsyscall: Fix warn_bad_vsyscall log output
  x86: don't exclude low BIOS area when allocating address space for non-PCI cards
  drm/radeon: add additional SI pci ids
  ext4: fix BUG_ON in mb_free_blocks()
  kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601)
  Revert "KVM: x86: Increase the number of fixed MTRR regs to 10"
  KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
  KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table
  KVM: x86: Inter-privilege level ret emulation is not implemeneted
  crypto: ux500 - make interrupt mode plausible
  serial: core: Preserve termios c_cflag for console resume
  ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct
  drivers/i2c/busses: use correct type for dma_map/unmap
  hwmon: (dme1737) Prevent overflow problem when writing large limits
  hwmon: (ads1015) Fix out-of-bounds array access
  hwmon: (lm85) Fix various errors on attribute writes
  hwmon: (ads1015) Fix off-by-one for valid channel index checking
  hwmon: (gpio-fan) Prevent overflow problem when writing large limits
  hwmon: (lm78) Fix overflow problems seen when writing large temperature limits
  hwmon: (sis5595) Prevent overflow problem when writing large limits
  drm: omapdrm: fix compiler errors
  ARM: OMAP3: Fix choice of omap3_restore_es function in OMAP34XX rev3.1.2 case.
  mei: start disconnect request timer consistently
  ALSA: hda/realtek - Avoid setting wrong COEF on ALC269 & co
  ALSA: hda/ca0132 - Don't try loading firmware at resume when already failed
  ALSA: virtuoso: add Xonar Essence STX II support
  ALSA: hda - fix an external mic jack problem on a HP machine
  USB: Fix persist resume of some SS USB devices
  USB: ehci-pci: USB host controller support for Intel Quark X1000
  USB: serial: ftdi_sio: Add support for new Xsens devices
  USB: serial: ftdi_sio: Annotate the current Xsens PID assignments
  USB: OHCI: don't lose track of EDs when a controller dies
  isofs: Fix unbounded recursion when processing relocated directories
  HID: fix a couple of off-by-ones
  HID: logitech: perform bounds checking on device_id early enough
  stable_kernel_rules: Add pointer to netdev-FAQ for network patches
  Linux 3.10.53
  arch/sparc/math-emu/math_32.c: drop stray break operator
  sparc64: ldc_connect() should not return EINVAL when handshake is in progress.
  sunsab: Fix detection of BREAK on sunsab serial console
  bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000
  sparc64: Guard against flushing openfirmware mappings.
  sparc64: Do not insert non-valid PTEs into the TSB hash table.
  sparc64: Add membar to Niagara2 memcpy code.
  sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.
  sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault addresses.
  sparc64: Fix top-level fault handling bugs.
  sparc64: Handle 32-bit tasks properly in compute_effective_address().
  sparc64: Make itc_sync_lock raw
  sparc64: Fix argument sign extension for compat_sys_futex().
  sctp: fix possible seqlock seadlock in sctp_packet_transmit()
  iovec: make sure the caller actually wants anything in memcpy_fromiovecend
  net: Correctly set segment mac_len in skb_segment().
  macvlan: Initialize vlan_features to turn on offload support.
  net: sctp: inherit auth_capable on INIT collisions
  tcp: Fix integer-overflow in TCP vegas
  tcp: Fix integer-overflows in TCP veno
  net: sendmsg: fix NULL pointer dereference
  ip: make IP identifiers less predictable
  inetpeer: get rid of ip_id_count
  bnx2x: fix crash during TSO tunneling
  Linux 3.10.52
  x86/espfix/xen: Fix allocation of pages for paravirt page tables
  lib/btree.c: fix leak of whole btree nodes
  net/l2tp: don't fall back on UDP [get|set]sockopt
  net: mvneta: replace Tx timer with a real interrupt
  net: mvneta: add missing bit descriptions for interrupt masks and causes
  net: mvneta: do not schedule in mvneta_tx_timeout
  net: mvneta: use per_cpu stats to fix an SMP lock up
  net: mvneta: increase the 64-bit rx/tx stats out of the hot path
  Revert "mac80211: move "bufferable MMPDU" check to fix AP mode scan"
  staging: vt6655: Fix Warning on boot handle_irq_event_percpu.
  x86_64/entry/xen: Do not invoke espfix64 on Xen
  x86, espfix: Make it possible to disable 16-bit support
  x86, espfix: Make espfix64 a Kconfig option, fix UML
  x86, espfix: Fix broken header guard
  x86, espfix: Move espfix definitions into a separate header file
  x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
  Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"
  timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
  printk: rename printk_sched to printk_deferred
  iio: buffer: Fix demux table creation
  staging: vt6655: Fix disassociated messages every 10 seconds
  mm, thp: do not allow thp faults to avoid cpuset restrictions
  scsi: handle flush errors properly
  rapidio/tsi721_dma: fix failure to obtain transaction descriptor
  cfg80211: fix mic_failure tracing
  ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout
  crypto: af_alg - properly label AF_ALG socket
  Linux 3.10.51
  core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
  x86/efi: Include a .bss section within the PE/COFF headers
  s390/ptrace: fix PSW mask check
  Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
  mm: hugetlb: fix copy_hugetlb_page_range()
  x86_32, entry: Store badsys error code in %eax
  hwmon: (smsc47m192) Fix temperature limit and vrm write operations
  parisc: Remove SA_RESTORER define
  coredump: fix the setting of PF_DUMPCORE
  Input: fix defuzzing logic
  slab_common: fix the check for duplicate slab names
  slab_common: Do not check for duplicate slab names
  tracing: Fix wraparound problems in "uptime" trace clock
  blkcg: don't call into policy draining if root_blkg is already gone
  ahci: add support for the Promise FastTrak TX8660 SATA HBA (ahci mode)
  libata: introduce ata_host->n_tags to avoid oops on SAS controllers
  libata: support the ata host which implements a queue depth less than 32
  block: don't assume last put of shared tags is for the host
  block: provide compat ioctl for BLKZEROOUT
  media: tda10071: force modulation to QPSK on DVB-S
  media: hdpvr: fix two audio bugs
  Linux 3.10.50
  ARC: Implement ptrace(PTRACE_GET_THREAD_AREA)
  sched: Fix possible divide by zero in avg_atom() calculation
  locking/mutex: Disable optimistic spinning on some architectures
  PM / sleep: Fix request_firmware() error at resume
  dm cache metadata: do not allow the data block size to change
  dm thin metadata: do not allow the data block size to change
  alarmtimer: Fix bug where relative alarm timers were treated as absolute
  drm/radeon: avoid leaking edid data
  drm/qxl: return IRQ_NONE if it was not our irq
  drm/radeon: set default bl level to something reasonable
  irqchip: gic: Fix core ID calculation when topology is read from DT
  irqchip: gic: Add support for cortex a7 compatible string
  ring-buffer: Fix polling on trace_pipe
  mwifiex: fix Tx timeout issue
  perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
  ipv4: fix buffer overflow in ip_options_compile()
  dns_resolver: Null-terminate the right string
  dns_resolver: assure that dns_query() result is null-terminated
  sunvnet: clean up objects created in vnet_new() on vnet_exit()
  net: pppoe: use correct channel MTU when using Multilink PPP
  net: sctp: fix information leaks in ulpevent layer
  tipc: clear 'next'-pointer of message fragments before reassembly
  be2net: set EQ DB clear-intr bit in be_open()
  netlink: Fix handling of error from netlink_dump().
  net: mvneta: Fix big endian issue in mvneta_txq_desc_csum()
  net: mvneta: fix operation in 10 Mbit/s mode
  appletalk: Fix socket referencing in skb
  tcp: fix false undo corner cases
  igmp: fix the problem when mc leave group
  net: qmi_wwan: add two Sierra Wireless/Netgear devices
  net: qmi_wwan: Add ID for Telewell TW-LTE 4G v2
  ipv4: icmp: Fix pMTU handling for rare case
  tcp: Fix divide by zero when pushing during tcp-repair
  bnx2x: fix possible panic under memory stress
  net: fix sparse warning in sk_dst_set()
  ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix
  ipv4: fix dst race in sk_dst_get()
  8021q: fix a potential memory leak
  net: sctp: check proc_dointvec result in proc_sctp_do_auth
  tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb
  ip_tunnel: fix ip_tunnel_lookup
  shmem: fix splicing from a hole while it's punched
  shmem: fix faulting into a hole, not taking i_mutex
  shmem: fix faulting into a hole while it's punched
  iwlwifi: dvm: don't enable CTS to self
  igb: do a reset on SR-IOV re-init if device is down
  hwmon: (adt7470) Fix writes to temperature limit registers
  hwmon: (da9052) Don't use dash in the name attribute
  hwmon: (da9055) Don't use dash in the name attribute
  tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs
  tracing: Fix graph tracer with stack tracer on other archs
  fuse: handle large user and group ID
  Bluetooth: Ignore H5 non-link packets in non-active state
  Drivers: hv: util: Fix a bug in the KVP code
  media: gspca_pac7302: Add new usb-id for Genius i-Look 317
  usb: Check if port status is equal to RxDetect

Signed-off-by: Ian Maund <imaund@codeaurora.org>
2015-04-24 18:04:40 -07:00
Linux Build Service Account 3e84e78272 Merge "sched: Use only partial wait time as task demand" 2015-04-10 04:50:40 -07:00
Joonwoo Park 895ea7f4c9 sched: fix race conditions where HMP tunables change
When multiple threads race to update HMP scheduler tunables, at present,
the tunables which require big/small task count fix-up can be updated
without fix-up and it can trigger BUG_ON().
This happens because sched_hmp_proc_update_handler() acquires rq locks and
does fix-up only when number of big/small tasks affecting tunables are
updated even though the function sched_hmp_proc_update_handler() calls
set_hmp_defaults() which re-calculates all sysctl input data at that
point.  Consequently a thread that is trying to update a tunable which does
not affect big/small task count can call set_hmp_defaults() and update
big/small task count affecting tunable without fix-up if there is another
thread and it just set fix-up needed sysctl value.

Example of problem scenario :
thread 0                               thread 1
Set sched_small_task – needs fix up.
                                       Set sched_init_task_load – no fix
                                       up needed.
proc_dointvec_minmax() completed
which means sysctl_sched_small_task has
new value.
                                       Call set_hmp_defaults() without
                                       lock/fixup. set_hmp_defaults() still
                                       updates sched_small_tasks with new
                                       sysctl_sched_small_task value by
                                       thread 0.

Fix this by wrapping the proc update handler in the already existing
policy mutex.

CRs-fixed: 812443
Change-Id: I7aa4c0efc1ca56e28dc0513480aca3264786d4f7
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-03-31 21:16:49 -07:00
Joonwoo Park 4251f58faa sched: check HMP scheduler tunables validity
Check tunables validity to take valid values only.

CRs-fixed: 812443
Change-Id: Ibb9ec0d6946247068174ab7abe775a6389412d5b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-03-31 20:52:13 -07:00
Olav Haugan 3f89b9e684 sched/fair: Add irq load awareness to the tick CPU selection logic
IRQ load is not taken into account when determining whether a task
should be migrated to a different CPU.  A task that runs for a long time
could get stuck on CPU with high IRQ load causing degraded performance.

Add irq load awareness to the tick CPU selection logic.

CRs-fixed: 809119
Change-Id: I7969f7dd947fb5d66fce0bedbc212bfb2d42c8c1
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2015-03-27 08:38:40 -07:00
Linux Build Service Account 0925dc4962 Merge "Merge tmp-61c3cde into msm-3.10" 2015-03-21 21:52:56 -07:00
Linux Build Service Account 70f6e3a95d Merge "sched: Update max_capacity when an entire cluster is hotplugged" 2015-03-20 08:18:58 -07:00
Rom Lemarchand a2ddc4658e cgroup: refactor allow_attach function into common code
move cpu_cgroup_allow_attach to a common subsys_cgroup_allow_attach.
This allows any process with CAP_SYS_NICE to move tasks across cgroups if
they use this function as their allow_attach handler.

Bug: 18260435
Change-Id: I6bb4933d07e889d0dc39e33b4e71320c34a2c90f
Signed-off-by: Rom Lemarchand <romlem@android.com>
Git-commit: 57114e95e8c4f5035c993fc74bbe94cd9573f1bb
Git-repo: https://android.googlesource.com/kernel/common.git
Signed-off-by: Ian Maund <imaund@codeaurora.org>
2015-03-19 14:59:17 -07:00
Syed Rameez Mustafa dd51fd754d sched: Update max_capacity when an entire cluster is hotplugged
When an entire cluster is hotplugged, the scheduler's notion of
max_capacity can get outdated. This introduces the following
inefficiencies in behavior:

* task_will_fit() does not return true on all tasks. Consequently
  all big tasks go through fallback CPU selection logic skipping
  C-state and power checks in select_best_cpu().

* During boost, migration_needed() returns true unnecessarily,
  causing an avoidable rerun of select_best_cpu().

* An unnecessary kick is sent to all little CPUs when boost is set.

* An opportunity for early bailout from nohz_kick_needed() is lost.

Start handling CPUFREQ_REMOVE_POLICY in the policy notifier callback
which indicates the last CPU in a cluster being hotplugged out. Also
modify update_min_max_capacity() to only iterate through online CPUs
instead of possible CPUs. While we can't guarantee the integrity of
the cpu_online_mask in the notifier callback, the scheduler will fix
up all state soon after any changes to the online mask.

The change does have one side effect; early termination from the
notifier callback when min_max_freq or max_possible_freq remain
unchanged is no longer possible. This is because when the last CPU
in a cluster is hot removed, only max_capacity is updated without
affecting min_max_freq or max_possible_freq. Therefore, when the
first CPU in the same cluster gets hot added at a later point
max_capacity must once again be recomputed despite there being no
change in min_max_freq or max_possible_freq.

Change-Id: I9a1256b5c2cd6fcddd85b069faf5e2ace177e122
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-03-18 00:57:29 -07:00
Syed Rameez Mustafa 3282e44c1b sched: Ensure attempting load balance when HMP active balance flags are set
find_busiest_group() can end up returning a NULL group due to load based
checks even though there are tasks that can be migrated to higher capacity
CPUs (LBF_BIG_TASK_ACTIVE_BALANCE) or EA core rotation is possible
(LBF_EA_ACTIVE_BALANCE). To get best power and performance ensure that load
balance does attempt to pull tasks when HMP_ACTIVE_BALANCE flag is set.
Since sched boost also falls under the same category club it into the same
generic condition.

Change-Id: I3db7ec200d2a038917b1f2341602eb87b5aed289
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-03-16 16:02:29 -07:00
Syed Rameez Mustafa 14fd2e5918 sched: Use only partial wait time as task demand
The scheduler currently either considers a task's entire wait time as
task demand or completely ignores wait time, based on the tunable
sched_account_wait_time. Both approaches have their limitations,
however. The former artificially boosts a task's demand when it may not
actually be justified. With the latter, the scheduler runs the risk
of never being able to recognize true load (consider two CPU hogs on
a single little CPU). To achieve a compromise between these two
extremes, change the load tracking algorithm to only consider part of
a task's wait time as its demand. The portion of wait time accounted
as demand is determined by each task's percent load, e.g. for a task that
waits for 10 ms and has 60% task load, only 6 ms of the wait will
contribute to task demand. This approach is fairer as the scheduler
now tries to determine how much of its wait time a task would actually
have spent using the CPU if it had been executing. It ensures that tasks
with high demand continue to see most of the benefits of accounting
wait time as busy time, however, lower demand tasks don't experience a
disproportionately high boost to demand triggering unjustified big CPU
usage. Note that this new approach is only applicable to wait time
being considered as task demand and not wait time considered as CPU
busy time.

To achieve the above effect, ensure that anytime a task is waiting, its
runtime in every relevant window segment is appropriately adjusted using
its pct load.
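
The proportional accounting described above can be sketched as follows. This is only an illustration of the arithmetic in the message (10 ms of wait at 60% load contributes 6 ms); the function name and the nanosecond unit are assumptions, not the scheduler's actual code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: portion of a wait interval that is counted as
 * task demand. pct_load is the task's load as a percentage (0..100).
 */
uint64_t wait_time_as_demand(uint64_t wait_ns, unsigned int pct_load)
{
    assert(pct_load <= 100);
    /* Scale the wait time by the task's percent load. */
    return wait_ns * pct_load / 100;
}
```

A task at 100% load still has its whole wait counted, while a task at 0% load gains no demand from waiting.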

Change-Id: I6a698d6cb1adeca49113c3499029b422daf7871f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-03-16 13:59:42 -07:00
Linux Build Service Account 98290c7cdb Merge "sched: Update cur_freq in the cpufreq policy notifier callback" 2015-03-10 11:18:49 -07:00
Linux Build Service Account 4200d1a74c Merge "sched: avoid CPUs with high irq activity for non-small tasks" 2015-03-09 22:54:06 -07:00
Linux Build Service Account e3adab062f Merge "sched: add scheduling latency tracking procfs node" 2015-03-09 22:53:49 -07:00
Linux Build Service Account aee63baa3f Merge "sched: warn/panic upon excessive scheduling latency" 2015-03-09 22:53:48 -07:00
Linux Build Service Account a2520f7ee0 Merge "sched: fix incorrect wait time and wait count statistics" 2015-03-09 22:53:48 -07:00
Syed Rameez Mustafa 785d88d930 sched: Update cur_freq in the cpufreq policy notifier callback
At boot, the cpufreq framework sends transition notifiers before
sending out the policy notifier. Since the scheduler relies on the
policy notifier to build up the frequency domain masks, when the
initial set of transition notifiers are sent, the scheduler has no
frequency domains. As a result the scheduler fails to update the
cur_freq information. Update cur_freq as part of the policy notifier
so that the scheduler always has the current frequency information.

Change-Id: I7bd2958dfeb064dd20b9ccebafd372436484e5d6
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-03-05 16:31:42 -08:00
Joonwoo Park 32851d8550 sched: add scheduling latency tracking procfs node
Add a new procfs node /proc/sys/kernel/sched_max_latency_us to track the
worst scheduling latency.  It provides an easier way to identify the maximum
scheduling latency seen across the CPUs.

Change-Id: I6e435bbf825c0a4dff2eded4a1256fb93f108d0e
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-03-03 09:56:02 -08:00
Joonwoo Park 4653e0549d sched: warn/panic upon excessive scheduling latency
Add new tunables /proc/sys/kernel/sched_latency_warn_threshold_us and
/proc/sys/kernel/sched_latency_panic_threshold_us to warn or panic when
tasks are runnable but have not been scheduled for longer than the configured time.

This helps to find out unacceptably high scheduling latency more easily.

Change-Id: If077aba6211062cf26ee289970c5abcd1c218c82
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-03-03 09:55:42 -08:00
Linux Build Service Account d19673d015 Merge "sched: actively migrate big tasks on power CPU to idle performance CPU" 2015-03-03 08:50:53 -08:00
Linux Build Service Account 540fdeda13 Merge "sched: Add cgroup-based criteria for upmigration" 2015-03-03 04:51:49 -08:00
Joonwoo Park 153e6c6638 sched: actively migrate big tasks on power CPU to idle performance CPU
When performance CPU runs idle or newly idle load balancer to pull a
task on power efficient CPU, the load balancer always fails and enters
idle mode if the big task on the power efficient CPU is running.  This is
suboptimal when the running task on the power efficient CPU doesn't fit
on the power efficient CPU as it's quite possible that the big task will
sustain on the power efficient CPU until it's preempted while there is
a performance CPU sitting idle.

Revise load balancer algorithm to actively migrate big tasks on power
efficient CPU to performance CPU when performance CPU runs idle or newly
idle load balancer.

Change-Id: Iaf05e0236955fdcc7ded0ff09af0880050a2be32
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-03-02 11:55:26 -08:00
Linux Build Service Account 1e77337521 Merge "sched: avoid running idle_balance() on behalf of wrong CPU" 2015-03-01 04:49:20 -08:00
Linux Build Service Account 8550cff9b4 Merge "sched: Avoid pulling all tasks from a CPU during load balance" 2015-02-28 13:06:53 -08:00
Linux Build Service Account 8d53b2b680 Merge "sched: Avoid pulling big tasks to the little cluster during load balance" 2015-02-28 13:06:52 -08:00
Joonwoo Park 96afb6ed24 sched: fix incorrect wait time and wait count statistics
Scheduler at present resets task's wait start timestamp when task migrates
to another rq.  This misleads scheduler itself into reporting less wait
time than actual by omitting time spent for waiting prior to migration and
also more wait count than actual by counting migration as wait end event
which can be seen by trace or /proc/<pid>/sched with CONFIG_SCHEDSTATS=y.

Carry forward migrating task's wait time prior to migration and don't count
migration as a wait-end event to fix such statistics error.

Change-Id: I0f6badf8072fc37826e4476ac2d1195e82b65bf1
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-02-26 12:55:47 -08:00
Joonwoo Park bd4f15fe80 sched: avoid running idle_balance() on behalf of wrong CPU
With EA (Energy Awareness), idle_balance() on a CPU runs on behalf of most
power efficient idle CPU among the CPUs in its sched domain level under the
condition that the substitute idle CPU should be limited to a CPU which has
the same capacity with original idle CPU.
It is found that at present idle_balance() spans all the CPUs in its sched
domain and run idle balancer on behalf of any CPU within the domain which
could be all the CPUs in the system which consequently makes idle balancer
on a performance CPU always runs on behalf of a power efficient idle CPU.
This would cause for idle performance CPUs to fail to pull tasks from power
efficient CPUs always when there is only an online performance CPU.

Fix this by limiting the search to CPUs that share a cache with the original
idle CPU, so that the idle balancer runs on behalf of a more power efficient
CPU that still has the same capacity as the original CPU.

Change-Id: I0575290c24f28db011d9353915186e64df7e57fe
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-02-24 16:13:25 -08:00
Linux Build Service Account 6acf5f79e8 Merge "sched/fair: Respect wake to idle over sync wakeup" 2015-02-24 14:24:42 -08:00
Linux Build Service Account b0209222be Merge "sched: Keep track of average nr_big_tasks" 2015-02-23 19:07:06 -08:00
Linux Build Service Account 998c4e88f7 Merge "sched: Fix bug in average nr_running and nr_iowait calculation" 2015-02-23 19:07:06 -08:00
Srivatsa Vaddagiri a28fea62eb sched: Keep track of average nr_big_tasks
Extend sched_get_nr_running_avg() API to return average nr_big_tasks,
in addition to average nr_running and average nr_io_wait tasks. Also
add a new trace point to record values returned by
sched_get_nr_running_avg() API.

Change-Id: Id3591e6d04da8db484b4d1cb9d95dba075f5ab9a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-02-20 10:25:00 +05:30
Srivatsa Vaddagiri 019b9743a3 sched: Fix bug in average nr_running and nr_iowait calculation
sched_get_nr_running_avg() returns average nr_running and nr_iowait
task count since it was last invoked. Fix several bugs in their
calculation.

* sched_update_nr_prod() needs to consider that nr_running count can
  change by more than 1 when CFS_BANDWIDTH feature is used

* sched_get_nr_running_avg() needs to sum up nr_iowait count across
  all cpus, rather than just one

* sched_get_nr_running_avg() could race with sched_update_nr_prod(),
  as a result of which it could use curr_time which is behind a cpu's
  'last_time' value. That would lead to erroneous calculation of
  average nr_running or nr_iowait.

While at it, fix also a bug in BUG_ON() check in
sched_update_nr_prod() function and remove unnecessary nr_running
argument to sched_update_nr_prod() function.

Change-Id: I46737614737292fae0d7204c4648fb9b862f65b2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-02-20 10:24:55 +05:30
Syed Rameez Mustafa bb57c4437d sched: Avoid pulling all tasks from a CPU during load balance
When running load balance, the destination CPU checks the number
of running tasks on the busiest CPU without holding the busiest
CPU's runqueue lock. This opens the load balancer to a race whereby
a third CPU running load balance at the same time, having found the
same busiest group and queue, may have already pulled one of the
waiting tasks from the busiest CPU. Under scenarios where the source
CPU is running the idle task and only a single task remains waiting on
the busiest runqueue (nr_running = 1), the destination CPU will end
up pulling the only enqueued task from that CPU, leaving the destination
CPU with nothing left to run. Fix this race, by reconfirming nr_running
for the busiest CPU, after its runqueue lock has been obtained.
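
The check, lock, re-check pattern behind this fix can be sketched in user-space C. This is a minimal illustration assuming a pthread mutex in place of the runqueue lock; struct rq_sketch and try_pull_one_task() are hypothetical names and do not appear in the patch.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a runqueue: a lock and a task count. */
struct rq_sketch {
    pthread_mutex_t lock;
    int nr_running;
};

/* Returns 1 if a task was pulled, 0 if the pull was abandoned. */
int try_pull_one_task(struct rq_sketch *busiest)
{
    int pulled = 0;

    assert(busiest);

    /* Unlocked peek: cheap, but the value may already be stale. */
    if (busiest->nr_running <= 1)
        return 0;

    pthread_mutex_lock(&busiest->lock);
    /* Reconfirm under the lock: another puller may have got here first. */
    if (busiest->nr_running > 1) {
        busiest->nr_running--;
        pulled = 1;
    }
    pthread_mutex_unlock(&busiest->lock);

    return pulled;
}
```

The second check is the important one: the first only avoids taking the lock when there is clearly nothing to pull.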

Change-Id: I42e132b15f96d9d5d7b32ef4de3fb92d2f837e63
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-02-18 10:37:32 -08:00
Srivatsa Vaddagiri 995fad6d1a sched: Add cgroup-based criteria for upmigration
It may be desirable to discourage upmigration of tasks belonging to
some cgroups. Add a per-cgroup flag (upmigrate_discourage) that
discourages upmigration of tasks of a cgroup. Tasks of the cgroup are
allowed to upmigrate only under overcommitted scenario.

Change-Id: I1780e420af1b6865c5332fb55ee1ee408b74d8ce
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-02-18 12:29:28 +05:30
Syed Rameez Mustafa c29058b29a sched: Avoid pulling big tasks to the little cluster during load balance
When a lower capacity CPU attempts to pull work from a higher capacity CPU,
during load balance, it does not distinguish between tasks that will fit
or not fit on the destination CPU. This causes suboptimal load balancing
decisions whereby big tasks end up on the lower capacity CPUs and little
tasks remain on higher capacity CPUs. Avoid this behavior, by first
restricting search to only include tasks that fit on the destination CPU.
If such a task cannot be found, remove this restriction so that any task
can be pulled over to the destination CPU. This behavior is not applicable
during sched_boost, however, as none of the tasks will fit on a lower
capacity CPU.
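
The two-pass selection described above might look roughly like this. It is a sketch under simplifying assumptions: task load and destination capacity are plain integers in the same unit, and pick_task_to_pull() is a hypothetical name.

```c
#include <assert.h>

/* Returns the index of the task to pull, or -1 if there are no tasks. */
int pick_task_to_pull(const int *task_load, int nr_tasks, int dst_capacity)
{
    assert(nr_tasks >= 0);

    /* First pass: consider only tasks that fit on the destination CPU. */
    for (int i = 0; i < nr_tasks; i++)
        if (task_load[i] <= dst_capacity)
            return i;

    /* Second pass: restriction removed, so any task may be pulled. */
    return nr_tasks > 0 ? 0 : -1;
}
```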

Change-Id: I1093420a629a0886fc3375849372ab7cf42e928e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-02-17 12:08:27 -08:00
Joonwoo Park 5c8e7ecfc2 sched: avoid CPUs with high irq activity for non-small tasks
The irq-aware scheduler is meant to achieve better performance by avoiding
task placement on CPUs which have high irq activity.  However, when the task
is non-small, the current scheduler preferentially places tasks on CPUs which
are loaded by irq activity, the opposite of what is intended.
This is suboptimal for both power and performance.
Fix the task placement algorithm to avoid CPUs with significant irq activity.

Change-Id: Ifa5a6ac186241bd58fa614e93e3d873a5f5ad4ca
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-02-13 18:50:44 -08:00
Joonwoo Park 1df730374f sched: fix rounding error on scaled execution time calculation
It's found that the scaled execution time can be less than its actual time
due to rounding errors.  The HMP scheduler accumulates scaled execution
time of tasks to determine if tasks are in need of up-migration.  But the
rounding error prevents the HMP scheduler from accumulating 100% load which
prevents us from ever reaching an up-migrate of 100%.
Fix rounding error by rounding quotient up.
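
The difference between truncating and rounding the quotient up can be illustrated as below. The scaling formula (delta * cur_freq / max_freq) and the function names are assumptions made for this sketch; the patch's actual scaling code may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating division: any fractional part of the scaled time is lost. */
uint64_t scale_exec_time_trunc(uint64_t delta, uint32_t cur_freq,
                               uint32_t max_freq)
{
    assert(max_freq != 0);
    return delta * cur_freq / max_freq;
}

/* Round the quotient up so a partial unit still counts as a whole one. */
uint64_t scale_exec_time_round_up(uint64_t delta, uint32_t cur_freq,
                                  uint32_t max_freq)
{
    assert(max_freq != 0);
    return (delta * cur_freq + max_freq - 1) / max_freq;
}
```

When the division is exact, both variants agree; they differ by one unit only when truncation would have dropped a remainder.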

CRs-fixed: 759041
Change-Id: Ie4d9693593cc3053a292a29078aa56e6de8a2d52
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-02-13 18:34:34 -08:00
Olav Haugan 35039673ea sched/fair: Respect wake to idle over sync wakeup
Sync wakeup currently takes precedence over wake to idle flag. A sync
wakeup causes a task to be placed on a non-idle CPU because we expect
this CPU to become idle very shortly. However, even though the sync flag
is set there is no guarantee that the task will go to sleep right away.
As a consequence, performance suffers.

Fix this by preferring an idle CPU over a potential busy cpu when both
wake to idle and sync wakeup are set.

Change-Id: I6b40a44e2b4d5b5fa6088e4f16428f9867bd928d
CRs-fixed: 794424
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2015-02-11 12:02:12 -08:00
Srivatsa Vaddagiri 2385d33016 sched: Support CFS_BANDWIDTH feature in HMP scheduler
CFS_BANDWIDTH feature is not currently well-supported by HMP
scheduler. Issues encountered include a kernel panic when
rq->nr_big_tasks count becomes negative. This patch fixes HMP
scheduler code to better handle CFS_BANDWIDTH feature. The most
prominent change introduced is maintenance of HMP stats (nr_big_tasks,
nr_small_tasks, cumulative_runnable_avg) per 'struct cfs_rq' in
addition to being maintained in each 'struct rq'. This allows HMP
stats to be updated easily when a group is throttled on a cpu.

Change-Id: Iad9f378b79ab5d9d76f86d1775913cc1941e266a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-01-28 14:13:19 +05:30
Srivatsa Vaddagiri bbef4c5e1b sched: Consolidate hmp stats into their own struct
Key hmp stats (nr_big_tasks, nr_small_tasks and
cumulative_runnable_average) are currently maintained per-cpu in
'struct rq'. Merge those stats in their own structure (struct
hmp_sched_stats) and modify impacted functions to deal with the newly
introduced structure. This cleanup is required for a subsequent patch
which fixes various issues with use of CFS_BANDWIDTH feature in HMP
scheduler.

Change-Id: Ieffc10a3b82a102f561331bc385d042c15a33998
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-01-28 14:13:14 +05:30
Srivatsa Vaddagiri 7e767d3e45 sched: Add userspace interface to set PF_WAKE_UP_IDLE
sched_prefer_idle flag controls whether tasks can be woken to any
available idle cpu. It may be desirable to set sched_prefer_idle to 0
so that most tasks wake up to non-idle cpus under mostly_idle
threshold and have specialized tasks override this behavior through
other means. PF_WAKE_UP_IDLE flag per task provides exactly that. It
lets tasks with PF_WAKE_UP_IDLE flag set be woken up to any available
idle cpu independent of sched_prefer_idle flag setting. Currently
only kernel-space API exists to set PF_WAKE_UP_IDLE flag for a task.
This patch adds a user-space API (in /proc filesystem) to set
PF_WAKE_UP_IDLE flag for a given task. /proc/[pid]/sched_wake_up_idle
file can be written to set or clear PF_WAKE_UP_IDLE flag for a given
task.
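
A user-space program could drive this interface roughly as follows. The file path comes from the message above; the accepted values ("1" to set, "0" to clear) are an assumption, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/proc/<pid>/sched_wake_up_idle"; returns -1 if buf is too small. */
int wake_up_idle_path(int pid, char *buf, size_t len)
{
    int n;

    assert(pid > 0);
    n = snprintf(buf, len, "/proc/%d/sched_wake_up_idle", pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Set or clear the flag; writing "1"/"0" is an assumed value format. */
int set_wake_up_idle(int pid, int enable)
{
    char path[64];
    FILE *f;
    int ok;

    if (wake_up_idle_path(pid, path, sizeof(path)) != 0)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1; /* e.g. no such pid, or a kernel without this file */

    ok = fputs(enable ? "1" : "0", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```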

Change-Id: I13a37e740195e503f457ebe291d54e83b230fbeb
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-01-27 19:39:57 +05:30
Linux Build Service Account 4c5b2873a2 Merge "sched: add sched feature FORCE_CPU_THROTTLING_IMMINENT" 2015-01-14 17:12:07 -08:00
Linux Build Service Account a2ada2d4c8 Merge "sched: continue to search less power efficient cpu for load balancer" 2015-01-14 17:12:06 -08:00
Joonwoo Park 66bd788705 sched: add sched feature FORCE_CPU_THROTTLING_IMMINENT
Add a new sched feature FORCE_CPU_THROTTLING_IMMINENT to perform
migration due to EA without checking frequency throttling.  This option
can give us better debugging and verification capability.

Change-Id: Iba445961a7f9812528b4e3aa9c6ddf47a3aad583
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-01-08 11:22:33 -08:00
Joonwoo Park 2c7cc326ed sched: continue to search less power efficient cpu for load balancer
When choosing a CPU for power-aware active balance, the load
balancer currently selects the first eligible CPU it finds, even if
there is another eligible CPU which is higher-power. This can lead to
suboptimal load balancing behavior and extra migrations. Power and
performance will be impacted.

Achieve better power and performance by continuing to search for the least
power efficient cpu as long as the cpu's load average is higher than or
equal to that of the busiest cpu found so far.

CRs-fixed: 777341
Change-Id: I14eb21ab725bf7dab88b2e1e169aced6f2d712ca
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-01-08 11:22:33 -08:00
Syed Rameez Mustafa 13e853e988 sched: Update cur_freq for offline CPUs in notifier callback
cpufreq governor does not send frequency change notifications for
offline CPUs. This means that a hot removed CPU's cur_freq information
can get stale if there is a frequency change while that CPU is offline.
When the offline CPU is hotplugged back in, all subsequent load
calculations are based off the stale information until another frequency
change occurs and the corresponding set of notifications are sent out.
Avoid this incorrect load tracking by updating the cur_freq for all
CPUs in the same frequency domain.

Change-Id: Ie11ad9a64e7c9b115d01a7c065f22d386eb431d5
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-01-05 15:36:33 -08:00
Linux Build Service Account 05e6c29de4 Merge "sched: add preference for prev_cpu in HMP task placement" 2014-12-29 17:31:49 -08:00
Linux Build Service Account 380cadc7f3 Merge "sched: Per-cpu prefer_idle flag" 2014-12-29 17:31:47 -08:00
Linux Build Service Account 307e71816e Merge "sched: Consider PF_WAKE_UP_IDLE in select_best_cpu()" 2014-12-29 17:31:47 -08:00
Olav Haugan 30d383d45b sched: Fix overflow in max possible capacity calculation
The max possible capacity calculation might overflow given large enough
max possible frequency and capacity. Fix potential for overflow.
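
The kind of overflow described here can be reproduced with a plain multiply-then-divide. This sketch assumes a formula of the form capacity * freq / max_freq and uses hypothetical function names; it only shows why widening to 64 bits before multiplying avoids the wrap.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit arithmetic: the product wraps once it exceeds UINT32_MAX. */
uint32_t scale_capacity_32(uint32_t capacity, uint32_t freq,
                           uint32_t max_freq)
{
    assert(max_freq != 0);
    return capacity * freq / max_freq;
}

/* Widen to 64 bits before multiplying so the product cannot wrap. */
uint32_t scale_capacity_64(uint32_t capacity, uint32_t freq,
                           uint32_t max_freq)
{
    assert(max_freq != 0);
    return (uint32_t)((uint64_t)capacity * freq / max_freq);
}
```

With a capacity of 2048 and a frequency of 2500000 kHz, the product is about 5.1e9, which no longer fits in 32 bits.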

Change-Id: Ie9345bc657988845aeb450d922052550cca48a5f
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-26 11:25:32 -08:00
Steve Muckle cecf6c46cd sched: add preference for prev_cpu in HMP task placement
At present the HMP task placement algorithm scans CPUs in numerical
order and if two identical options are found, the first one
encountered is chosen, even if it is different from the task's
previous CPU.

Add a bias towards the task's previous CPU in such situations. Any
time two or more CPUs are considered equivalent (load, C-state, power
cost), if one of them is the task's previous CPU, bias towards that
CPU. The algorithm is otherwise unchanged.
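
The tie-break toward the previous CPU can be sketched as below. The single integer cost per CPU is a simplifying assumption standing in for the combined load, C-state, and power cost comparison; pick_cpu() is a hypothetical name.

```c
#include <assert.h>

/* cost[i] is a hypothetical combined score for CPU i; lower is better. */
int pick_cpu(const int *cost, int nr_cpus, int prev_cpu)
{
    int best = 0;

    assert(nr_cpus > 0);

    /* Scan in numerical order, as the original algorithm does. */
    for (int cpu = 1; cpu < nr_cpus; cpu++) {
        /* Strictly better wins; on a tie, only prev_cpu displaces best. */
        if (cost[cpu] < cost[best] ||
            (cost[cpu] == cost[best] && cpu == prev_cpu))
            best = cpu;
    }

    return best;
}
```

If the previous CPU is not among the equally good candidates, the first one encountered is still chosen, so the behaviour is otherwise unchanged.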

CRs-Fixed: 772033
Change-Id: I511f5b929c2bfa6fdea9e7433893c27b29ed8026
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-23 15:54:35 -08:00
Srivatsa Vaddagiri 599bfc7503 sched: Per-cpu prefer_idle flag
Remove the global sysctl_sched_prefer_idle flag and replace it with a
per-cpu prefer_idle flag. The per-cpu flag is expected to be the same for all
cpus in a cluster. It thus provides convenient means to disable
packing in one cluster while allowing packing in another cluster.

Change-Id: Ie4cc73bb1a55b4eac5697be38e558546161faca1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-23 09:52:43 +05:30
Srivatsa Vaddagiri 92ba1d55f3 sched: Consider PF_WAKE_UP_IDLE in select_best_cpu()
sysctl_sched_prefer_idle controls selection of idle cpus for waking
tasks. In some cases, waking to idle cpus helps performance while in
other cases it hurts (as tasks incur latency associated with C-state
wakeup). It's ideal if the scheduler can adapt prefer_idle behavior based
on the task that is waking up, but that's hard for the scheduler to
figure out by itself. A PF_WAKE_UP_IDLE hint can be provided by an external
module/driver in such case to guide scheduler in preferring an idle
cpu for select tasks irrespective of sysctl_sched_prefer_idle flag.

This patch enhances select_best_cpu() to consider PF_WAKE_UP_IDLE
hint. Wakeup posted from any task that has PF_WAKE_UP_IDLE set is a
hint for scheduler to prefer idle cpu for waking tasks. Similarly
scheduler will attempt to place any task with PF_WAKE_UP_IDLE set on
idle cpu when they wakeup.

CRs-Fixed: 773101
Change-Id: Ia8bf334d98fd9fd2ff9eda875430497d55d64ce6
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-23 09:52:27 +05:30
Olav Haugan 7e13b27b8b sched: Add sysctl to enable power aware scheduling
Add sysctl to enable energy awareness at runtime. This is useful for
performance/power tuning/measurements and debugging. In addition this
will match up with the Documentation/scheduler/sched-hmp.txt documentation.

Change-Id: I0a9185498640d66917b38bf5d55f6c59fc60ad5c
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-22 14:37:33 -08:00
Olav Haugan 294b88dc67 sched: Ensure no active EA migration occurs when EA is disabled
There exists a flag called "sched_enable_power_aware" that is not honored
everywhere. Fix this.

Change-Id: I62225939b71b25970115565b4e9ccb450e252d7c
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-22 14:23:55 -08:00
Joonwoo Park fc994a4b9e sched: take account of irq preemption when calculating irqload delta
If an irq is raised while sched_irqload() is calculating the irqload delta,
sched_account_irqtime() can update rq's irqload_ts to a value greater
than the jiffies stored in sched_irqload()'s context, so the delta can be
negative.  A negative delta means there was a recent irq occurrence,
so remove the improper BUG_ON().
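
Treating a negative delta as "recent" rather than as a bug can be sketched like this. The function name and the window parameter are assumptions for illustration; only the signed-delta idea is taken from the message.

```c
#include <assert.h>

/* Returns 1 if the last irq accounting update is within `window` ticks. */
int irqload_is_recent(unsigned long now, unsigned long irqload_ts,
                      long window)
{
    /*
     * Signed difference of unsigned tick counters (relies on the usual
     * two's-complement conversion). It is negative when irqload_ts was
     * advanced by an irq after `now` was sampled.
     */
    long delta = (long)(now - irqload_ts);

    assert(window > 0);

    /* A negative delta is simply a very recent irq, not an error. */
    return delta < window;
}
```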

CRs-fixed: 771894
Change-Id: I5bb01b50ec84c14bf9f26dd9c95de82ec2cd19b5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-12-16 16:56:50 -08:00
Joonwoo Park 2cec55a2e2 sched: Prevent race conditions where upmigrate_min_nice changes
When upmigrate_min_nice is changed dec_nr_big_small_task() can trigger
BUG_ON(rq->nr_big_tasks < 0).  This happens when there is a task which was
considered as non-big task due to its nice > upmigrate_min_nice and later
upmigrate_min_nice is changed to higher value so the task becomes big task.
In this case runqueue still has nr_big_tasks = 0 incorrectly with current
implementation.  Consequently next scheduler tick sees a big task to
schedule and try to decrease nr_big_tasks which is already 0.

Introduce sched_upmigrate_min_nice which is updated atomically and re-count
the number of big and small tasks to fix BUG_ON() triggering.

Change-Id: I6f5fc62ed22bbe5c52ec71613082a6e64f406e58
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-12-16 11:05:22 -08:00
Olav Haugan 4beca1fd4d sched: Avoid frequent task migration due to EA in lb
A new tunable exists that allows task migration to be throttled when the
scheduler tries to do task migrations due to Energy Awareness (EA). This
tunable is only taken into account when migrations occur in the tick
path. Extend the usage of the tunable to take into account the load
balancer (lb) path also.

In addition ensure that the start of task execution on a CPU is updated
correctly. If a task is preempted but still runnable on the same CPU the
start of execution should not be updated. Only update the start of
execution when a task wakes up after sleep or moves to a new CPU.

Change-Id: I6b2a8e06d8d2df8e0f9f62b7aba3b4ee4b2c1c4d
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-13 06:43:49 -08:00
Olav Haugan a7bc092692 sched: Avoid migrating tasks to little cores due to EA
If during the check whether migration is needed we find that there is a
lower power CPU available we commence to find a new CPU for this task.
However, by the time we search for a new CPU the lower power CPU might
no longer be available. We should abort the attempt to migrate a task in
this case.

CRs-Fixed: 764788
Change-Id: I867923a82b95c599278b81cd73bb102b6aff4d03
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-13 06:43:48 -08:00
Olav Haugan 2c320f2ffa sched: Add temperature to cpu_load trace point
Add the current CPU temperature to the sched_cpu_load trace point.
This will allow us to track the CPU temperature.

CRs-Fixed: 764788
Change-Id: Ib2e3559bbbe3fe07a6b7c8115db606828bc36254
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-13 06:43:48 -08:00
Olav Haugan 0c0d18bb15 sched: Only do EA migration when CPU throttling is imminent
We do not want to migrate tasks unnecessarily, in order to avoid the cache
hit and other migration latencies that could affect system performance. Add
a check to only try EA migration when CPU frequency throttling is
imminent.

CRs-Fixed: 764788
Change-Id: I92e86e62da10ce15f1e76a980df3545e93d76348
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-13 06:34:55 -08:00
Srivatsa Vaddagiri 32c6ac7c62 sched: Avoid frequent migration of running task
Power values for cpus can drop quite considerably when they go idle.
As a result, the best choice for running a single task in a cluster
can vary quite rapidly. As the task keeps hopping cpus, other cpus go
idle and start being seen as a more favorable target for running a task,
leading to the task migrating almost every scheduler tick!

Prevent this by keeping track of when a task started running on a cpu
and allowing task migration in tick path (migration_needed()) on
account of energy efficiency reasons only if the task has run
sufficiently long (as determined by sysctl_sched_min_runtime
variable).

Note that currently sysctl_sched_min_runtime setting is considered
only in scheduler_tick()->migration_needed() path and not in
idle_balance() path. In other words, a task could be migrated to
another cpu which did an idle_balance(). This limitation should not
affect high-frequency migrations seen typically (when a single
high-demand task runs on high-performance cpu).
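
The effect of the minimum-runtime gate in the tick path can be sketched with
a toy model. This is Python for illustration only: the function names, the
nanosecond units, the >= comparison, and the assumption that a more
favorable cpu is available at every tick are all hypothetical.

```python
def ea_tick_migration_allowed(now_ns, run_start_ns, min_runtime_ns):
    """True if the task has run long enough on its cpu to be moved."""
    return (now_ns - run_start_ns) >= min_runtime_ns


def count_migrations(nr_ticks, tick_ns, min_runtime_ns):
    """Count tick-path migrations when a better cpu exists at every tick."""
    run_start_ns = 0
    migrations = 0
    for tick in range(1, nr_ticks + 1):
        now_ns = tick * tick_ns
        if ea_tick_migration_allowed(now_ns, run_start_ns, min_runtime_ns):
            migrations += 1
            run_start_ns = now_ns  # the task starts running on the new cpu
    return migrations
```

In this model, ten 10ms ticks with no minimum runtime give a migration on
every tick, while a 30ms minimum runtime reduces that to three.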

CRs-Fixed: 756570
Change-Id: I96413b7a81b623193c3bbcec6f3fa9dfec367d99
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-13 06:34:55 -08:00
Steve Muckle 1bfb9a0dd3 sched: treat sync waker CPUs with 1 task as idle
When a CPU with one task performs a sync wakeup, its
one task is expected to sleep immediately so this CPU
should be treated as idle for the purposes of CPU selection
for the waking task.

This is only done when idle CPUs are the preferred targets
for non-small task wakeups. When prefer_idle is 0, the
CPU is left as non-idle in the selection logic so it is still
a preferred candidate for the sync wakeup.

Change-Id: I65c6535169293e8ba0c37fb5e88aec336338f7d7
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:58 -08:00
Syed Rameez Mustafa b3c5c54d72 sched: extend sched_task_load tracepoint to indicate prefer_idle
Prefer idle determines whether the scheduler prefers to wake up a task
on an idle CPU rather than a busy CPU. Knowing the correct
value of this tunable is essential in understanding placement
decisions made in select_best_cpu().

Change-Id: I955d7577061abccb65d01f560e1911d9db70298a
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-12-10 23:53:57 -08:00
Steve Muckle 84370f934b sched: extend sched_task_load tracepoint to indicate sync wakeup
Sync wakeups provide a hint to the scheduler about upcoming task
activity. Knowing which wakeups are sync wakeups from logs will
assist in workload analysis.

Change-Id: I6ffe73f2337e56b8234d4097069d5d70ab045eda
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:56 -08:00
Steve Muckle ee9ddb5f3c sched: add sync wakeup recognition in select_best_cpu
If a wakeup is a sync wakeup, we need to discount the currently
running task's load from the waker's CPU as we calculate the best
CPU for the waking task to land on.
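
A minimal sketch of the discounting idea, written as a Python model rather
than kernel code. The function name, its parameters, and the clamp at zero
are assumptions for illustration and are not taken from select_best_cpu().

```python
def effective_cpu_load(cpu_load, curr_task_load, is_waker_cpu, sync):
    """Load used to compare cpus when placing the waking task.

    On a sync wakeup the waker is expected to sleep soon, so its load is
    subtracted from the waker's cpu only.
    """
    if sync and is_waker_cpu:
        return max(cpu_load - curr_task_load, 0)
    return cpu_load
```

For a waker cpu with load 50 of which 40 belongs to the running waker, a
sync wakeup compares that cpu at load 10, and a non-sync wakeup at load 50.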

Change-Id: I00c5df626d17868323d60fb90b4513c0dd314825
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:55 -08:00
Srivatsa Vaddagiri 6e778f0cdc sched: Provide knob to prefer mostly_idle over idle cpus
sysctl_sched_prefer_idle lets the scheduler bias selection of
idle cpus over mostly idle cpus for tasks. This knob could be
useful to control balance between power and performance.

Change-Id: Ide6eef684ef94ac8b9927f53c220ccf94976fe67
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-10 23:53:54 -08:00
Steve Muckle 75d1c94217 sched: make sched_cpu_high_irqload a runtime tunable
It may be desirable to be able to alter the sched_cpu_high_irqload
setting easily, so make it a runtime tunable value.

Change-Id: I832030eec2aafa101f0f435a4fd2d401d447880d
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:53 -08:00
Steve Muckle 00acd0448b sched: trace: extend sched_cpu_load to print irqload
The irqload is used in determining whether CPUs are mostly idle
so it is useful to know this value while viewing scheduler traces.

Change-Id: Icbb74fc1285be878f254ae54886bdb161b14a270
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:51 -08:00
Steve Muckle 51f0d7663b sched: avoid CPUs with high irq activity
CPUs with significant IRQ activity will not be able to serve tasks
quickly. Avoid them if possible by disqualifying such CPUs from
being recognized as mostly idle.

Change-Id: I2c09272a4f259f0283b272455147d288fce11982
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:47 -08:00
Steve Muckle a14e01109a sched: refresh sched_clock() after acquiring rq lock in irq path
The wallclock time passed to sched_account_irqtime() may be stale
after we wait to acquire the runqueue lock. This could cause problems
in update_task_ravg because a different CPU may have advanced
this CPU's window_start based on a more up-to-date wallclock value,
triggering a BUG_ON(window_start > wallclock).

Change-Id: I316af62d1716e9b59c4a2898a2d9b44d6c7a75d8
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 19:50:46 -08:00
Steve Muckle 5fdc1d3aaa sched: track soft/hard irqload per-RQ with decaying avg
The scheduler currently ignores irq activity when deciding which
CPUs to place tasks on. If a CPU is getting hammered with IRQ activity
but has no tasks it will look attractive to the scheduler as it will
not be in a low power mode.

Track irqload with a decaying average. This quantity can be used
in the task placement logic to avoid CPUs which are under high
irqload. The decay factor is 3/4. Note that with this algorithm the
tracked irqload quantity will be higher than the actual irq time
observed in any single window. Some sample outcomes with steady
irqloads per 10ms window and the 3/4 decay factor (irqload of 10 is
used as a threshold in a subsequent patch):

irqload per window   load value asymptote   # windows to > 10
2ms                  8                      n/a
3ms                  12                     7
4ms                  16                     4
5ms                  20                     3

Of course irqload will not be constant in each window; these are just
given as simple examples.
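
The sample outcomes can be reproduced with a short model, assuming the
update applied once per window is load = load * 3/4 + irq time in that
window, starting from zero. That update rule, the function names, and the
use of exact fractions are assumptions for illustration; the kernel code may
use different integer arithmetic.

```python
from fractions import Fraction

DECAY = Fraction(3, 4)  # decay factor quoted in the commit message


def asymptote(irq_ms):
    """Steady-state load for a constant irq time per window."""
    return irq_ms / (1 - DECAY)


def windows_to_exceed(irq_ms, threshold=10, max_windows=200):
    """Number of windows until the decayed load first exceeds threshold.

    Returns None if the threshold is not crossed within max_windows.
    """
    load = Fraction(0)
    for n in range(1, max_windows + 1):
        load = load * DECAY + irq_ms
        if load > threshold:
            return n
    return None
```

With 3ms of irq time per window the load runs 3, 5.25, 6.94, 8.20, 9.15,
9.86, 10.40, crossing 10 in the seventh window. With 2ms per window it
converges to 8 and never crosses the threshold.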

Change-Id: I9dba049f5dfdcecc04339f727c8dd4ff554e01a5
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 19:50:45 -08:00
Steve Muckle c5c90f6099 sched: do not set window until sched_clock is fully initialized
The system initially uses a jiffy-based sched clock. When the platform
registers a new timer for sched_clock, sched_clock can jump backwards.
Once sched_clock_postinit() runs it should be safe to rely on it.

Also sched_clock_cpu() relies on completion of sched_clock_init()
and until that happens sched_clock_cpu() returns zero. This is used
in the irq accounting path which window-based stats relies upon.
So do not set window_start until sched_clock_cpu() is working.

Change-Id: Ided349de8f8554f80a027ace0f63ea52b1c38c68
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 19:50:44 -08:00
Linux Build Service Account fea8806a70 Merge "sched: Fix inaccurate accounting for real-time task" 2014-12-06 14:38:08 -08:00
Linux Build Service Account cd2d717655 Merge "Revert "sched: update_rq_clock() must skip ONE update"" 2014-12-06 14:38:07 -08:00
Linux Build Service Account b4229d736e Merge "sched: Make RT tasks eligible for boost" 2014-12-05 00:05:48 -08:00
Syed Rameez Mustafa fce95c9a12 sched: Make RT tasks eligible for boost
During sched boost RT tasks currently end up going to the lowest
power cluster. This can be a performance bottleneck especially if
the frequency and IPC differences between clusters are high.
Furthermore, when RT tasks go over to the little cluster during
boost, the load balancer keeps attempting to pull work over to the
big cluster. This results in pre-emption of the executing RT task
causing more delays. Finally, containing more work on a single
cluster during boost might help save some power if the little
cluster can then enter deeper low power modes.

Change-Id: I177b2e81be5657c23e7ac43889472561ce9993a9
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-12-03 19:50:25 -08:00
Linux Build Service Account a5f4e12c8d Merge "sched: Limit LBF_PWR_ACTIVE_BALANCE to within cluster" 2014-12-03 16:31:51 -08:00
Linux Build Service Account 6be10b2d68 Merge "sched: Packing support until a frequency threshold" 2014-12-03 16:31:32 -08:00
Srivatsa Vaddagiri 66b5ce9db0 sched: Limit LBF_PWR_ACTIVE_BALANCE to within cluster
When higher power (performance) cluster has only one online cpu, we
currently let an idle cpu in lower power cluster pull a running task
from performance cluster via active balance. Active balance for
power-aware reasons is supposed to be restricted to balance within
cluster, the check for which is not correctly implemented.

Change-Id: I5fba7f01ad80c082a9b27e89b7f6b17a6d9cde14
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-02 15:28:14 +05:30
Srivatsa Vaddagiri 8fd5aa3bf2 sched: Fix inaccurate accounting for real-time task
It is possible that rq->clock_task was not updated in put_prev_task()
in which case we can potentially overcharge a real-time task for time
it did not run. This is because clock_task could be stale and not
represent the exact time real-time task started running.

Fix this by forcing update of rq->clock_task when real-time task
starts running.

Change-Id: I8320bb4e47924368583127b950d987925e8e6a6c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-02 15:09:11 +05:30
Srivatsa Vaddagiri 21357f54c1 Revert "sched: update_rq_clock() must skip ONE update"
This reverts commit ab2ff007fe
as it was found to cause some performance regressions.

Change-Id: Idd71fb04c77f5c9b0bc6bccc66b94ab5a7368471
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-02 14:46:32 +05:30
Srivatsa Vaddagiri 57da62614c sched: Packing support until a frequency threshold
Add another dimension for task packing based on frequency. This patch
adds a per-cpu tunable, rq->mostly_idle_freq, which when set will
result in tasks being packed on a single cpu in cluster as long as
cluster frequency is less than set threshold.

Change-Id: I318e9af6c8788ddf5dfcda407d621449ea5343c0
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-02 11:48:30 +05:30
Linux Build Service Account 72de2d6cb8 Merge "sched: update_rq_clock() must skip ONE update" 2014-11-30 16:19:16 -08:00
Linux Build Service Account b4b0ebc5f9 Merge "sched: tighten up jiffy to sched_clock mapping" 2014-11-29 17:17:43 -08:00
Srivatsa Vaddagiri ab2ff007fe sched: update_rq_clock() must skip ONE update
Prevent large wakeup latencies from being accounted to the wrong task.

Change-Id: Ie9932acb8a733989441ff2dd51c50a2626cfe5c5
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
CRs-Fixed: 755576
Patch-mainline: http://permalink.gmane.org/gmane.linux.kernel/1677324
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-11-25 12:28:35 +05:30
Linux Build Service Account aa285b577f Merge "sched: per-cpu mostly_idle threshold" 2014-11-20 15:36:30 -08:00
Linux Build Service Account 62b1d26801 Merge "sched: Add API to set task's initial task load" 2014-11-20 15:36:29 -08:00
Steve Muckle f17fe85baf sched: tighten up jiffy to sched_clock mapping
The tick code already tracks exact time a tick is expected
to arrive. This can be used to eliminate slack in the jiffy
to sched_clock mapping that aligns windows between a caller
of sched_set_window and the scheduler itself.

Change-Id: I9d47466658d01e6857d7457405459436d504a2ca
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-11-19 15:06:33 -08:00
Syed Rameez Mustafa a40d3ce56e sched: Avoid unnecessary load balance when tasks don't fit on dst_cpu
When considering whether to pull over a task that does not fit on the
destination CPU make sure that the busiest group has exceeded its
capacity. While the change is applicable to all groups, the biggest
impact will be on migrating big tasks to little CPUs. This should
only happen when the big cluster is no longer capable of balancing
load within the cluster. This change should have no impact on single
cluster systems.

Change-Id: I6d1ef0e0d878460530f036921ce4a4a9c1e1394b
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-11-13 12:24:31 -08:00
Steve Muckle e3d8a00dab sched: print sched_cpu_load tracepoint for all CPUs
When select_best_cpu() is called because a task is on a suboptimal
CPU, certain CPUs are skipped because moving the task there would
not make things any better. For the purposes of debugging though it
is useful to always see the state of all CPUs.

Change-Id: I76965663c1feef5c4cfab9909e477b0dcf67272d
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-11-10 19:22:51 -08:00
Srivatsa Vaddagiri ed7d7749e9 sched: per-cpu mostly_idle threshold
sched_mostly_idle_load and sched_mostly_idle_nr_run knobs help pack
tasks on cpus to some extent. In some cases, it may be desirable to
have different packing limits for different cpus. For example, pack to
a higher limit on high-performance cpus compared to power-efficient
cpus.

This patch removes the global mostly_idle tunables and makes them
per-cpu, thus letting task packing behavior be controlled in a
fine-grained manner.

Change-Id: Ifc254cda34b928eae9d6c342ce4c0f64e531e6c2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-11-06 15:27:00 +05:30
Srivatsa Vaddagiri f0e281597c sched: Add API to set task's initial task load
Add a per-task attribute, init_load_pct, that is used to initialize
newly created children's initial task load. This helps important
applications launch their child tasks on cpus with highest capacity.

Change-Id: Ie9665fd2aeb15203f95fd7f211c50bebbaa18727
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-11-05 14:26:59 +05:30
Syed Rameez Mustafa 297c4ccce8 sched: use C-states in non-small task wakeup placement logic
Currently when a non-small task wakes up, the task placement logic
first tries to find the least loaded CPU before breaking any ties
via the power cost of running the task on those CPUs. When the power
cost is also same, however, the scheduler just selects the first CPU
it came across. Use C-states to further break ties when the power
cost is the same for multiple CPUs. The scheduler will now pick a
CPU in the shallowest C-state.
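
The resulting selection order (load, then power cost, then C-state) can be
sketched as a lexicographic comparison. This is a Python model, not the
kernel's placement code: the CpuState fields, the pick_cpu name, and the
convention that a smaller cstate number means a shallower state are
assumptions for illustration.

```python
from typing import NamedTuple


class CpuState(NamedTuple):
    cpu: int
    load: int        # lower is better
    power_cost: int  # cost of running the task here; lower is better
    cstate: int      # assumed: smaller number = shallower C-state


def pick_cpu(cpus):
    """Pick least loaded, then cheapest, then shallowest C-state.

    min() returns the first of several equal candidates, so a full tie
    still falls back to the first cpu encountered.
    """
    return min(cpus, key=lambda c: (c.load, c.power_cost, c.cstate)).cpu
```

For example, given cpus 0 and 1 with equal load and power cost, where cpu 0
is in C-state 2 and cpu 1 is in C-state 1, pick_cpu() returns 1.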

Change-Id: Ie1401b305fa02758a2f7b30cfca1afe64459fc2b
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-11-04 14:11:24 -08:00
Linux Build Service Account c44483c313 Merge "sched: Provide an easy method to log context switch latencies" 2014-10-24 22:51:02 -07:00
Linux Build Service Account 0f1e07cdf9 Merge "sched: take rq lock prior to saving idle task's mark_start" 2014-10-24 16:09:25 -07:00
Syed Rameez Mustafa a2006e83ba sched: Provide an easy method to log context switch latencies
Allow logging of various sections of context switch in order to derive
the worst case latencies associated with them. This is required for
scheduler profiling.

Change-Id: I3a5009cb3088cc7ace2cd3130d4d7b24e957bada
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-10-23 17:38:22 -07:00
Linux Build Service Account 27c06362c4 Merge "sched: update governor notification logic" 2014-10-22 15:58:33 -07:00
Steve Muckle eed96dfa2a sched: take rq lock prior to saving idle task's mark_start
When the idle task is being re-initialized during hotplug its
mark_start value must be retained. The runqueue lock must be
held when reading this value though to serialize this with
other CPUs that could update the idle task's window-based
statistics.

CRs-Fixed: 743991
Change-Id: I1bca092d9ebc32a808cea2b9fe890cd24dc868cd
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-10-22 15:12:41 -07:00
Srivatsa Vaddagiri f3386c7cfb sched: update governor notification logic
Make criteria for notifying governor to be per-cpu. Governor is
notified of any large change in cpu's busy time statistics
(rq->prev_runnable_sum) since the last reported value.

Change-Id: I727354d994d909b166d093b94d3dade7c7dddc0d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-15 14:57:18 -07:00
Srivatsa Vaddagiri f99927a703 sched: window-stats: Retain idle thread's mark_start
init_idle() is called on a cpu's idle-thread once at bootup and
subsequently every time the cpu is hot-added. Since init_idle() calls
__sched_fork(), we end up blowing away the idle thread's ravg.mark_start value.
As a result we will fail to accurately maintain cpu's
curr/prev_runnable_sum counters. Below example illustrates such a
failure:

CS = curr_runnable_sum, PS = prev_runnable_sum

t0 -> New window starts for CPU2
<after some_task_activity> CS = X, PS = Y
t1 -> <cpu2 is hot-removed. idle_task starts running on cpu2>
      At this time, cpu2_idle_thread.ravg.mark_start = t1

t1 -> t0 + W. One window elapses. CPU2 still hot-removed. We
	defer swapping CS and PS until some future task event occurs

t2 -> CPU2 hot-added.  _cpu_up()->idle_thread_get()->init_idle()
	->__sched_fork() results in cpu2_idle_thread.ravg.mark_start = 0

t3 -> Some task wakes on cpu2. Since mark_start = 0, we don't swap CS
	and PS => which is a BUG!

Fix this by retaining idle task's original mark_start value during
init_idle() call.

Change-Id: I4ac9bfe3a58fb5da8a6c7bc378c79d9930d17942
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-13 16:11:20 -07:00
Linux Build Service Account 53d2a04a26 Merge "sched: Stop task migration to busy CPUs due to power active balance" 2014-10-12 08:39:38 -07:00
Olav Haugan 72bbd4b7cb sched: Add checks for frequency change
We need to check for frequency change when a task is migrated due to
affinity change and during active balance.

Change-Id: I96676db04d34b5b91edd83431c236a1c28166985
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-10-09 15:37:22 -07:00
Srivatsa Vaddagiri 19b3f3f871 sched: Use absolute scale for notifying governor
Make the tunables used for deciding the need for notification to be on
absolute scale. The earlier scale (in percent terms relative to
cur_freq) does not work well with available range of frequencies. For
example, 100% tunable value would work well for lower range of
frequencies and not for higher range. Having the tunable to be on
absolute scale makes tuning more realistic.

Change-Id: I35a8c4e2f2e9da57f4ca4462072276d06ad386f1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-03 14:03:56 -07:00
Srivatsa Vaddagiri 2568673dd6 sched: window-stats: Enhance cpu busy time accounting
rq->curr/prev_runnable_sum counters represent cpu demand from various
tasks that have run on a cpu. Any task that runs on a cpu will have a
representation in rq->curr_runnable_sum. Their partial_demand value
will be included in rq->curr_runnable_sum. Since partial_demand is
derived from historical load samples for a task, rq->curr_runnable_sum
could represent "inflated/un-realistic" cpu usage. As an example, let's
say that task with partial_demand of 10ms runs for only 1ms on a cpu.
What is included in rq->curr_runnable_sum is 10ms (and not the actual
execution time of 1ms). This leads to cpu busy time being reported on
the upside causing frequency to stay higher than necessary.

This patch fixes cpu busy accounting scheme to strictly represent
actual usage. It also provides for conditional fixup of busy time upon
migration and upon heavy-task wakeup.

CRs-Fixed: 691443
Change-Id: Ic4092627668053934049af4dfef65d9b6b901e6b
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-03 14:03:51 -07:00
Srivatsa Vaddagiri dababc266f sched: window-stats: ftrace event improvements
Add two new ftrace events:

* trace_sched_freq_alert, to log notifications sent
  to governor for requesting change in frequency.
* trace_sched_get_busy, to log cpu busytime information returned by
  scheduler

Extend existing ftrace events as follows:

* sched_update_task_ravg() event to log irqtime parameter
* sched_migration_update_sum() to log threadid which is being migrated
  (and thus responsible for update of curr_runnable_sum and
  prev_runnable_sum counters)

Change-Id: Ia68ce0953a2d21d319a1db7f916c51ff6a91557c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-03 13:47:29 -07:00
Srivatsa Vaddagiri 86df733742 sched: improve logic for alerting governor
Currently we send notification to governor not taking note of cpus
that are synchronized with regard to their frequency. As a result,
scheduler could send pointless notifications (notification spam!).

Avoid this by considering synchronized cpus and alerting governor only
when the highest demand of any cpu within cluster far exceeds or falls
behind current frequency.

Change-Id: I74908b5a212404ca56b38eb94548f9b1fbcca33d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-03 13:46:18 -07:00
Syed Rameez Mustafa 0b013c8593 sched: Stop task migration to busy CPUs due to power active balance
Power active balance should only be invoked when the destination CPU
is calling load balance with either a CPU_IDLE or a CPU_NEWLY_IDLE
environment. We do not want to push tasks towards busy CPUs even if they
are a more power efficient place to run that task. This can cause
higher scheduling latencies due to the resulting load imbalance.

Change-Id: I8e0f242338887d189e2fc17acfb63586e7c40839
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-10-02 17:37:09 -07:00
Srivatsa Vaddagiri 6cb3d32976 sched: window-stats: Fix accounting bug in legacy mode
TASK_UPDATE event currently does not result in increment of
rq->curr_runnable_sum in legacy mode, which is wrong. As a result,
cpu busy time reported under legacy mode could be incorrect.

Change-Id: Ifa76c735a0ead23062c1a64faf97e7b801b66bf9
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-15 12:00:46 +05:30
Srivatsa Vaddagiri 802a513d90 sched: window-stats: Note legacy mode in fork() and exit()
In legacy mode, mark_task_starting() should avoid adding (new) task's
(initial) demand to rq->curr_runnable_sum and rq->prev_runnable_sum.
Similarly exit() should avoid removing (exiting) task's demand from
rq->curr_runnable_sum and rq->prev_runnable_sum (as those counters
don't include task's demand and partial_demand values in legacy mode).

Change-Id: I26820b1ac5885a9d681d363ec53d6866a2ea2e6f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-15 12:00:46 +05:30
Srivatsa Vaddagiri 718293c53c sched: Fix reference to stale task_struct in try_to_wake_up()
try_to_wake_up() currently drops p->pi_lock and later checks for need
to notify cpufreq governor on task migrations or wakeups. However the
woken task could exit between the time p->pi_lock is released and the
time the test for notification is run. As a result, the test for
notification could refer to an exited task. task_notify_on_migrate(p)
could thus lead to invalid memory reference.

Fix this by running the test for notification with task's pi_lock
held.

Change-Id: I1c7a337473d2d8e79342a015a179174ce00702e1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-15 12:00:46 +05:30
Syed Rameez Mustafa 37c0e84719 sched: Remove hack to enable/disable HMP scheduling extensions
The current method of turning HMP scheduling extensions on or off
based on the number of CPUs is inappropriate as there may be SoCs with
4 or fewer cores that require the use of these extensions. Remove this
hack as HMP extensions will now be enabled/disabled via command line
options.

Change-Id: Id44b53c2c3b3c3b83e1911a834e2c824f3958135
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-11 09:18:03 -07:00
Linux Build Service Account 7a62303fd9 Merge "sched: add check for cpu idleness when using C-state information" 2014-09-11 01:52:32 -07:00
Linux Build Service Account e81a3dc7f7 Merge "sched: extend sched_task_load tracepoint to indicate small tasks" 2014-09-11 01:52:31 -07:00
Linux Build Service Account cef2bfadb1 Merge "sched: Add C-state tracking to the sched_cpu_load trace event" 2014-09-09 04:48:06 -07:00
Linux Build Service Account 0dbd5f1b7b Merge "sched: window-stats: add a new AVG policy" 2014-09-09 04:47:32 -07:00
Linux Build Service Account 672d3eb95f Merge "sched: fix wrong load_scale_factor/capacity/nr_big/small_tasks" 2014-09-09 00:57:10 -07:00
Srivatsa Vaddagiri 9e37153f17 sched: fix wrong load_scale_factor/capacity/nr_big/small_tasks
A couple of bugs exist with incorrect use of cpu_online_mask in
pre/post_big_small_task() functions, leading to potentially incorrect
computation of load_scale_factor/capacity/nr_big/small_tasks.

pre/post_big_small_task_count_change() use cpu_online_mask in an
unreliable manner. While local_irq_disable() in
pre_big_small_task_count_change() ensures a cpu won't go away in
cpu_online_mask, nothing prevents a cpu from coming online
concurrently. As a result, cpu_online_mask used in
pre_big_small_task_count_change() can be inconsistent with that used
in post_big_small_task_count_change() which can lead to an attempt to
unlock rq->lock which was not taken before.

Secondly, when either max_possible_freq or min_max_freq is changing,
it needs to trigger recomputation of load_scale_factor and capacity
for *all* cpus, even if some are offline. Otherwise, an offline cpu
could later come online with incorrect load_scale_factor/capacity.

While it should be sufficient to scan online cpus for
updating their nr_big/small_tasks in
post_big_small_task_count_change(), unfortunately it sounds pretty
hard to provide a stable cpu_online_mask when it's called from
cpufreq_notifier_policy(). cpufreq framework can trigger a
CPUFREQ_NOTIFY notification in multiple contexts, some in cpu-hotplug
paths, which makes it pretty hard to guess whether get_online_cpus()
can be taken without causing deadlocks or not. To work around the
insufficient information we have about the hotplug-safety context when
CPUFREQ_NOTIFY is issued, have post_big_small_task_count_change()
traverse all possible cpus in updating nr_big/small_task_count.

CRs-Fixed: 717134
Change-Id: Ife8f3f7cdfd77d5a21eee63627d7a3465930aed5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-08 17:18:24 -07:00
Syed Rameez Mustafa 04953f4035 sched: add check for cpu idleness when using C-state information
Task enqueue on a CPU occurs prior to that CPU exiting an idle state.
For the time duration between enqueue and idle exit, the CPU C-state
information can no longer be relied on for further task placement
since already enqueued/waiting tasks are not taken into account. The
small task placement algorithm implicitly assumes a non zero C-state
implies an idle CPU. Since this assumption is incorrect for the
duration described above, make the cpu_idle() check explicit. This
problem can lead to task packing beyond the mostly_idle threshold.

Change-Id: Idb5be85705d6b15f187d011ea2196e1bfe31dbf2
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-08 15:25:11 -07:00
Syed Rameez Mustafa 444e5dee14 sched: extend sched_task_load tracepoint to indicate small tasks
While debugging it's always useful to know whether a task is small or
not to determine the scheduling algorithm being used. Have the
sched_task_load tracepoint indicate this information rather than
having to do manual calculations for every task placement.

Change-Id: Ibf390095f05c7da80df1ebfe00f4c5af66c97d12
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-08 14:40:58 -07:00
Syed Rameez Mustafa e85e73f1d7 sched: Add C-state tracking to the sched_cpu_load trace event
C-state information is used by the scheduler for small task placement
decisions. Track this information in the sched_cpu_load trace event.
Also add the trace event in best_small_task_cpu(). This will help
better understand small task placement decisions.

Change-Id: Ife5f05bba59f85c968fab999bd13b9fb6b1c184e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-08 11:29:57 -07:00
Syed Rameez Mustafa bf3e6c0e55 sched: window-stats: add a new AVG policy
The current WINDOW_STATS_AVG policy is actually a misnomer since it
uses the maximum value of the runtime in the recent window and the
average of the past ravg_hist_size windows. Add a policy that only
uses the average and call it WINDOW_STATS_AVG policy. Rename all the
other policies to make them shorter and unambiguous.
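
The difference between the two behaviours can be sketched as follows. This
is a Python model with hypothetical function names; it assumes history[0] is
the most recent window and that the average is taken over the whole history,
which may not match the kernel's exact bookkeeping.

```python
def demand_max_recent_avg(history):
    """Old behaviour: max of the most recent window and the history average."""
    recent = history[0]
    avg = sum(history) / len(history)
    return max(recent, avg)


def demand_avg(history):
    """New AVG behaviour as described: the average of the history only."""
    return sum(history) / len(history)
```

For a history of [8, 2, 2, 4] the first function yields 8 because of the
recent spike, while the pure average yields 4.0.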

Change-Id: I080a4ea072a84a88858ca9da59a4151dfbdbe62c
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-08 11:07:41 -07:00
Linux Build Service Account c5bc590f13 Merge "sched: Fix compile error" 2014-09-07 08:21:53 -07:00
Srivatsa Vaddagiri 594ce07f48 sched: Fix compile error
sched_get_busy(), sched_set_io_is_busy() and sched_set_window() need
to be defined only when CONFIG_SCHED_FREQ_INPUT is defined, otherwise
we get a compilation error related to dual definition of those routines.

Change-Id: Ifd5c9b6675b78d04c2f7ef0e24efeae70f7ce19b
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-04 12:14:38 +05:30
Syed Rameez Mustafa e4600ab9eb sched: update ld_moved for active balance from the load balancer
ld_moved is currently left set to 0 when the load balancer calls upon
active balance. This behavior is incorrect as it prevents the termination
of load balance for parent sched domains. Currently the feature is used
quite frequently for power active balance and sched boost. This means that
while sched boost is in effect we could run into a scenario where a more
power efficient newly idle big CPU first triggers active migration from a
less power efficient busy big CPU. It then continues to load balance at the
cluster level causing active migration for a task running on a little CPU.
Consequently the more power efficient big CPU ends up with two tasks whereas
the less power efficient big CPU may become idle. Fix this problem by
updating ld_moved when active migration has been requested.

Change-Id: I52e84eafb77249fd9378ebe531abe2d694178537
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-03 20:01:44 -07:00
Syed Rameez Mustafa 5f5ecf01d3 sched: actively migrate tasks to idle big CPUs during sched boost
The sched boost feature is currently tick driven, i.e. task placement
decisions only take place at a tick (or wakeup). The load balancer
does not have any knowledge of boost being in effect. Tasks that are
woken up on a little CPU when all big CPUs are busy will continue
executing there at least until the next tick even if one of the big
CPUs becomes idle. Reduce this latency by adding support for detecting
whether boost is in effect or not in the load balancer.  If boost is
in effect any big CPU running idle balance will trigger active
migration from a little CPU with the highest task load.
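
The selection step can be sketched as below. This is an assumption-laden illustration rather than the patch itself: struct cpu_sim, its task_load field, and pick_boost_source() are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU view used only for this sketch. */
struct cpu_sim {
	bool is_big;            /* true for a big-cluster CPU */
	unsigned int task_load; /* load of the task running there */
};

/* Returns the index of the little CPU to migrate from, or -1 when
 * boost is off, the idle CPU is not big, or no little CPU has load. */
int pick_boost_source(const struct cpu_sim *cpus, int nr_cpus,
		      int idle_cpu, bool boost)
{
	int best = -1;
	unsigned int best_load = 0;

	assert(idle_cpu >= 0 && idle_cpu < nr_cpus);

	if (!boost || !cpus[idle_cpu].is_big)
		return -1;

	for (int i = 0; i < nr_cpus; i++) {
		if (cpus[i].is_big)
			continue; /* only little CPUs are candidates */
		if (cpus[i].task_load > best_load) {
			best = i;
			best_load = cpus[i].task_load;
		}
	}
	return best;
}
```

An idle big CPU running idle balance would call this and, on a non-negative result, request active migration from that CPU.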

Change-Id: Ib2828809efa0f9857f5009b29931f63b276a59f3
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-03 19:42:15 -07:00
Syed Rameez Mustafa d3990aabb5 sched: always do idle balance with a NEWLY_IDLE idle environment
With the introduction of energy aware scheduling, if idle_balance() is
to be called on behalf of a different CPU which is idle, CPU_IDLE is
used in the environment for load_balance(). This, however, introduces
subtle differences in load calculations and policies in the load
balancer. For example there are restrictions on which CPU is permitted
to do load balancing during !CPU_NEWLY_IDLE (see update_sg_lb_stats)
and find_busiest_group() uses different criteria to detect the
presence of a busy group. There are other differences as well. Revert
back to using the NEWLY_IDLE environment irrespective of whether
idle_balance() is called for the newly idle CPU or on behalf of an
already idle CPU. This will ensure that task movement logic
while doing idle balance remains unaffected.

Change-Id: I388b0ad9a38ca550667895c8ed19628f3d25ce1a
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-03 19:23:41 -07:00
Syed Rameez Mustafa 9c37494817 sched: fix bail condition in bail_inter_cluster_balance()
Following commit efcad25cbfb (revert "sched: influence cpu_power based
on max_freq and efficiency"), all CPUs in the system have the same
cpu_power and consequently the same group capacity. Therefore, the
check in bail_inter_cluster_balance() can now no longer be used to
distinguish a higher performance cluster from one with lower
performance. The check is currently broken and always returns true for
every load balancing attempt. Fix this by using runqueue capacity
instead which can still be used as a good measure of cluster
capabilities.

Also the logic for distinguishing between idle environments and using
a different sched group capacity in update_sd_pick_busiest() is
redundant. sgs->group_capacity would now always be equal to the number
of CPUs in the group. Use sgs->group_capacity directly in conditional
checks in that function.

Change-Id: Idecfd1ed221d27d4324b20539e5224a92bf8b751
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-03 19:23:40 -07:00
Srivatsa Vaddagiri 1b36dc118d sched: Initialize env->loop variable to 0
load_balance() function does not explicitly initialize env->loop
variable to 0. As a result, there is a vague possibility of
move_tasks() hitting a very long (unnecessary) loop when it is unable to
move tasks from src_cpu. This can lead to unpleasant results like a
watchdog bark. Fix this by explicitly initializing env->loop variable
to 0 (in both load_balance() and active_load_balance_cpu_stop()).

Change-Id: I36b84c91a9753870fa16ef9c9339db7b706527be
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-25 16:07:57 +05:30
Linux Build Service Account a4e6dcf42b Merge "sched: window-stats: use policy_mutex in sched_set_window()" 2014-08-24 20:01:43 -07:00
Linux Build Service Account 355e55afc6 Merge "sched: window-stats: Avoid taking all cpu's rq->lock for long" 2014-08-24 20:01:43 -07:00
Linux Build Service Account d7bca8f374 Merge "sched: window_stats: Add "disable" mode support" 2014-08-24 20:01:41 -07:00
Linux Build Service Account bf7b729348 Merge "sched: window-stats: Fix exit race" 2014-08-24 20:01:41 -07:00