Commit Graph

214 Commits

Author SHA1 Message Date
Shakeel Butt 8ef718e452 mm, oom: fix use-after-free in oom_kill_process
commit cefc7ef3c87d02fc9307835868ff721ea12cc597 upstream.

Syzbot instance running on upstream kernel found a use-after-free bug in
oom_kill_process.  On further inspection it seems like the process
selected to be oom-killed has exited even before reaching
read_lock(&tasklist_lock) in oom_kill_process().  More specifically the
tsk->usage is 1 which is due to get_task_struct() in oom_evaluate_task()
and the put_task_struct within for_each_thread() frees the tsk and
for_each_thread() tries to access the tsk.  The easiest fix is to do
get/put across the for_each_thread() on the selected task.

Now the next question is should we continue with the oom-kill as the
previously selected task has exited? However before adding more
complexity and heuristics, let's answer why we even look at the children
of oom-kill selected task? The select_bad_process() has already selected
the worst process in the system/memcg.  Due to race, the selected
process might not be the worst at the kill time but does that matter?
The userspace can use the oom_score_adj interface to prefer children to
be killed before the parent.  I looked at the history but it seems like
this is there before git history.

Link: http://lkml.kernel.org/r/20190121215850.221745-1-shakeelb@google.com
Reported-by: syzbot+7fbbfa368521945f0e3d@syzkaller.appspotmail.com
Fixes: 6b0c81b3be ("mm, oom: reduce dependency on tasklist_lock")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 22:05:56 +02:00
Ian Maund 8b08aa9e75 This is the 3.10.67 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUyuGRAAoJEDjbvchgkmk+7EwQALYPOeh+AManQFB1MQvFuOgZ
 /4ulpjhGXw/RPTKHMeyHo8vRfUhMOx8UPF62uql+g1l9b/Zt2bs6qXu4QcxRRsQc
 trSTUpi+U14y1hkgqOVOcFYP2ZaTjNEBQgLJ4eGn46CliLqme+rfoyRYm2GXzcR4
 6cbSAr3mufdFIpi9/8Dn62Gv0aws5lIv3qkHJXznyuux3tisPT5y6Ux2KJoivPn/
 SqADtRpwo+7lTjl15fE++9AqNsGMorV6toT2OO/7nXP+824psInKLmREAT2qC99b
 BG61vcYdxOuHtzmwrvCf1jSRjxhvZT0j2xhBr/vCKcxy08AT0vDv68zrV1r6TIuu
 U7/CKXtFBY95cjfnkTLJuswBSuIA/+sQHV6DaddH0V8fcZ6rQMLrblQ9ZcFFFkmT
 2SG6lmlXqZvcEKYGMnL/Dcow1rkRhB5stiGgTkYxjiRSRpzAHISRJ/GGpsT+rRqK
 HpBs5p9JshvRl7RWKwAu+DNGaEK1X/WYxc4/jw6dZFWX7lEWSMIPlr9zXgZCZ39y
 V6lV1VVlT9/CSs1swKHUyhHHehlFsnIlQ6Fkiycr/KkuqBLs92Hyb7WhpVa819yX
 osXdxSm6J54skiOLKYpBWHpnY09Tc+p28VEfMpErTExgp2oE8F34K7kdhoQPQb97
 2mHiXNa+J4CLUNQ+sRmw
 =HDBo
 -----END PGP SIGNATURE-----

Merge commit 'v3.10.67' into msm-3.10

This merge brings us up to date with upstream kernel.org tag v3.10.67.
It also contains changes to allow forbidden warnings introduced in
the commit 'core, nfqueue, openvswitch: Orphan frags in skb_zerocopy
and handle errors'. Once upstream has corrected these warnings, the
changes to scripts/gcc-wrapper.py, in this commit, can be reverted.

* commit 'v3.10.67' (915 commits)
  Linux 3.10.67
  md/raid5: fetch_block must fetch all the blocks handle_stripe_dirtying wants.
  ext4: fix warning in ext4_da_update_reserve_space()
  quota: provide interface for readding allocated space into reserved space
  crypto: add missing crypto module aliases
  crypto: include crypto- module prefix in template
  crypto: prefix module autoloading with "crypto-"
  drbd: merge_bvec_fn: properly remap bvm->bi_bdev
  Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
  ipvs: uninitialized data with IP_VS_IPV6
  KEYS: close race between key lookup and freeing
  sata_dwc_460ex: fix resource leak on error path
  x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs
  x86, tls: Interpret an all-zero struct user_desc as "no segment"
  x86, tls, ldt: Stop checking lm in LDT_empty
  x86/tsc: Change Fast TSC calibration failed from error to info
  x86, hyperv: Mark the Hyper-V clocksource as being continuous
  clocksource: exynos_mct: Fix bitmask regression for exynos4_mct_write
  can: dev: fix crtlmode_supported check
  bus: mvebu-mbus: fix support of MBus window 13
  ARM: dts: imx25: Fix PWM "per" clocks
  time: adjtimex: Validate the ADJ_FREQUENCY values
  time: settimeofday: Validate the values of tv from user
  dm cache: share cache-metadata object across inactive and active DM tables
  ipr: wait for aborted command responses
  drm/i915: Fix mutex->owner inspection race under DEBUG_MUTEXES
  scripts/recordmcount.pl: There is no -m32 gcc option on Super-H anymore
  ALSA: usb-audio: Add mic volume fix quirk for Logitech Webcam C210
  libata: prevent HSM state change race between ISR and PIO
  pinctrl: Fix two deadlocks
  gpio: sysfs: fix gpio device-attribute leak
  gpio: sysfs: fix gpio-chip device-attribute leak
  Linux 3.10.66
  s390/3215: fix tty output containing tabs
  s390/3215: fix hanging console issue
  fsnotify: next_i is freed during fsnotify_unmount_inodes.
  netfilter: ipset: small potential read beyond the end of buffer
  mmc: sdhci: Fix sleep in atomic after inserting SD card
  LOCKD: Fix a race when initialising nlmsvc_timeout
  x86, um: actually mark system call tables readonly
  um: Skip futex_atomic_cmpxchg_inatomic() test
  decompress_bunzip2: off by one in get_next_block()
  ARM: shmobile: sh73a0 legacy: Set .control_parent for all irqpin instances
  ARM: omap5/dra7xx: Fix frequency typos
  ARM: clk-imx6q: fix video divider for rev T0 1.0
  ARM: imx6q: drop unnecessary semicolon
  ARM: dts: imx25: Fix the SPI1 clocks
  Input: I8042 - add Acer Aspire 7738 to the nomux list
  Input: i8042 - reset keyboard to fix Elantech touchpad detection
  can: kvaser_usb: Don't send a RESET_CHIP for non-existing channels
  can: kvaser_usb: Reset all URB tx contexts upon channel close
  can: kvaser_usb: Don't free packets when tight on URBs
  USB: keyspan: fix null-deref at probe
  USB: cp210x: add IDs for CEL USB sticks and MeshWorks devices
  USB: cp210x: fix ID for production CEL MeshConnect USB Stick
  usb: dwc3: gadget: Stop TRB preparation after limit is reached
  usb: dwc3: gadget: Fix TRB preparation during SG
  OHCI: add a quirk for ULi M5237 blocking on reset
  gpiolib: of: Correct error handling in of_get_named_gpiod_flags
  NFSv4.1: Fix client id trunking on Linux
  ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing
  vfio-pci: Fix the check on pci device type in vfio_pci_probe()
  uvcvideo: Fix destruction order in uvc_delete()
  smiapp: Take mutex during PLL update in sensor initialisation
  af9005: fix kernel panic on init if compiled without IR
  smiapp-pll: Correct clock debug prints
  video/logo: prevent use of logos after they have been freed
  storvsc: ring buffer failures may result in I/O freeze
  iscsi-target: Fail connection on short sendmsg writes
  hp_accel: Add support for HP ZBook 15
  cfg80211: Fix 160 MHz channels with 80+80 and 160 MHz drivers
  ARC: [nsimosci] move peripherals to match model to FPGA
  drm/i915: Force the CS stall for invalidate flushes
  drm/i915: Invalidate media caches on gen7
  drm/radeon: properly filter DP1.2 4k modes on non-DP1.2 hw
  drm/radeon: check the right ring in radeon_evict_flags()
  drm/vmwgfx: Fix fence event code
  enic: fix rx skb checksum
  alx: fix alx_poll()
  tcp: Do not apply TSO segment limit to non-TSO packets
  tg3: tg3_disable_ints using uninitialized mailbox value to disable interrupts
  netlink: Don't reorder loads/stores before marking mmap netlink frame as available
  netlink: Always copy on mmap TX.
  Linux 3.10.65
  mm: Don't count the stack guard page towards RLIMIT_STACK
  mm: propagate error from stack expansion even for guard page
  mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
  perf session: Do not fail on processing out of order event
  perf: Fix events installation during moving group
  perf/x86/intel/uncore: Make sure only uncore events are collected
  Btrfs: don't delay inode ref updates during log replay
  ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/375/38x/XP
  scripts/kernel-doc: don't eat struct members with __aligned
  nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
  nfsd4: fix xdr4 inclusion of escaped char
  fs: nfsd: Fix signedness bug in compare_blob
  serial: samsung: wait for transfer completion before clock disable
  writeback: fix a subtle race condition in I_DIRTY clearing
  cdc-acm: memory leak in error case
  genhd: check for int overflow in disk_expand_part_tbl()
  USB: cdc-acm: check for valid interfaces
  ALSA: hda - Fix wrong gpio_dir & gpio_mask hint setups for IDT/STAC codecs
  ALSA: hda - using uninitialized data
  ALSA: usb-audio: extend KEF X300A FU 10 tweak to Arcam rPAC
  driver core: Fix unbalanced device reference in drivers_probe
  x86, vdso: Use asm volatile in __getcpu
  x86_64, vdso: Fix the vdso address randomization algorithm
  HID: Add a new id 0x501a for Genius MousePen i608X
  HID: add battery quirk for USB_DEVICE_ID_APPLE_ALU_WIRELESS_2011_ISO keyboard
  HID: roccat: potential out of bounds in pyra_sysfs_write_settings()
  HID: i2c-hid: prevent buffer overflow in early IRQ
  HID: i2c-hid: fix race condition reading reports
  iommu/vt-d: Fix an off-by-one bug in __domain_mapping()
  UBI: Fix double free after do_sync_erase()
  UBI: Fix invalid vfree()
  pstore-ram: Allow optional mapping with pgprot_noncached
  pstore-ram: Fix hangs by using write-combine mappings
  PCI: Restore detection of read-only BARs
  ASoC: dwc: Ensure FIFOs are flushed to prevent channel swap
  ASoC: max98090: Fix ill-defined sidetone route
  ASoC: sigmadsp: Refuse to load firmware files with a non-supported version
  ath5k: fix hardware queue index assignment
  swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single
  can: peak_usb: fix memset() usage
  can: peak_usb: fix cleanup sequence order in case of error during init
  ath9k: fix BE/BK queue order
  ath9k_hw: fix hardware queue allocation
  ocfs2: fix journal commit deadlock
  Linux 3.10.64
  Btrfs: fix fs corruption on transaction abort if device supports discard
  Btrfs: do not move em to modified list when unpinning
  eCryptfs: Remove buggy and unnecessary write in file name decode routine
  eCryptfs: Force RO mount when encrypted view is enabled
  udf: Verify symlink size before loading it
  exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
  ncpfs: return proper error from NCP_IOC_SETROOT ioctl
  crypto: af_alg - fix backlog handling
  userns: Unbreak the unprivileged remount tests
  userns: Allow setting gid_maps without privilege when setgroups is disabled
  userns: Add a knob to disable setgroups on a per user namespace basis
  userns: Rename id_map_mutex to userns_state_mutex
  userns: Only allow the creator of the userns unprivileged mappings
  userns: Check euid no fsuid when establishing an unprivileged uid mapping
  userns: Don't allow unprivileged creation of gid mappings
  userns: Don't allow setgroups until a gid mapping has been setablished
  userns: Document what the invariant required for safe unprivileged mappings.
  groups: Consolidate the setgroups permission checks
  umount: Disallow unprivileged mount force
  mnt: Update unprivileged remount test
  mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
  mac80211: free management frame keys when removing station
  mac80211: fix multicast LED blinking and counter
  KEYS: Fix stale key registration at error path
  isofs: Fix unchecked printing of ER records
  x86/tls: Don't validate lm in set_thread_area() after all
  dm space map metadata: fix sm_bootstrap_get_nr_blocks()
  dm bufio: fix memleak when using a dm_buffer's inline bio
  nfs41: fix nfs4_proc_layoutget error handling
  megaraid_sas: corrected return of wait_event from abort frame path
  mmc: block: add newline to sysfs display of force_ro
  mfd: tc6393xb: Fail ohci suspend if full state restore is required
  md/bitmap: always wait for writes on unplug.
  x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit
  x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  x86/tls: Disallow unusual TLS segments
  x86/tls: Validate TLS entries to protect espfix
  isofs: Fix infinite looping over CE entries
  Linux 3.10.63
  ALSA: usb-audio: Don't resubmit pending URBs at MIDI error recovery
  powerpc: 32 bit getcpu VDSO function uses 64 bit instructions
  ARM: sched_clock: Load cycle count after epoch stabilizes
  igb: bring link up when PHY is powered up
  ext2: Fix oops in ext2_get_block() called from ext2_quota_write()
  nEPT: Nested INVEPT
  net: sctp: use MAX_HEADER for headroom reserve in output path
  net: mvneta: fix Tx interrupt delay
  rtnetlink: release net refcnt on error in do_setlink()
  net/mlx4_core: Limit count field to 24 bits in qp_alloc_res
  tg3: fix ring init when there are more TX than RX channels
  ipv6: gre: fix wrong skb->protocol in WCCP
  sata_fsl: fix error handling of irq_of_parse_and_map
  ahci: disable MSI on SAMSUNG 0xa800 SSD
  AHCI: Add DeviceIDs for Sunrise Point-LP SATA controller
  media: smiapp: Only some selection targets are settable
  drm/i915: Unlock panel even when LVDS is disabled
  drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6
  i2c: davinci: generate STP always when NACK is received
  i2c: omap: fix i207 errata handling
  i2c: omap: fix NACK and Arbitration Lost irq handling
  xen-netfront: Remove BUGs on paged skb data which crosses a page boundary
  mm: fix swapoff hang after page migration and fork
  mm: frontswap: invalidate expired data on a dup-store failure
  Linux 3.10.62
  nfsd: Fix ACL null pointer deref
  powerpc/powernv: Honor the generic "no_64bit_msi" flag
  bnx2fc: do not add shared skbs to the fcoe_rx_list
  nfsd4: fix leak of inode reference on delegation failure
  nfsd: Fix slot wake up race in the nfsv4.1 callback code
  rt2x00: do not align payload on modern H/W
  can: dev: avoid calling kfree_skb() from interrupt context
  spi: dw: Fix dynamic speed change.
  iser-target: Handle DEVICE_REMOVAL event on network portal listener correctly
  target: Don't call TFO->write_pending if data_length == 0
  srp-target: Retry when QP creation fails with ENOMEM
  Input: xpad - use proper endpoint type
  ARM: 8222/1: mvebu: enable strex backoff delay
  ARM: 8216/1: xscale: correct auxiliary register in suspend/resume
  ALSA: usb-audio: Add ctrl message delay quirk for Marantz/Denon devices
  can: esd_usb2: fix memory leak on disconnect
  USB: xhci: don't start a halted endpoint before its new dequeue is set
  usb-quirks: Add reset-resume quirk for MS Wireless Laser Mouse 6000
  usb: serial: ftdi_sio: add PIDs for Matrix Orbital products
  USB: serial: cp210x: add IDs for CEL MeshConnect USB Stick
  USB: keyspan: fix tty line-status reporting
  USB: keyspan: fix overrun-error reporting
  USB: ssu100: fix overrun-error reporting
  iio: Fix IIO_EVENT_CODE_EXTRACT_DIR bit mask
  powerpc/pseries: Fix endiannes issue in RTAS call from xmon
  powerpc/pseries: Honor the generic "no_64bit_msi" flag
  of/base: Fix PowerPC address parsing hack
  ASoC: wm_adsp: Avoid attempt to free buffers that might still be in use
  ASoC: sgtl5000: Fix SMALL_POP bit definition
  PCI/MSI: Add device flag indicating that 64-bit MSIs don't work
  ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
  pptp: fix stack info leak in pptp_getname()
  qmi_wwan: Add support for HP lt4112 LTE/HSPA+ Gobi 4G Modem
  ieee802154: fix error handling in ieee802154fake_probe()
  ipv4: Fix incorrect error code when adding an unreachable route
  inetdevice: fixed signed integer overflow
  sparc64: Fix constraints on swab helpers.
  uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME
  x86, mm: Set NX across entire PMD at boot
  x86: Require exact match for 'noxsave' command line option
  x86_64, traps: Rework bad_iret
  x86_64, traps: Stop using IST for #SS
  x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
  MIPS: Loongson: Make platform serial setup always built-in.
  MIPS: oprofile: Fix backtrace on 64-bit kernel
  Linux 3.10.61
  mm: memcg: handle non-error OOM situations more gracefully
  mm: memcg: do not trap chargers with full callstack on OOM
  mm: memcg: rework and document OOM waiting and wakeup
  mm: memcg: enable memcg OOM killer only for user faults
  x86: finish user fault error path with fatal signal
  arch: mm: pass userspace fault flag to generic fault handler
  arch: mm: do not invoke OOM killer on kernel fault OOM
  arch: mm: remove obsolete init OOM protection
  mm: invoke oom-killer from remaining unconverted page fault handlers
  net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
  net: sctp: fix panic on duplicate ASCONF chunks
  net: sctp: fix remote memory pressure from excessive queueing
  KVM: x86: Don't report guest userspace emulation error to userspace
  SCSI: hpsa: fix a race in cmd_free/scsi_done
  net/mlx4_en: Fix BlueFlame race
  ARM: Correct BUG() assembly to ensure it is endian-agnostic
  perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
  mei: bus: fix possible boundaries violation
  perf: Handle compat ioctl
  MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches
  dell-wmi: Fix access out of memory
  ARM: probes: fix instruction fetch order with <asm/opcodes.h>
  br: fix use of ->rx_handler_data in code executed on non-rx_handler path
  netfilter: nf_nat: fix oops on netns removal
  netfilter: xt_bpf: add mising opaque struct sk_filter definition
  netfilter: nf_log: release skbuff on nlmsg put failure
  netfilter: nfnetlink_log: fix maximum packet length logged to userspace
  netfilter: nf_log: account for size of NLMSG_DONE attribute
  ipc: always handle a new value of auto_msgmni
  clocksource: Remove "weak" from clocksource_default_clock() declaration
  kgdb: Remove "weak" from kgdb_arch_pc() declaration
  media: ttusb-dec: buffer overflow in ioctl
  NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
  nfs: Fix use of uninitialized variable in nfs_getattr()
  NFS: Don't try to reclaim delegation open state if recovery failed
  NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
  Input: alps - allow up to 2 invalid packets without resetting device
  Input: alps - ignore potential bare packets when device is out of sync
  dm raid: ensure superblock's size matches device's logical block size
  dm btree: fix a recursion depth bug in btree walking code
  block: Fix computation of merged request priority
  parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls
  scsi: only re-lock door after EH on devices that were reset
  nfs: fix pnfs direct write memory leak
  firewire: cdev: prevent kernel stack leaking into ioctl arguments
  arm64: __clear_user: handle exceptions on strb
  ARM: 8198/1: make kuser helpers depend on MMU
  drm/radeon: add missing crtc unlock when setting up the MC
  mac80211: fix use-after-free in defragmentation
  macvtap: Fix csum_start when VLAN tags are present
  iwlwifi: configure the LTR
  libceph: do not crash on large auth tickets
  xtensa: re-wire umount syscall to sys_oldumount
  ALSA: usb-audio: Fix memory leak in FTU quirk
  ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks
  ahci: Add Device IDs for Intel Sunrise Point PCH
  audit: keep inode pinned
  x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
  sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks
  sparc64: Do irq_{enter,exit}() around generic_smp_call_function*().
  sparc64: Fix crashes in schizo_pcierr_intr_other().
  sunvdc: don't call VD_OP_GET_VTOC
  vio: fix reuse of vio_dring slot
  sunvdc: limit each sg segment to a page
  sunvdc: compute vdisk geometry from capacity
  sunvdc: add cdrom and v1.1 protocol support
  net: sctp: fix memory leak in auth key management
  net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
  gre6: Move the setting of dev->iflink into the ndo_init functions.
  ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.
  Linux 3.10.60
  libceph: ceph-msgr workqueue needs a resque worker
  Btrfs: fix kfree on list_head in btrfs_lookup_csums_range error cleanup
  of: Fix overflow bug in string property parsing functions
  sysfs: driver core: Fix glue dir race condition by gdp_mutex
  i2c: at91: don't account as iowait
  acer-wmi: Add acpi_backlight=video quirk for the Acer KAV80
  rbd: Fix error recovery in rbd_obj_read_sync()
  drm/radeon: remove invalid pci id
  usb: gadget: udc: core: fix kernel oops with soft-connect
  usb: gadget: function: acm: make f_acm pass USB20CV Chapter9
  usb: dwc3: gadget: fix set_halt() bug with pending transfers
  crypto: algif - avoid excessive use of socket buffer in skcipher
  mm: Remove false WARN_ON from pagecache_isize_extended()
  x86, apic: Handle a bad TSC more gracefully
  posix-timers: Fix stack info leak in timer_create()
  mac80211: fix typo in starting baserate for rts_cts_rate_idx
  PM / Sleep: fix recovery during resuming from hibernation
  tty: Fix high cpu load if tty is unreleaseable
  quota: Properly return errors from dquot_writeback_dquots()
  ext3: Don't check quota format when there are no quota files
  nfsd4: fix crash on unknown operation number
  cpc925_edac: Report UE events properly
  e7xxx_edac: Report CE events properly
  i3200_edac: Report CE events properly
  i82860_edac: Report CE events properly
  scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
  lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
  cgroup/kmemleak: add kmemleak_free() for cgroup deallocations.
  usb: Do not allow usb_alloc_streams on unconfigured devices
  USB: opticon: fix non-atomic allocation in write path
  usb-storage: handle a skipped data phase
  spi: pxa2xx: toggle clocks on suspend if not disabled by runtime PM
  spi: pl022: Fix incorrect dma_unmap_sg
  usb: dwc3: gadget: Properly initialize LINK TRB
  wireless: rt2x00: add new rt2800usb device
  USB: option: add Haier CE81B CDMA modem
  usb: option: add support for Telit LE910
  USB: cdc-acm: only raise DTR on transitions from B0
  USB: cdc-acm: add device id for GW Instek AFG-2225
  usb: serial: ftdi_sio: add "bricked" FTDI device PID
  usb: serial: ftdi_sio: add Awinda Station and Dongle products
  USB: serial: cp210x: add Silicon Labs 358x VID and PID
  serial: Fix divide-by-zero fault in uart_get_divisor()
  staging:iio:ade7758: Remove "raw" from channel name
  staging:iio:ade7758: Fix check if channels are enabled in prenable
  staging:iio:ade7758: Fix NULL pointer deref when enabling buffer
  staging:iio:ad5933: Drop "raw" from channel names
  staging:iio:ad5933: Fix NULL pointer deref when enabling buffer
  OOM, PM: OOM killed task shouldn't escape PM suspend
  freezer: Do not freeze tasks killed by OOM killer
  ext4: fix oops when loading block bitmap failed
  cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy
  ext4: fix overflow when updating superblock backups after resize
  ext4: check s_chksum_driver when looking for bg csum presence
  ext4: fix reservation overflow in ext4_da_write_begin
  ext4: add ext4_iget_normal() which is to be used for dir tree lookups
  ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT
  ext4: don't check quota format when there are no quota files
  ext4: check EA value offset when loading
  jbd2: free bh when descriptor block checksum fails
  MIPS: tlbex: Properly fix HUGE TLB Refill exception handler
  target: Fix APTPL metadata handling for dynamic MappedLUNs
  target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE
  qla_target: don't delete changed nacls
  ARC: Update order of registers in KGDB to match GDB 7.5
  ARC: [nsimosci] Allow "headless" models to boot
  KVM: x86: Emulator fixes for eip canonical checks on near branches
  KVM: x86: Fix wrong masking on relative jump/call
  kvm: x86: don't kill guest on unknown exit reason
  KVM: x86: Check non-canonical addresses upon WRMSR
  KVM: x86: Improve thread safety in pit
  KVM: x86: Prevent host from panicking on shared MSR writes.
  kvm: fix excessive pages un-pinning in kvm_iommu_map error path.
  media: tda7432: Fix setting TDA7432_MUTE bit for TDA7432_RF register
  media: ds3000: fix LNB supply voltage on Tevii S480 on initialization
  media: em28xx-v4l: give back all active video buffers to the vb2 core properly on streaming stop
  media: v4l2-common: fix overflow in v4l_bound_align_image()
  drm/nouveau/bios: memset dcb struct to zero before parsing
  drm/tilcdc: Fix the error path in tilcdc_load()
  drm/ast: Fix HW cursor image
  Input: i8042 - quirks for Fujitsu Lifebook A544 and Lifebook AH544
  Input: i8042 - add noloop quirk for Asus X750LN
  framebuffer: fix border color
  modules, lock around setting of MODULE_STATE_UNFORMED
  dm log userspace: fix memory leak in dm_ulog_tfr_init failure path
  block: fix alignment_offset math that assumes io_min is a power-of-2
  drbd: compute the end before rb_insert_augmented()
  dm bufio: update last_accessed when relinking a buffer
  virtio_pci: fix virtio spec compliance on restore
  selinux: fix inode security list corruption
  pstore: Fix duplicate {console,ftrace}-efi entries
  mfd: rtsx_pcr: Fix MSI enable error handling
  mnt: Prevent pivot_root from creating a loop in the mount tree
  UBI: add missing kmem_cache_free() in process_pool_aeb error path
  random: add and use memzero_explicit() for clearing data
  crypto: more robust crypto_memneq
  fix misuses of f_count() in ppp and netlink
  kill wbuf_queued/wbuf_dwork_lock
  ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode
  evm: check xattr value length and type in evm_inode_setxattr()
  x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
  x86_64, entry: Fix out of bounds read on sysenter
  x86_64, entry: Filter RFLAGS.NT on entry from userspace
  x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
  x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
  x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
  x86: Reject x32 executables if x32 ABI not supported
  vfs: fix data corruption when blocksize < pagesize for mmaped data
  UBIFS: fix free log space calculation
  UBIFS: fix a race condition
  UBIFS: remove mst_mutex
  fs: Fix theoretical division by 0 in super_cache_scan().
  fs: make cont_expand_zero interruptible
  mmc: rtsx_pci_sdmmc: fix incorrect last byte in R2 response
  libata-sff: Fix controllers with no ctl port
  pata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE Controller
  Revert "percpu: free percpu allocation info for uniprocessor system"
  lockd: Try to reconnect if statd has moved
  drivers/net: macvtap and tun depend on INET
  ipv4: dst_entry leak in ip_send_unicast_reply()
  ax88179_178a: fix bonding failure
  ipv4: fix nexthop attlen check in fib_nh_match
  tracing/syscalls: Ignore numbers outside NR_syscalls' range
  Linux 3.10.59
  ecryptfs: avoid to access NULL pointer when write metadata in xattr
  ARM: at91/PMC: don't forget to write PMC_PCDR register to disable clocks
  ALSA: usb-audio: Add support for Steinberg UR22 USB interface
  ALSA: emu10k1: Fix deadlock in synth voice lookup
  ALSA: pcm: use the same dma mmap codepath both for arm and arm64
  arm64: compat: fix compat types affecting struct compat_elf_prpsinfo
  spi: dw-mid: terminate ongoing transfers at exit
  kernel: add support for gcc 5
  fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
  mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  Bluetooth: Fix issue with USB suspend in btusb driver
  Bluetooth: Fix HCI H5 corrupted ack value
  rt2800: correct BBP1_TX_POWER_CTRL mask
  PCI: Generate uppercase hex for modalias interface class
  PCI: Increase IBM ipr SAS Crocodile BARs to at least system page size
  iwlwifi: Add missing PCI IDs for the 7260 series
  NFSv4.1: Fix an NFSv4.1 state renewal regression
  NFSv4: fix open/lock state recovery error handling
  NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
  lzo: check for length overrun in variable length encoding.
  Revert "lzo: properly check for overruns"
  Documentation: lzo: document part of the encoding
  m68k: Disable/restore interrupts in hwreg_present()/hwreg_write()
  Drivers: hv: vmbus: Fix a bug in vmbus_open()
  Drivers: hv: vmbus: Cleanup vmbus_establish_gpadl()
  Drivers: hv: vmbus: Cleanup vmbus_teardown_gpadl()
  Drivers: hv: vmbus: Cleanup vmbus_post_msg()
  firmware_class: make sure fw requests contain a name
  qla2xxx: Use correct offset to req-q-out for reserve calculation
  mptfusion: enable no_write_same for vmware scsi disks
  be2iscsi: check ip buffer before copying
  regmap: fix NULL pointer dereference in _regmap_write/read
  regmap: debugfs: fix possbile NULL pointer dereference
  spi: dw-mid: check that DMA was inited before exit
  spi: dw-mid: respect 8 bit mode
  x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
  kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
  KVM: s390: unintended fallthrough for external call
  kvm: x86: fix stale mmio cache bug
  fs: Add a missing permission check to do_umount
  Btrfs: fix race in WAIT_SYNC ioctl
  Btrfs: fix build_backref_tree issue with multiple shared blocks
  Btrfs: try not to ENOSPC on log replay
  Linux 3.10.58
  USB: cp210x: add support for Seluxit USB dongle
  USB: serial: cp210x: added Ketra N1 wireless interface support
  USB: Add device quirk for ASUS T100 Base Station keyboard
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  tcp: fixing TLP's FIN recovery
  sctp: handle association restarts when the socket is closed.
  ip6_gre: fix flowi6_proto value in xmit path
  hyperv: Fix a bug in netvsc_start_xmit()
  tg3: Allow for recieve of full-size 8021AD frames
  tg3: Work around HW/FW limitations with vlan encapsulated frames
  l2tp: fix race while getting PMTU on PPP pseudo-wire
  openvswitch: fix panic with multiple vlan headers
  packet: handle too big packets for PACKET_V3
  tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()
  sit: Fix ipip6_tunnel_lookup device matching criteria
  myri10ge: check for DMA mapping errors
  Linux 3.10.57
  cpufreq: ondemand: Change the calculation of target frequency
  cpufreq: Fix wrong time unit conversion
  nl80211: clear skb cb before passing to netlink
  drbd: fix regression 'out of mem, failed to invoke fence-peer helper'
  jiffies: Fix timeval conversion to jiffies
  md/raid5: disable 'DISCARD' by default due to safety concerns.
  media: vb2: fix VBI/poll regression
  mm: numa: Do not mark PTEs pte_numa when splitting huge pages
  mm, thp: move invariant bug check out of loop in __split_huge_page_map
  ring-buffer: Fix infinite spin in reading buffer
  init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu
  perf: fix perf bug in fork()
  udf: Avoid infinite loop when processing indirect ICBs
  Linux 3.10.56
  vm_is_stack: use for_each_thread() rather then buggy while_each_thread()
  oom_kill: add rcu_read_lock() into find_lock_task_mm()
  oom_kill: has_intersects_mems_allowed() needs rcu_read_lock()
  oom_kill: change oom_kill.c to use for_each_thread()
  introduce for_each_thread() to replace the buggy while_each_thread()
  kernel/fork.c:copy_process(): unify CLONE_THREAD-or-thread_group_leader code
  arm: multi_v7_defconfig: Enable Zynq UART driver
  ext2: Fix fs corruption in ext2_get_xip_mem()
  serial: 8250_dma: check the result of TX buffer mapping
  ARM: 7748/1: oabi: handle faults when loading swi instruction from userspace
  netfilter: nf_conntrack: avoid large timeout for mid-stream pickup
  PM / sleep: Use valid_state() for platform-dependent sleep states only
  PM / sleep: Add state field to pm_states[] entries
  ipvs: fix ipv6 hook registration for local replies
  ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding
  ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack
  md/raid1: fix_read_error should act on all non-faulty devices.
  media: cx18: fix kernel oops with tda8290 tuner
  Fix nasty 32-bit overflow bug in buffer i/o code.
  perf kmem: Make it work again on non NUMA machines
  perf: Fix a race condition in perf_remove_from_context()
  alarmtimer: Lock k_itimer during timer callback
  alarmtimer: Do not signal SIGEV_NONE timers
  parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds
  powerpc/perf: Fix ABIv2 kernel backtraces
  sched: Fix unreleased llc_shared_mask bit during CPU hotplug
  ocfs2/dlm: do not get resource spinlock if lockres is new
  nilfs2: fix data loss with mmap()
  fs/notify: don't show f_handle if exportfs_encode_inode_fh failed
  fsnotify/fdinfo: use named constants instead of hardcoded values
  kcmp: fix standard comparison bug
  Revert "mac80211: disable uAPSD if all ACs are under ACM"
  usb: dwc3: core: fix ordering for PHY suspend
  usb: dwc3: core: fix order of PM runtime calls
  usb: host: xhci: fix compliance mode workaround
  genhd: fix leftover might_sleep() in blk_free_devt()
  lockd: fix rpcbind crash on lockd startup failure
  rtlwifi: rtl8192cu: Add new ID
  percpu: perform tlb flush after pcpu_map_pages() failure
  percpu: fix pcpu_alloc_pages() failure path
  percpu: free percpu allocation info for uniprocessor system
  ata_piix: Add Device IDs for Intel 9 Series PCH
  Input: i8042 - add nomux quirk for Avatar AVIU-145A6
  Input: i8042 - add Fujitsu U574 to no_timeout dmi table
  Input: atkbd - do not try 'deactivate' keyboard on any LG laptops
  Input: elantech - fix detection of touchpad on ASUS s301l
  Input: synaptics - add support for ForcePads
  Input: serport - add compat handling for SPIOCSTYPE ioctl
  dm crypt: fix access beyond the end of allocated space
  block: Fix dev_t minor allocation lifetime
  workqueue: apply __WQ_ORDERED to create_singlethread_workqueue()
  Revert "iwlwifi: dvm: don't enable CTS to self"
  SCSI: libiscsi: fix potential buffer overrun in __iscsi_conn_send_pdu
  NFC: microread: Potential overflows in microread_target_discovered()
  iscsi-target: Fix memory corruption in iscsit_logout_post_handler_diffcid
  iscsi-target: avoid NULL pointer in iscsi_copy_param_list failure
  Target/iser: Don't put isert_conn inside disconnected handler
  Target/iser: Get isert_conn reference once got to connected_handler
  iio:inkern: fix overwritten -EPROBE_DEFER in of_iio_channel_get_by_name
  iio:magnetometer: bugfix magnetometers gain values
  iio: adc: ad_sigma_delta: Fix indio_dev->trig assignment
  iio: st_sensors: Fix indio_dev->trig assignment
  iio: meter: ade7758: Fix indio_dev->trig assignment
  iio: inv_mpu6050: Fix indio_dev->trig assignment
  iio: gyro: itg3200: Fix indio_dev->trig assignment
  iio:trigger: modify return value for iio_trigger_get
  CIFS: Fix SMB2 readdir error handling
  CIFS: Fix directory rename error
  ASoC: davinci-mcasp: Correct rx format unit configuration
  shmem: fix nlink for rename overwrite directory
  x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
  KVM: x86: handle idiv overflow at kvm_write_tsc
  regmap: Fix handling of volatile registers for format_write() chips
  ACPICA: Update to GPIO region handler interface.
  MIPS: mcount: Adjust stack pointer for static trace in MIPS32
  MIPS: ZBOOT: add missing <linux/string.h> include
  ARM: 8165/1: alignment: don't break misaligned NEON load/store
  ARM: 7897/1: kexec: Use the right ISA for relocate_new_kernel
  ARM: 8133/1: use irq_set_affinity with force=false when migrating irqs
  ARM: 8128/1: abort: don't clear the exclusive monitors
  NFSv4: Fix another bug in the close/open_downgrade code
  NFSv4: nfs4_state_manager() vs. nfs_server_remove_lists()
  usb:hub set hub->change_bits when over-current happens
  usb: dwc3: omap: fix ordering for runtime pm calls
  USB: EHCI: unlink QHs even after the controller has stopped
  USB: storage: Add quirks for Entrega/Xircom USB to SCSI converters
  USB: storage: Add quirk for Ariston Technologies iConnect USB to SCSI adapter
  USB: storage: Add quirk for Adaptec USBConnect 2000 USB-to-SCSI Adapter
  storage: Add single-LUN quirk for Jaz USB Adapter
  usb: hub: take hub->hdev reference when processing from eventlist
  xhci: fix oops when xhci resumes from hibernate with hw lpm capable devices
  xhci: Fix null pointer dereference if xhci initialization fails
  USB: zte_ev: fix removed PIDs
  USB: ftdi_sio: add support for NOVITUS Bono E thermal printer
  USB: sierra: add 1199:68AA device ID
  USB: sierra: avoid CDC class functions on "68A3" devices
  USB: zte_ev: remove duplicate Qualcom PID
  USB: zte_ev: remove duplicate Gobi PID
  Revert "USB: option,zte_ev: move most ZTE CDMA devices to zte_ev"
  USB: option: add VIA Telecom CDS7 chipset device id
  USB: option: reduce interrupt-urb logging verbosity
  USB: serial: fix potential heap buffer overflow
  USB: sisusb: add device id for Magic Control USB video
  USB: serial: fix potential stack buffer overflow
  USB: serial: pl2303: add device id for ztek device
  xtensa: fix a6 and a7 handling in fast_syscall_xtensa
  xtensa: fix TLBTEMP_BASE_2 region handling in fast_second_level_miss
  xtensa: fix access to THREAD_RA/THREAD_SP/THREAD_DS
  xtensa: fix address checks in dma_{alloc,free}_coherent
  xtensa: replace IOCTL code definitions with constants
  drm/radeon: add connector quirk for fujitsu board
  drm/vmwgfx: Fix a potential infinite spin waiting for fifo idle
  drm/ast: AST2000 cannot be detected correctly
  drm/i915: Wait for vblank before enabling the TV encoder
  drm/i915: Remove bogus __init annotation from DMI callbacks
  HID: logitech-dj: prevent false errors to be shown
  HID: magicmouse: sanity check report size in raw_event() callback
  HID: picolcd: sanity check report size in raw_event() callback
  cfq-iosched: Fix wrong children_weight calculation
  ALSA: pcm: fix fifo_size frame calculation
  ALSA: hda - Fix invalid pin powermap without jack detection
  ALSA: hda - Fix COEF setups for ALC1150 codec
  ALSA: core: fix buffer overflow in snd_info_get_line()
  arm64: ptrace: fix compat hardware watchpoint reporting
  trace: Fix epoll hang when we race with new entries
  i2c: at91: Fix a race condition during signal handling in at91_do_twi_xfer.
  i2c: at91: add bound checking on SMBus block length bytes
  arm64: flush TLS registers during exec
  ibmveth: Fix endian issues with rx_no_buffer statistic
  ahci: add pcid for Marvel 0x9182 controller
  ahci: Add Device IDs for Intel 9 Series PCH
  pata_scc: propagate return value of scc_wait_after_reset
  drm/i915: read HEAD register back in init_ring_common() to enforce ordering
  drm/radeon: load the lm63 driver for an lm64 thermal chip.
  drm/ttm: Choose a pool to shrink correctly in ttm_dma_pool_shrink_scan().
  drm/ttm: Fix possible division by 0 in ttm_dma_pool_shrink_scan().
  drm/tilcdc: fix double kfree
  drm/tilcdc: fix release order on exit
  drm/tilcdc: panel: fix leak when unloading the module
  drm/tilcdc: tfp410: fix dangling sysfs connector node
  drm/tilcdc: slave: fix dangling sysfs connector node
  drm/tilcdc: panel: fix dangling sysfs connector node
  carl9170: fix sending URBs with wrong type when using full-speed
  Linux 3.10.55
  libceph: gracefully handle large reply messages from the mon
  libceph: rename ceph_msg::front_max to front_alloc_len
  tpm: Provide a generic means to override the chip returned timeouts
  vfs: fix bad hashing of dentries
  dcache.c: get rid of pointless macros
  IB/srp: Fix deadlock between host removal and multipathd
  blkcg: don't call into policy draining if root_blkg is already gone
  mtd: nand: omap: Fix 1-bit Hamming code scheme, omap_calculate_ecc()
  mtd/ftl: fix the double free of the buffers allocated in build_maps()
  CIFS: Fix wrong restart readdir for SMB1
  CIFS: Fix wrong filename length for SMB2
  CIFS: Fix wrong directory attributes after rename
  CIFS: Possible null ptr deref in SMB2_tcon
  CIFS: Fix async reading on reconnects
  CIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2
  libceph: do not hard code max auth ticket len
  libceph: add process_one_ticket() helper
  libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly
  md/raid1,raid10: always abort recover on write error.
  xfs: don't zero partial page cache pages during O_DIRECT writes
  xfs: don't zero partial page cache pages during O_DIRECT writes
  xfs: don't dirty buffers beyond EOF
  xfs: quotacheck leaves dquot buffers without verifiers
  RDMA/iwcm: Use a default listen backlog if needed
  md/raid10: Fix memory leak when raid10 reshape completes.
  md/raid10: fix memory leak when reshaping a RAID10.
  md/raid6: avoid data corruption during recovery of double-degraded RAID6
  Bluetooth: Avoid use of session socket after the session gets freed
  Bluetooth: never linger on process exit
  mnt: Add tests for unprivileged remount cases that have found to be faulty
  mnt: Change the default remount atime from relatime to the existing value
  mnt: Correct permission checks in do_remount
  mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
  mnt: Only change user settable mount flags in remount
  ring-buffer: Up rb_iter_peek() loop count to 3
  ring-buffer: Always reset iterator to reader page
  ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock
  ACPI: Run fixed event device notifications in process context
  ACPICA: Utilities: Fix memory leak in acpi_ut_copy_iobject_to_iobject
  bfa: Fix undefined bit shift on big-endian architectures with 32-bit DMA address
  ASoC: pxa-ssp: drop SNDRV_PCM_FMTBIT_S24_LE
  ASoC: max98090: Fix missing free_irq
  ASoC: samsung: Correct I2S DAI suspend/resume ops
  ASoC: wm_adsp: Add missing MODULE_LICENSE
  ASoC: pcm: fix dpcm_path_put in dpcm runtime update
  openrisc: Rework signal handling
  MIPS: Fix accessing to per-cpu data when flushing the cache
  MIPS: OCTEON: make get_system_type() thread-safe
  MIPS: asm: thread_info: Add _TIF_SECCOMP flag
  MIPS: Cleanup flags in syscall flags handlers.
  MIPS: asm/reg.h: Make 32- and 64-bit definitions available at the same time
  MIPS: Remove BUG_ON(!is_fpu_owner()) in do_ade()
  MIPS: tlbex: Fix a missing statement for HUGETLB
  MIPS: Prevent user from setting FCSR cause bits
  MIPS: GIC: Prevent array overrun
  drivers: scsi: storvsc: Correctly handle TEST_UNIT_READY failure
  Drivers: scsi: storvsc: Implement a eh_timed_out handler
  powerpc/pseries: Failure on removing device node
  powerpc/mm: Use read barrier when creating real_pte
  powerpc/mm/numa: Fix break placement
  regulator: arizona-ldo1: remove bypass functionality
  mfd: omap-usb-host: Fix improper mask use.
  kernel/smp.c:on_each_cpu_cond(): fix warning in fallback path
  CAPABILITIES: remove undefined caps from all processes
  tpm: missing tpm_chip_put in tpm_get_random()
  firmware: Do not use WARN_ON(!spin_is_locked())
  spi: omap2-mcspi: Configure hardware when slave driver changes mode
  spi: orion: fix incorrect handling of cell-index DT property
  iommu/amd: Fix cleanup_domain for mass device removal
  media: media-device: Remove duplicated memset() in media_enum_entities()
  media: au0828: Only alt setting logic when needed
  media: xc4000: Fix get_frequency()
  media: xc5000: Fix get_frequency()
  Linux 3.10.54
  USB: fix build error with CONFIG_PM_RUNTIME disabled
  NFSv4: Fix problems with close in the presence of a delegation
  NFSv3: Fix another acl regression
  svcrdma: Select NFSv4.1 backchannel transport based on forward channel
  NFSD: Decrease nfsd_users in nfsd_startup_generic fail
  usb: hub: Prevent hub autosuspend if usbcore.autosuspend is -1
  USB: whiteheat: Added bounds checking for bulk command response
  USB: ftdi_sio: Added PID for new ekey device
  USB: ftdi_sio: add Basic Micro ATOM Nano USB2Serial PID
  ARM: OMAP2+: hwmod: Rearm wake-up interrupts for DT when MUSB is idled
  usb: xhci: amd chipset also needs short TX quirk
  xhci: Treat not finding the event_seg on COMP_STOP the same as COMP_STOP_INVAL
  Staging: speakup: Update __speakup_paste_selection() tty (ab)usage to match vt
  jbd2: fix infinite loop when recovering corrupt journal blocks
  mei: nfc: fix memory leak in error path
  mei: reset client state on queued connect request
  Btrfs: fix csum tree corruption, duplicate and outdated checksums
  hpsa: fix bad -ENOMEM return value in hpsa_big_passthru_ioctl
  x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub
  x86_64/vsyscall: Fix warn_bad_vsyscall log output
  x86: don't exclude low BIOS area when allocating address space for non-PCI cards
  drm/radeon: add additional SI pci ids
  ext4: fix BUG_ON in mb_free_blocks()
  kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601)
  Revert "KVM: x86: Increase the number of fixed MTRR regs to 10"
  KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
  KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table
  KVM: x86: Inter-privilege level ret emulation is not implemeneted
  crypto: ux500 - make interrupt mode plausible
  serial: core: Preserve termios c_cflag for console resume
  ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct
  drivers/i2c/busses: use correct type for dma_map/unmap
  hwmon: (dme1737) Prevent overflow problem when writing large limits
  hwmon: (ads1015) Fix out-of-bounds array access
  hwmon: (lm85) Fix various errors on attribute writes
  hwmon: (ads1015) Fix off-by-one for valid channel index checking
  hwmon: (gpio-fan) Prevent overflow problem when writing large limits
  hwmon: (lm78) Fix overflow problems seen when writing large temperature limits
  hwmon: (sis5595) Prevent overflow problem when writing large limits
  drm: omapdrm: fix compiler errors
  ARM: OMAP3: Fix choice of omap3_restore_es function in OMAP34XX rev3.1.2 case.
  mei: start disconnect request timer consistently
  ALSA: hda/realtek - Avoid setting wrong COEF on ALC269 & co
  ALSA: hda/ca0132 - Don't try loading firmware at resume when already failed
  ALSA: virtuoso: add Xonar Essence STX II support
  ALSA: hda - fix an external mic jack problem on a HP machine
  USB: Fix persist resume of some SS USB devices
  USB: ehci-pci: USB host controller support for Intel Quark X1000
  USB: serial: ftdi_sio: Add support for new Xsens devices
  USB: serial: ftdi_sio: Annotate the current Xsens PID assignments
  USB: OHCI: don't lose track of EDs when a controller dies
  isofs: Fix unbounded recursion when processing relocated directories
  HID: fix a couple of off-by-ones
  HID: logitech: perform bounds checking on device_id early enough
  stable_kernel_rules: Add pointer to netdev-FAQ for network patches
  Linux 3.10.53
  arch/sparc/math-emu/math_32.c: drop stray break operator
  sparc64: ldc_connect() should not return EINVAL when handshake is in progress.
  sunsab: Fix detection of BREAK on sunsab serial console
  bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000
  sparc64: Guard against flushing openfirmware mappings.
  sparc64: Do not insert non-valid PTEs into the TSB hash table.
  sparc64: Add membar to Niagara2 memcpy code.
  sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.
  sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault addresses.
  sparc64: Fix top-level fault handling bugs.
  sparc64: Handle 32-bit tasks properly in compute_effective_address().
  sparc64: Make itc_sync_lock raw
  sparc64: Fix argument sign extension for compat_sys_futex().
  sctp: fix possible seqlock seadlock in sctp_packet_transmit()
  iovec: make sure the caller actually wants anything in memcpy_fromiovecend
  net: Correctly set segment mac_len in skb_segment().
  macvlan: Initialize vlan_features to turn on offload support.
  net: sctp: inherit auth_capable on INIT collisions
  tcp: Fix integer-overflow in TCP vegas
  tcp: Fix integer-overflows in TCP veno
  net: sendmsg: fix NULL pointer dereference
  ip: make IP identifiers less predictable
  inetpeer: get rid of ip_id_count
  bnx2x: fix crash during TSO tunneling
  Linux 3.10.52
  x86/espfix/xen: Fix allocation of pages for paravirt page tables
  lib/btree.c: fix leak of whole btree nodes
  net/l2tp: don't fall back on UDP [get|set]sockopt
  net: mvneta: replace Tx timer with a real interrupt
  net: mvneta: add missing bit descriptions for interrupt masks and causes
  net: mvneta: do not schedule in mvneta_tx_timeout
  net: mvneta: use per_cpu stats to fix an SMP lock up
  net: mvneta: increase the 64-bit rx/tx stats out of the hot path
  Revert "mac80211: move "bufferable MMPDU" check to fix AP mode scan"
  staging: vt6655: Fix Warning on boot handle_irq_event_percpu.
  x86_64/entry/xen: Do not invoke espfix64 on Xen
  x86, espfix: Make it possible to disable 16-bit support
  x86, espfix: Make espfix64 a Kconfig option, fix UML
  x86, espfix: Fix broken header guard
  x86, espfix: Move espfix definitions into a separate header file
  x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
  Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"
  timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
  printk: rename printk_sched to printk_deferred
  iio: buffer: Fix demux table creation
  staging: vt6655: Fix disassociated messages every 10 seconds
  mm, thp: do not allow thp faults to avoid cpuset restrictions
  scsi: handle flush errors properly
  rapidio/tsi721_dma: fix failure to obtain transaction descriptor
  cfg80211: fix mic_failure tracing
  ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout
  crypto: af_alg - properly label AF_ALG socket
  Linux 3.10.51
  core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
  x86/efi: Include a .bss section within the PE/COFF headers
  s390/ptrace: fix PSW mask check
  Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
  mm: hugetlb: fix copy_hugetlb_page_range()
  x86_32, entry: Store badsys error code in %eax
  hwmon: (smsc47m192) Fix temperature limit and vrm write operations
  parisc: Remove SA_RESTORER define
  coredump: fix the setting of PF_DUMPCORE
  Input: fix defuzzing logic
  slab_common: fix the check for duplicate slab names
  slab_common: Do not check for duplicate slab names
  tracing: Fix wraparound problems in "uptime" trace clock
  blkcg: don't call into policy draining if root_blkg is already gone
  ahci: add support for the Promise FastTrak TX8660 SATA HBA (ahci mode)
  libata: introduce ata_host->n_tags to avoid oops on SAS controllers
  libata: support the ata host which implements a queue depth less than 32
  block: don't assume last put of shared tags is for the host
  block: provide compat ioctl for BLKZEROOUT
  media: tda10071: force modulation to QPSK on DVB-S
  media: hdpvr: fix two audio bugs
  Linux 3.10.50
  ARC: Implement ptrace(PTRACE_GET_THREAD_AREA)
  sched: Fix possible divide by zero in avg_atom() calculation
  locking/mutex: Disable optimistic spinning on some architectures
  PM / sleep: Fix request_firmware() error at resume
  dm cache metadata: do not allow the data block size to change
  dm thin metadata: do not allow the data block size to change
  alarmtimer: Fix bug where relative alarm timers were treated as absolute
  drm/radeon: avoid leaking edid data
  drm/qxl: return IRQ_NONE if it was not our irq
  drm/radeon: set default bl level to something reasonable
  irqchip: gic: Fix core ID calculation when topology is read from DT
  irqchip: gic: Add support for cortex a7 compatible string
  ring-buffer: Fix polling on trace_pipe
  mwifiex: fix Tx timeout issue
  perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
  ipv4: fix buffer overflow in ip_options_compile()
  dns_resolver: Null-terminate the right string
  dns_resolver: assure that dns_query() result is null-terminated
  sunvnet: clean up objects created in vnet_new() on vnet_exit()
  net: pppoe: use correct channel MTU when using Multilink PPP
  net: sctp: fix information leaks in ulpevent layer
  tipc: clear 'next'-pointer of message fragments before reassembly
  be2net: set EQ DB clear-intr bit in be_open()
  netlink: Fix handling of error from netlink_dump().
  net: mvneta: Fix big endian issue in mvneta_txq_desc_csum()
  net: mvneta: fix operation in 10 Mbit/s mode
  appletalk: Fix socket referencing in skb
  tcp: fix false undo corner cases
  igmp: fix the problem when mc leave group
  net: qmi_wwan: add two Sierra Wireless/Netgear devices
  net: qmi_wwan: Add ID for Telewell TW-LTE 4G v2
  ipv4: icmp: Fix pMTU handling for rare case
  tcp: Fix divide by zero when pushing during tcp-repair
  bnx2x: fix possible panic under memory stress
  net: fix sparse warning in sk_dst_set()
  ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix
  ipv4: fix dst race in sk_dst_get()
  8021q: fix a potential memory leak
  net: sctp: check proc_dointvec result in proc_sctp_do_auth
  tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb
  ip_tunnel: fix ip_tunnel_lookup
  shmem: fix splicing from a hole while it's punched
  shmem: fix faulting into a hole, not taking i_mutex
  shmem: fix faulting into a hole while it's punched
  iwlwifi: dvm: don't enable CTS to self
  igb: do a reset on SR-IOV re-init if device is down
  hwmon: (adt7470) Fix writes to temperature limit registers
  hwmon: (da9052) Don't use dash in the name attribute
  hwmon: (da9055) Don't use dash in the name attribute
  tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs
  tracing: Fix graph tracer with stack tracer on other archs
  fuse: handle large user and group ID
  Bluetooth: Ignore H5 non-link packets in non-active state
  Drivers: hv: util: Fix a bug in the KVP code
  media: gspca_pac7302: Add new usb-id for Genius i-Look 317
  usb: Check if port status is equal to RxDetect

Signed-off-by: Ian Maund <imaund@codeaurora.org>
2015-04-24 18:04:40 -07:00
Johannes Weiner f8a5117916 mm: memcg: handle non-error OOM situations more gracefully
commit 4942642080ea82d99ab5b653abb9a12b7ba31f4a upstream.

Commit 3812c8c8f395 ("mm: memcg: do not trap chargers with full
callstack on OOM") assumed that only a few places that can trigger a
memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
readahead.  But there are many more and it's impractical to annotate
them all.

First of all, we don't want to invoke the OOM killer when the failed
allocation is gracefully handled, so defer the actual kill to the end of
the fault handling as well.  This simplifies the code quite a bit for
added bonus.

Second, since a failed allocation might not be the abrupt end of the
fault, the memcg OOM handler needs to be re-entrant until the fault
finishes for subsequent allocation attempts.  If an allocation is
attempted after the task already OOMed, allow it to bypass the limit so
that it can quickly finish the fault and invoke the OOM killer.

Reported-by: azurIt <azurit@pobox.sk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21 09:22:56 -08:00
Johannes Weiner f79d6a4689 mm: memcg: do not trap chargers with full callstack on OOM
commit 3812c8c8f3953921ef18544110dafc3505c1ac62 upstream.

The memcg OOM handling is incredibly fragile and can deadlock.  When a
task fails to charge memory, it invokes the OOM killer and loops right
there in the charge code until it succeeds.  Comparably, any other task
that enters the charge path at this point will go to a waitqueue right
then and there and sleep until the OOM situation is resolved.  The problem
is that these tasks may hold filesystem locks and the mmap_sem; locks that
the selected OOM victim may need to exit.

For example, in one reported case, the task invoking the OOM killer was
about to charge a page cache page during a write(), which holds the
i_mutex.  The OOM killer selected a task that was just entering truncate()
and trying to acquire the i_mutex:

OOM invoking task:
  mem_cgroup_handle_oom+0x241/0x3b0
  mem_cgroup_cache_charge+0xbe/0xe0
  add_to_page_cache_locked+0x4c/0x140
  add_to_page_cache_lru+0x22/0x50
  grab_cache_page_write_begin+0x8b/0xe0
  ext3_write_begin+0x88/0x270
  generic_file_buffered_write+0x116/0x290
  __generic_file_aio_write+0x27c/0x480
  generic_file_aio_write+0x76/0xf0           # takes ->i_mutex
  do_sync_write+0xea/0x130
  vfs_write+0xf3/0x1f0
  sys_write+0x51/0x90
  system_call_fastpath+0x18/0x1d

OOM kill victim:
  do_truncate+0x58/0xa0              # takes i_mutex
  do_last+0x250/0xa30
  path_openat+0xd7/0x440
  do_filp_open+0x49/0xa0
  do_sys_open+0x106/0x240
  sys_open+0x20/0x30
  system_call_fastpath+0x18/0x1d

The OOM handling task will retry the charge indefinitely while the OOM
killed task is not releasing any resources.

A similar scenario can happen when the kernel OOM killer for a memcg is
disabled and a userspace task is in charge of resolving OOM situations.
In this case, ALL tasks that enter the OOM path will be made to sleep on
the OOM waitqueue and wait for userspace to free resources or increase
the group's limit.  But a userspace OOM handler is prone to deadlock
itself on the locks held by the waiting tasks.  For example one of the
sleeping tasks may be stuck in a brk() call with the mmap_sem held for
writing but the userspace handler, in order to pick an optimal victim,
may need to read files from /proc/<pid>, which tries to acquire the same
mmap_sem for reading and deadlocks.

This patch changes the way tasks behave after detecting a memcg OOM and
makes sure nobody loops or sleeps with locks held:

1. When OOMing in a user fault, invoke the OOM killer and restart the
   fault instead of looping on the charge attempt.  This way, the OOM
   victim can not get stuck on locks the looping task may hold.

2. When OOMing in a user fault but somebody else is handling it
   (either the kernel OOM killer or a userspace handler), don't go to
   sleep in the charge context.  Instead, remember the OOMing memcg in
   the task struct and then fully unwind the page fault stack with
   -ENOMEM.  pagefault_out_of_memory() will then call back into the
   memcg code to check if the -ENOMEM came from the memcg, and then
   either put the task to sleep on the memcg's OOM waitqueue or just
   restart the fault.  The OOM victim can no longer get stuck on any
   lock a sleeping task may hold.

Debugged by Michal Hocko.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: azurIt <azurit@pobox.sk>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21 09:22:56 -08:00
Michal Hocko e033782a26 OOM, PM: OOM killed task shouldn't escape PM suspend
commit 5695be142e203167e3cb515ef86a88424f3524eb upstream.

PM freezer relies on having all tasks frozen by the time devices are
getting frozen so that no task will touch them while they are getting
frozen. But OOM killer is allowed to kill an already frozen task in
order to handle OOM situtation. In order to protect from late wake ups
OOM killer is disabled after all tasks are frozen. This, however, still
keeps a window open when a killed task didn't manage to die by the time
freeze_processes finishes.

Reduce the race window by checking all tasks after OOM killer has been
disabled. This is still not race free completely unfortunately because
oom_killer_disable cannot stop an already ongoing OOM killer so a task
might still wake up from the fridge and get killed without
freeze_processes noticing. Full synchronization of OOM and freezer is,
however, too heavy weight for this highly unlikely case.

Introduce and check oom_kills counter which gets incremented early when
the allocator enters __alloc_pages_may_oom path and only check all the
tasks if the counter changes during the freezing attempt. The counter
is updated so early to reduce the race window since allocator checked
oom_killer_disabled which is set by PM-freezing code. A false positive
will push the PM-freezer into a slow path but that is not a big deal.

Changes since v1
- push the re-check loop out of freeze_processes into
  check_frozen_processes and invert the condition to make the code more
  readable as per Rafael

Fixes: f660daac47 (oom: thaw threads if oom killed thread is frozen before deferring)
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14 08:47:58 -08:00
Oleg Nesterov d081edee3a oom_kill: add rcu_read_lock() into find_lock_task_mm()
commit 4d4048be8a93769350efa31d2482a038b7de73d0 upstream.

find_lock_task_mm() expects it is called under rcu or tasklist lock, but
it seems that at least oom_unkillable_task()->task_in_mem_cgroup() and
mem_cgroup_out_of_memory()->oom_badness() can call it lockless.

Perhaps we could fix the callers, but this patch simply adds rcu lock
into find_lock_task_mm().  This also allows to simplify a bit one of its
callers, oom_kill_process().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Sergey Dyasly <dserrg@gmail.com>
Cc: Sameer Nanda <snanda@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: "Ma, Xindong" <xindong.ma@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05 14:54:15 -07:00
Oleg Nesterov a214c050ee oom_kill: has_intersects_mems_allowed() needs rcu_read_lock()
commit ad96244179fbd55b40c00f10f399bc04739b8e1f upstream.

At least out_of_memory() calls has_intersects_mems_allowed() without
even rcu_read_lock(), this is obviously buggy.

Add the necessary rcu_read_lock().  This means that we can not simply
return from the loop, we need "bool ret" and "break".

While at it, swap the names of task_struct's (the argument and the
local).  This cleans up the code a little bit and avoids the unnecessary
initialization.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Sergey Dyasly <dserrg@gmail.com>
Tested-by: Sergey Dyasly <dserrg@gmail.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: "Ma, Xindong" <xindong.ma@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05 14:54:15 -07:00
Oleg Nesterov 4940a48e7d oom_kill: change oom_kill.c to use for_each_thread()
commit 1da4db0cd5c8a31d4468ec906b413e75e604b465 upstream.

Change oom_kill.c to use for_each_thread() rather than the racy
while_each_thread() which can loop forever if we race with exit.

Note also that most users were buggy even if while_each_thread() was
fine, the task can exit even _before_ rcu_read_lock().

Fortunately the new for_each_thread() only requires the stable
task_struct, so this change fixes both problems.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Sergey Dyasly <dserrg@gmail.com>
Tested-by: Sergey Dyasly <dserrg@gmail.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: "Ma, Xindong" <xindong.ma@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05 14:54:15 -07:00
Liam Mark c6ec8d930a mm, oom: make dump_tasks public
Allow other functions to dump the list of tasks.
Useful for when debugging memory leaks.

Change-Id: I76c33a118a9765b4c2276e8c76de36399c78dbf6
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2014-06-03 13:33:18 -07:00
David Rientjes 06bdd77c70 mm, oom: base root bonus on current usage
commit 778c14affaf94a9e4953179d3e13a544ccce7707 upstream.

A 3% of system memory bonus is sometimes too excessive in comparison to
other processes.

With commit a63d83f427 ("oom: badness heuristic rewrite"), the OOM
killer tries to avoid killing privileged tasks by subtracting 3% of
overall memory (system or cgroup) from their per-task consumption.  But
as a result, all root tasks that consume less than 3% of overall memory
are considered equal, and so it only takes 33+ privileged tasks pushing
the system out of memory for the OOM killer to do something stupid and
kill dhclient or other root-owned processes.  For example, on a 32G
machine it can't tell the difference between the 1M agetty and the 10G
fork bomb member.

The changelog describes this 3% boost as the equivalent to the global
overcommit limit being 3% higher for privileged tasks, but this is not
the same as discounting 3% of overall memory from _every privileged task
individually_ during OOM selection.

Replace the 3% of system memory bonus with a 3% of current memory usage
bonus.

By giving root tasks a bonus that is proportional to their actual size,
they remain comparable even when relatively small.  In the example
above, the OOM killer will discount the 1M agetty's 256 badness points
down to 179, and the 10G fork bomb's 262144 points down to 183500 points
and make the right choice, instead of discounting both to 0 and killing
agetty because it's first in the task list.

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13 13:48:02 -08:00
Sha Zhengju 58cf188ed6 memcg, oom: provide more precise dump info while memcg oom happening
Currently when a memcg oom is happening the oom dump messages is still
global state and provides few useful info for users.  This patch prints
more pointed memcg page statistics for memcg-oom and take hierarchy into
consideration:

Based on Michal's advice, we take hierarchy into consideration: supppose
we trigger an OOM on A's limit

        root_memcg
            |
            A (use_hierachy=1)
           / \
          B   C
          |
          D
then the printed info will be:

  Memory cgroup stats for /A:...
  Memory cgroup stats for /A/B:...
  Memory cgroup stats for /A/C:...
  Memory cgroup stats for /A/B/D:...

Following are samples of oom output:

(1) Before change:

    mal-80 invoked oom-killer:gfp_mask=0xd0, order=0, oom_score_adj=0
    mal-80 cpuset=/ mems_allowed=0
    Pid: 2976, comm: mal-80 Not tainted 3.7.0+ #10
    Call Trace:
     [<ffffffff8167fbfb>] dump_header+0x83/0x1ca
     ..... (call trace)
     [<ffffffff8168a818>] page_fault+0x28/0x30
                             <<<<<<<<<<<<<<<<<<<<< memcg specific information
    Task in /A/B/D killed as a result of limit of /A
    memory: usage 101376kB, limit 101376kB, failcnt 57
    memory+swap: usage 101376kB, limit 101376kB, failcnt 0
    kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
                             <<<<<<<<<<<<<<<<<<<<< print per cpu pageset stat
    Mem-Info:
    Node 0 DMA per-cpu:
    CPU    0: hi:    0, btch:   1 usd:   0
    ......
    CPU    3: hi:    0, btch:   1 usd:   0
    Node 0 DMA32 per-cpu:
    CPU    0: hi:  186, btch:  31 usd: 173
    ......
    CPU    3: hi:  186, btch:  31 usd: 130
                             <<<<<<<<<<<<<<<<<<<<< print global page state
    active_anon:92963 inactive_anon:40777 isolated_anon:0
     active_file:33027 inactive_file:51718 isolated_file:0
     unevictable:0 dirty:3 writeback:0 unstable:0
     free:729995 slab_reclaimable:6897 slab_unreclaimable:6263
     mapped:20278 shmem:35971 pagetables:5885 bounce:0
     free_cma:0
                             <<<<<<<<<<<<<<<<<<<<< print per zone page state
    Node 0 DMA free:15836kB ... all_unreclaimable? no
    lowmem_reserve[]: 0 3175 3899 3899
    Node 0 DMA32 free:2888564kB ... all_unrelaimable? no
    lowmem_reserve[]: 0 0 724 724
    lowmem_reserve[]: 0 0 0 0
    Node 0 DMA: 1*4kB (U) ... 3*4096kB (M) = 15836kB
    Node 0 DMA32: 41*4kB (UM) ... 702*4096kB (MR) = 2888316kB
    120710 total pagecache pages
    0 pages in swap cache
                             <<<<<<<<<<<<<<<<<<<<< print global swap cache stat
    Swap cache stats: add 0, delete 0, find 0/0
    Free swap  = 499708kB
    Total swap = 499708kB
    1040368 pages RAM
    58678 pages reserved
    169065 pages shared
    173632 pages non-shared
    [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
    [ 2693]     0  2693     6005     1324      17        0             0 god
    [ 2754]     0  2754     6003     1320      16        0             0 god
    [ 2811]     0  2811     5992     1304      18        0             0 god
    [ 2874]     0  2874     6005     1323      18        0             0 god
    [ 2935]     0  2935     8720     7742      21        0             0 mal-30
    [ 2976]     0  2976    21520    17577      42        0             0 mal-80
    Memory cgroup out of memory: Kill process 2976 (mal-80) score 665 or sacrifice child
    Killed process 2976 (mal-80) total-vm:86080kB, anon-rss:69964kB, file-rss:344kB

We can see that messages dumped by show_free_areas() are longsome and can
provide so limited info for memcg that just happen oom.

(2) After change
    mal-80 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
    mal-80 cpuset=/ mems_allowed=0
    Pid: 2704, comm: mal-80 Not tainted 3.7.0+ #10
    Call Trace:
     [<ffffffff8167fd0b>] dump_header+0x83/0x1d1
     .......(call trace)
     [<ffffffff8168a918>] page_fault+0x28/0x30
    Task in /A/B/D killed as a result of limit of /A
                             <<<<<<<<<<<<<<<<<<<<< memcg specific information
    memory: usage 102400kB, limit 102400kB, failcnt 140
    memory+swap: usage 102400kB, limit 102400kB, failcnt 0
    kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
    Memory cgroup stats for /A: cache:32KB rss:30984KB mapped_file:0KB swap:0KB inactive_anon:6912KB active_anon:24072KB inactive_file:32KB active_file:0KB unevictable:0KB
    Memory cgroup stats for /A/B: cache:0KB rss:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
    Memory cgroup stats for /A/C: cache:0KB rss:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
    Memory cgroup stats for /A/B/D: cache:32KB rss:71352KB mapped_file:0KB swap:0KB inactive_anon:6656KB active_anon:64696KB inactive_file:16KB active_file:16KB unevictable:0KB
    [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
    [ 2260]     0  2260     6006     1325      18        0             0 god
    [ 2383]     0  2383     6003     1319      17        0             0 god
    [ 2503]     0  2503     6004     1321      18        0             0 god
    [ 2622]     0  2622     6004     1321      16        0             0 god
    [ 2695]     0  2695     8720     7741      22        0             0 mal-30
    [ 2704]     0  2704    21520    17839      43        0             0 mal-80
    Memory cgroup out of memory: Kill process 2704 (mal-80) score 669 or sacrifice child
    Killed process 2704 (mal-80) total-vm:86080kB, anon-rss:71016kB, file-rss:340kB

This version provides more pointed info for memcg in "Memory cgroup stats
for XXX" section.

Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:08 -08:00
David Rientjes 0fa84a4bfa mm, oom: remove redundant sleep in pagefault oom handler
out_of_memory() will already cause current to schedule if it has not been
killed, so doing it again in pagefault_out_of_memory() is redundant.
Remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
David Rientjes efacd02e4f mm, oom: cleanup pagefault oom handler
To lock the entire system from parallel oom killing, it's possible to pass
in a zonelist with all zones rather than using for_each_populated_zone()
for the iteration.  This obsoletes try_set_system_oom() and
clear_system_oom() so that they can be removed.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Lai Jiangshan bd3a66c1cd oom: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:32 -08:00
David Rientjes e1e12d2f31 mm, oom: fix race when specifying a thread as the oom origin
test_set_oom_score_adj() and compare_swap_oom_score_adj() are used to
specify that current should be killed first if an oom condition occurs in
between the two calls.

The usage is

	short oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);
	...
	compare_swap_oom_score_adj(OOM_SCORE_ADJ_MAX, oom_score_adj);

to store the thread's oom_score_adj, temporarily change it to the maximum
score possible, and then restore the old value if it is still the same.

This happens to still be racy, however, if the user writes
OOM_SCORE_ADJ_MAX to /proc/pid/oom_score_adj in between the two calls.
The compare_swap_oom_score_adj() will then incorrectly reset the old value
prior to the write of OOM_SCORE_ADJ_MAX.

To fix this, introduce a new oom_flags_t member in struct signal_struct
that will be used for per-thread oom killer flags.  KSM and swapoff can
now use a bit in this member to specify that threads should be killed
first in oom conditions without playing around with oom_score_adj.

This also allows the correct oom_score_adj to always be shown when reading
/proc/pid/oom_score.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:27 -08:00
David Rientjes a9c58b907d mm, oom: change type of oom_score_adj to short
The maximum oom_score_adj is 1000 and the minimum oom_score_adj is -1000,
so this range can be represented by the signed short type with no
functional change.  The extra space this frees up in struct signal_struct
will be used for per-thread oom kill flags in the next patch.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:27 -08:00
David Rientjes 9ff4868e30 mm, oom: allow exiting threads to have access to memory reserves
Exiting threads, those with PF_EXITING set, can pagefault and require
memory before they can make forward progress.  This happens, for instance,
when a process must fault task->robust_list, a userspace structure, before
detaching its memory.

These threads also aren't guaranteed to get access to memory reserves
unless oom killed or killed from userspace.  The oom killer won't grant
memory reserves if other threads are also exiting other than current and
stalling at the same point.  This prevents needlessly killing processes
when others are already exiting.

Instead of special casing all the possible situations between PF_EXITING
getting set and a thread detaching its mm where it may allocate memory,
which probably wouldn't get updated when a change is made to the exit
path, the solution is to give all exiting threads access to memory
reserves if they call the oom killer.  This allows them to quickly
allocate, detach its mm, and free the memory it represents.

Summary of Luigi's bug report:

: He had an oom condition where threads were faulting on task->robust_list
: and repeatedly called the oom killer but it would defer killing a thread
: because it saw other PF_EXITING threads.  This can happen anytime we need
: to allocate memory after setting PF_EXITING and before detaching our mm;
: if there are other threads in the same state then the oom killer won't do
: anything unless one of them happens to be killed from userspace.
:
: So instead of only deferring for PF_EXITING and !task->robust_list, it's
: better to just give them access to memory reserves to prevent a potential
: livelock so that any other faults that may be introduced in the future in
: the exit path don't cause the same problem (and hopefully we don't allow
: too many of those!).

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Tested-by: Luigi Semenzato <semenzato@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:24 -08:00
Davidlohr Bueso 01dc52ebdf oom: remove deprecated oom_adj
The deprecated /proc/<pid>/oom_adj is scheduled for removal this month.

Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:24 +09:00
David Rientjes 876aafbfd9 mm, memcg: move all oom handling to memcontrol.c
By globally defining check_panic_on_oom(), the memcg oom handler can be
moved entirely to mm/memcontrol.c.  This removes the ugly #ifdef in the
oom killer and cleans up the code.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:45 -07:00
David Rientjes 6b0c81b3be mm, oom: reduce dependency on tasklist_lock
Since exiting tasks require write_lock_irq(&tasklist_lock) several times,
try to reduce the amount of time the readside is held for oom kills.  This
makes the interface with the memcg oom handler more consistent since it
now never needs to take tasklist_lock unnecessarily.

The only time the oom killer now takes tasklist_lock is when iterating the
children of the selected task, everything else is protected by
rcu_read_lock().

This requires that a reference to the selected process, p, is grabbed
before calling oom_kill_process().  It may release it and grab a reference
on another one of p's threads if !p->mm, but it also guarantees that it
will release the reference before returning.

[hughd@google.com: fix duplicate put_task_struct()]
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:44 -07:00
David Rientjes 9cbb78bb31 mm, memcg: introduce own oom handler to iterate only over its own threads
The global oom killer is serialized by the per-zonelist
try_set_zonelist_oom() which is used in the page allocator.  Concurrent
oom kills are thus a rare event and only occur in systems using
mempolicies and with a large number of nodes.

Memory controller oom kills, however, can frequently be concurrent since
there is no serialization once the oom killer is called for oom conditions
in several different memcgs in parallel.

This creates a massive contention on tasklist_lock since the oom killer
requires the readside for the tasklist iteration.  If several memcgs are
calling the oom killer, this lock can be held for a substantial amount of
time, especially if threads continue to enter it as other threads are
exiting.

Since the exit path grabs the writeside of the lock with irqs disabled in
a few different places, this can cause a soft lockup on cpus as a result
of tasklist_lock starvation.

The kernel lacks unfair writelocks, and successful calls to the oom killer
usually result in at least one thread entering the exit path, so an
alternative solution is needed.

This patch introduces a seperate oom handler for memcgs so that they do
not require tasklist_lock for as much time.  Instead, it iterates only
over the threads attached to the oom memcg and grabs a reference to the
selected thread before calling oom_kill_process() to ensure it doesn't
prematurely exit.

This still requires tasklist_lock for the tasklist dump, iterating
children of the selected process, and killing all other threads on the
system sharing the same memory as the selected victim.  So while this
isn't a complete solution to tasklist_lock starvation, it significantly
reduces the amount of time that it is held.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Sha Zhengju <handai.szj@taobao.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:44 -07:00
David Rientjes 462607ecc5 mm, oom: introduce helper function to process threads during scan
This patch introduces a helper function to process each thread during the
iteration over the tasklist.  A new return type, enum oom_scan_t, is
defined to determine the future behavior of the iteration:

 - OOM_SCAN_OK: continue scanning the thread and find its badness,

 - OOM_SCAN_CONTINUE: do not consider this thread for oom kill, it's
   ineligible,

 - OOM_SCAN_ABORT: abort the iteration and return, or

 - OOM_SCAN_SELECT: always select this thread with the highest badness
   possible.

There is no functional change with this patch.  This new helper function
will be used in the next patch in the memory controller.

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Sha Zhengju <handai.szj@taobao.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:44 -07:00
Andrew Morton c255a45805 memcg: rename config variables
Sanity:

CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM

[mhocko@suse.cz: fix missed bits]
Cc: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:43 -07:00
David Rientjes de34d965a8 mm, oom: replace some information in tasklist dump
The number of ptes and swap entries are used in the oom killer's badness
heuristic, so they should be shown in the tasklist dump.

This patch adds those fields and replaces cpu and oom_adj values that are
currently emitted.  Cpu isn't interesting and oom_adj is deprecated and
will be removed later this year, the same information is already displayed
as oom_score_adj which is used internally.

At the same time, make the documentation a little more clear to state this
information is helpful to determine why the oom killer chose the task it
did to kill.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:42 -07:00
David Rientjes 121d1ba0a0 mm, oom: fix potential killing of thread that is disabled from oom killing
/proc/sys/vm/oom_kill_allocating_task will immediately kill current when
the oom killer is called to avoid a potentially expensive tasklist scan
for large systems.

Currently, however, it is not checking current's oom_score_adj value which
may be OOM_SCORE_ADJ_MIN, meaning that it has been disabled from oom
killing.

This patch avoids killing current in such a condition and simply falls
back to the tasklist scan since memory still needs to be freed.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:42 -07:00
David Rientjes 4f774b912d mm, oom: do not schedule if current has been killed
The oom killer currently schedules away from current in an uninterruptible
sleep if it does not have access to memory reserves.  It's possible that
current was killed because it shares memory with the oom killed thread or
because it was killed by the user in the interim, however.

This patch only schedules away from current if it does not have a pending
kill, i.e.  if it does not share memory with the oom killed thread.  It's
possible that it will immediately retry its memory allocation and fail,
but it will immediately be given access to memory reserves if it calls the
oom killer again.

This prevents the delay of memory freeing when threads that share memory
with the oom killed thread get unnecessarily scheduled.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:41 -07:00
Wanpeng Li dad7557eb7 mm: fix kernel-doc warnings
Fix kernel-doc warnings such as

  Warning(../mm/page_cgroup.c:432): No description found for parameter 'id'
  Warning(../mm/page_cgroup.c:432): Excess function parameter 'mem' description in 'swap_cgroup_record'

Signed-off-by: Wanpeng Li <liwp@linux.vnet.ibm.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-20 14:39:36 -07:00
David Rientjes 61eafb00d5 mm, oom: fix and cleanup oom score calculations
The divide in p->signal->oom_score_adj * totalpages / 1000 within
oom_badness() was causing an overflow of the signed long data type.

This adds both the root bias and p->signal->oom_score_adj before doing the
normalization which fixes the issue and also cleans up the calculation.

Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-20 14:39:35 -07:00
David Rientjes 1e11ad8dc4 mm, oom: fix badness score underflow
If the privileges given to root threads (3% of allowable memory) or a
negative value of /proc/pid/oom_score_adj happen to exceed the amount of
rss of a thread, its badness score overflows as a result of commit
a7f638f999 ("mm, oom: normalize oom scores to oom_score_adj scale only
for userspace").

Fix this by making the type signed and return 1, meaning the thread is
still eligible for kill, if the value is negative.

Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-08 15:07:35 -07:00
David Rientjes a7f638f999 mm, oom: normalize oom scores to oom_score_adj scale only for userspace
The oom_score_adj scale ranges from -1000 to 1000 and represents the
proportion of memory available to the process at allocation time.  This
means an oom_score_adj value of 300, for example, will bias a process as
though it was using an extra 30.0% of available memory and a value of
-350 will discount 35.0% of available memory from its usage.

The oom killer badness heuristic also uses this scale to report the oom
score for each eligible process in determining the "best" process to
kill.  Thus, it can only differentiate each process's memory usage by
0.1% of system RAM.

On large systems, this can end up being a large amount of memory: 256MB
on 256GB systems, for example.

This can be fixed by having the badness heuristic to use the actual
memory usage in scoring threads and then normalizing it to the
oom_score_adj scale for userspace.  This results in better comparison
between eligible threads for kill and no change from the userspace
perspective.

Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:24 -07:00
Eric W. Biederman 078de5f706 userns: Store uid and gid values in struct cred with kuid_t and kgid_t types
cred.h and a few trivial users of struct cred are changed.  The rest of the users
of struct cred are left for other patches as there are too many changes to make
in one go and leave the change reviewable.  If the user namespace is disabled and
CONFIG_UIDGID_STRICT_TYPE_CHECKS are disabled the code will contiue to compile
and behave correctly.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03 03:28:38 -07:00
Oleg Nesterov d2d393099d signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig()
Change oom_kill_task() to use do_send_sig_info(SEND_SIG_FORCED) instead
of force_sig(SIGKILL).  With the recent changes we do not need force_ to
kill the CLONE_NEWPID tasks.

And this is more correct.  force_sig() can race with the exiting thread
even if oom_kill_task() checks p->mm != NULL, while
do_send_sig_info(group => true) kille the whole process.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:41 -07:00
David Rientjes e845e19936 mm, memcg: pass charge order to oom killer
The oom killer typically displays the allocation order at the time of oom
as a part of its diangostic messages (for global, cpuset, and mempolicy
ooms).

The memory controller may also pass the charge order to the oom killer so
it can emit the same information.  This is useful in determining how large
the memory allocation is that triggered the oom killer.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:59 -07:00
David Rientjes 08ab9b10d4 mm, oom: force oom kill on sysrq+f
The oom killer chooses not to kill a thread if:

 - an eligible thread has already been oom killed and has yet to exit,
   and

 - an eligible thread is exiting but has yet to free all its memory and
   is not the thread attempting to currently allocate memory.

SysRq+F manually invokes the global oom killer to kill a memory-hogging
task.  This is normally done as a last resort to free memory when no
progress is being made or to test the oom killer itself.

For both uses, we always want to kill a thread and never defer.  This
patch causes SysRq+F to always kill an eligible thread and can be used to
force a kill even if another oom killed thread has failed to exit.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:58 -07:00
David Rientjes dc3f21eade mm, oom: introduce independent oom killer ratelimit state
printk_ratelimit() uses the global ratelimit state for all printks.  The
oom killer should not be subjected to this state just because another
subsystem or driver may be flooding the kernel log.

This patch introduces printk ratelimiting specifically for the oom killer.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:55 -07:00
David Rientjes 8447d950e7 mm, oom: do not emit oom killer warning if chosen thread is already exiting
If a thread is chosen for oom kill and is already PF_EXITING, then the oom
killer simply sets TIF_MEMDIE and returns.  This allows the thread to have
access to memory reserves so that it may quickly exit.  This logic is
preceeded with a comment saying there's no need to alarm the sysadmin.
This patch adds truth to that statement.

There's no need to emit any warning about the oom condition if the thread
is already exiting since it will not be killed.  In this condition, just
silently return the oom killer since its only giving access to memory
reserves and is otherwise a no-op.

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:55 -07:00
David Rientjes 647f2bdf4a mm, oom: fold oom_kill_task() into oom_kill_process()
oom_kill_task() has a single caller, so fold it into its parent function,
oom_kill_process().  Slightly reduces the number of lines in the oom
killer.

Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:55 -07:00
David Rientjes 2a1c9b1fc0 mm, oom: avoid looping when chosen thread detaches its mm
oom_kill_task() returns non-zero iff the chosen process does not have any
threads with an attached ->mm.

In such a case, it's better to just return to the page allocator and retry
the allocation because memory could have been freed in the interim and the
oom condition may no longer exist.  It's unnecessary to loop in the oom
killer and find another thread to kill.

This allows both oom_kill_task() and oom_kill_process() to be converted to
void functions.  If the oom condition persists, the oom killer will be
recalled.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:55 -07:00
Johannes Weiner 72835c86ca mm: unify remaining mem_cont, mem, etc. variable names to memcg
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:06 -08:00
Johannes Weiner ec0fffd84b mm: oom_kill: remove memcg argument from oom_kill_task()
The memcg argument of oom_kill_task() hasn't been used since 341aea2
'oom-kill: remove boost_dying_task_prio()'.  Kill it.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:06 -08:00
KAMEZAWA Hiroyuki 43d2b11324 tracepoint: add tracepoints for debugging oom_score_adj
oom_score_adj is used for guarding processes from OOM-Killer.  One of
problem is that it's inherited at fork().  When a daemon set oom_score_adj
and make children, it's hard to know where the value is set.

This patch adds some tracepoints useful for debugging. This patch adds
3 trace points.
  - creating new task
  - renaming a task (exec)
  - set oom_score_adj

To debug, users need to enable some trace pointer. Maybe filtering is useful as

# EVENT=/sys/kernel/debug/tracing/events/task/
# echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
# echo "oom_score_adj != 0" > $EVENT/task_rename/filter
# echo 1 > $EVENT/enable
# EVENT=/sys/kernel/debug/tracing/events/oom/
# echo 1 > $EVENT/enable

output will be like this.
# grep oom /sys/kernel/debug/tracing/trace
bash-7699  [007] d..3  5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
bash-7699  [007] ...1  5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
ls-7729  [003] ...2  5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
bash-7699  [002] ...1  5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
grep-7730  [007] ...2  5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10 16:30:44 -08:00
Rafael J. Wysocki b00f4dc5ff Merge branch 'master' into pm-sleep
* master: (848 commits)
  SELinux: Fix RCU deref check warning in sel_netport_insert()
  binary_sysctl(): fix memory leak
  mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
  ipmi_watchdog: restore settings when BMC reset
  oom: fix integer overflow of points in oom_badness
  memcg: keep root group unchanged if creation fails
  nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
  nilfs2: unbreak compat ioctl
  cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
  evm: prevent racing during tfm allocation
  evm: key must be set once during initialization
  mmc: vub300: fix type of firmware_rom_wait_states module parameter
  Revert "mmc: enable runtime PM by default"
  mmc: sdhci: remove "state" argument from sdhci_suspend_host
  x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
  IB/qib: Correct sense on freectxts increment and decrement
  RDMA/cma: Verify private data length
  cgroups: fix a css_set not found bug in cgroup_attach_proc
  oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
  Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
  ...

Conflicts:
	kernel/cgroup_freezer.c
2011-12-21 21:59:45 +01:00
Frantisek Hrbata ff05b6f7ae oom: fix integer overflow of points in oom_badness
An integer overflow will happen on 64bit archs if task's sum of rss,
swapents and nr_ptes exceeds (2^31)/1000 value.  This was introduced by
commit

f755a04 oom: use pte pages in OOM score

where the oom score computation was divided into several steps and it's no
longer computed as one expression in unsigned long(rss, swapents, nr_pte
are unsigned long), where the result value assigned to points(int) is in
range(1..1000).  So there could be an int overflow while computing

176          points *= 1000;

and points may have negative value. Meaning the oom score for a mem hog task
will be one.

196          if (points <= 0)
197                  return 1;

For example:
[ 3366]     0  3366 35390480 24303939   5       0             0 oom01
Out of memory: Kill process 3366 (oom01) score 1 or sacrifice child

Here the oom1 process consumes more than 24303939(rss)*4096~=92GB physical
memory, but it's oom score is one.

In this situation the mem hog task is skipped and oom killer kills another and
most probably innocent task with oom score greater than one.

The points variable should be of type long instead of int to prevent the
int overflow.

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>		[2.6.36+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-20 10:25:04 -08:00
Tejun Heo a5be2d0d1a freezer: rename thaw_process() to __thaw_task() and simplify the implementation
thaw_process() now has only internal users - system and cgroup
freezers.  Remove the unnecessary return value, rename, unexport and
collapse __thaw_process() into it.  This will help further updates to
the freezer code.

-v3: oom_kill grew a use of thaw_process() while this patch was
     pending.  Convert it to use __thaw_task() for now.  In the longer
     term, this should be handled by allowing tasks to die if killed
     even if it's frozen.

-v2: minor style update as suggested by Matt.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paul Menage <menage@google.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
2011-11-21 12:32:23 -08:00
Michal Hocko 5aecc85abd oom: do not kill tasks with oom_score_adj OOM_SCORE_ADJ_MIN
Commit c9f01245 ("oom: remove oom_disable_count") has removed the
oom_disable_count counter which has been used for early break out from
oom_badness so we could never select a task with oom_score_adj set to
OOM_SCORE_ADJ_MIN (oom disabled).

Now that the counter is gone we are always going through heuristics
calculation and we always return a non zero positive value.  This means
that we can end up killing a task with OOM disabled because it is
indistinguishable from regular tasks with 1% resp.  CAP_SYS_ADMIN tasks
with 3% usage of memory or tasks with oom_score_adj set but OOM enabled.

Let's break out early if the task should have OOM disabled.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-15 22:41:51 -02:00
Linus Torvalds 32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
David Rientjes 43362a4977 oom: fix race while temporarily setting current's oom_score_adj
test_set_oom_score_adj() was introduced in 72788c3856 ("oom: replace
PF_OOM_ORIGIN with toggling oom_score_adj") to temporarily elevate
current's oom_score_adj for ksm and swapoff without requiring an
additional per-process flag.

Using that function to both set oom_score_adj to OOM_SCORE_ADJ_MAX and
then reinstate the previous value is racy since it's possible that
userspace can set the value to something else itself before the old value
is reinstated.  That results in userspace setting current's oom_score_adj
to a different value and then the kernel immediately setting it back to
its previous value without notification.

To fix this, a new compare_swap_oom_score_adj() function is introduced
with the same semantics as the compare and swap CAS instruction, or
CMPXCHG on x86.  It is used to reinstate the previous value of
oom_score_adj if and only if the present value is the same as the old
value.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:45 -07:00
David Rientjes c9f01245b6 oom: remove oom_disable_count
This removes mm->oom_disable_count entirely since it's unnecessary and
currently buggy.  The counter was intended to be per-process but it's
currently decremented in the exit path for each thread that exits, causing
it to underflow.

The count was originally intended to prevent oom killing threads that
share memory with threads that cannot be killed since it doesn't lead to
future memory freeing.  The counter could be fixed to represent all
threads sharing the same mm, but it's better to remove the count since:

 - it is possible that the OOM_DISABLE thread sharing memory with the
   victim is waiting on that thread to exit and will actually cause
   future memory freeing, and

 - there is no guarantee that a thread is disabled from oom killing just
   because another thread sharing its mm is oom disabled.

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:45 -07:00
David Rientjes 7b0d44fa49 oom: avoid killing kthreads if they assume the oom killed thread's mm
After selecting a task to kill, the oom killer iterates all processes and
kills all other threads that share the same mm_struct in different thread
groups.  It would not otherwise be helpful to kill a thread if its memory
would not be subsequently freed.

A kernel thread, however, may assume a user thread's mm by using
use_mm().  This is only temporary and should not result in sending a
SIGKILL to that kthread.

This patch ensures that only user threads and not kthreads are sent a
SIGKILL if they share the same mm_struct as the oom killed task.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:45 -07:00
David Rientjes f660daac47 oom: thaw threads if oom killed thread is frozen before deferring
If a thread has been oom killed and is frozen, thaw it before returning to
the page allocator.  Otherwise, it can stay frozen indefinitely and no
memory will be freed.

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:45 -07:00