Commit Graph

108 Commits

Author SHA1 Message Date
Oleg Nesterov 31bbe17032 kernel/fork.c:copy_process(): don't add the uninitialized child to thread/task/pid lists
copy_process() adds the new child to thread_group/init_task.tasks list and
then does attach_pid(child, PIDTYPE_PID).  This means that the lockless
next_thread() or next_task() can see this thread with the wrong pid.  Say,
"ls /proc/pid/task" can list the same inode twice.

We could move attach_pid(child, PIDTYPE_PID) up, but in this case
find_task_by_vpid() can find the new thread before it was fully
initialized.

And this is already true for PIDTYPE_PGID/PIDTYPE_SID, With this patch
copy_process() initializes child->pids[*].pid first, then calls
attach_pid() to insert the task into the pid->tasks list.

attach_pid() no longer need the "struct pid*" argument, it is always
called after pid_link->pid was already set.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Sergey Dyasly <dserrg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-27 22:10:32 +02:00
Raphael S. Carvalho 71cf620d99 kernel/pid.c: move statement
Move statement to static initilization of init_pid_ns.

Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-27 22:10:32 +02:00
Oleg Nesterov 717113e2d5 pidns: fix free_pid() to handle the first fork failure
"case 0" in free_pid() assumes that disable_pid_allocation() should
clear PIDNS_HASH_ADDING before the last pid goes away.

However this doesn't happen if the first fork() fails to create the
child reaper which should call disable_pid_allocation().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-27 22:10:31 +02:00
Oleg Nesterov 7bc7d51be8 BACKPORT: FROMLIST: pids: make task_tgid_nr_ns() safe
This was reported many times, and this was even mentioned in commit
52ee2dfdd4 "pids: refactor vnr/nr_ns helpers to make them safe" but
somehow nobody bothered to fix the obvious problem: task_tgid_nr_ns()
is not safe because task->group_leader points to nowhere after the
exiting task passes exit_notify(), rcu_read_lock() can not help.

We really need to change __unhash_process() to nullify group_leader,
parent, and real_parent, but this needs some cleanups. Until then we
can turn task_tgid_nr_ns() into another user of __task_pid_nr_ns() and
fix the problem.

Reported-by: Troy Kensinger <tkensinger@google.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

(url: https://patchwork.kernel.org/patch/9913055/)
Bug: 31495866

Change-Id: I5e67b02a77e805f71fa3a787249f13c1310f02e2
(cherry picked from commit 48de73e34a5760591758ff7ffdc79bc464e60f69)
2018-05-26 00:39:33 +02:00
Ian Maund 8b08aa9e75 This is the 3.10.67 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUyuGRAAoJEDjbvchgkmk+7EwQALYPOeh+AManQFB1MQvFuOgZ
 /4ulpjhGXw/RPTKHMeyHo8vRfUhMOx8UPF62uql+g1l9b/Zt2bs6qXu4QcxRRsQc
 trSTUpi+U14y1hkgqOVOcFYP2ZaTjNEBQgLJ4eGn46CliLqme+rfoyRYm2GXzcR4
 6cbSAr3mufdFIpi9/8Dn62Gv0aws5lIv3qkHJXznyuux3tisPT5y6Ux2KJoivPn/
 SqADtRpwo+7lTjl15fE++9AqNsGMorV6toT2OO/7nXP+824psInKLmREAT2qC99b
 BG61vcYdxOuHtzmwrvCf1jSRjxhvZT0j2xhBr/vCKcxy08AT0vDv68zrV1r6TIuu
 U7/CKXtFBY95cjfnkTLJuswBSuIA/+sQHV6DaddH0V8fcZ6rQMLrblQ9ZcFFFkmT
 2SG6lmlXqZvcEKYGMnL/Dcow1rkRhB5stiGgTkYxjiRSRpzAHISRJ/GGpsT+rRqK
 HpBs5p9JshvRl7RWKwAu+DNGaEK1X/WYxc4/jw6dZFWX7lEWSMIPlr9zXgZCZ39y
 V6lV1VVlT9/CSs1swKHUyhHHehlFsnIlQ6Fkiycr/KkuqBLs92Hyb7WhpVa819yX
 osXdxSm6J54skiOLKYpBWHpnY09Tc+p28VEfMpErTExgp2oE8F34K7kdhoQPQb97
 2mHiXNa+J4CLUNQ+sRmw
 =HDBo
 -----END PGP SIGNATURE-----

Merge commit 'v3.10.67' into msm-3.10

This merge brings us up to date with upstream kernel.org tag v3.10.67.
It also contains changes to allow forbidden warnings introduced in
the commit 'core, nfqueue, openvswitch: Orphan frags in skb_zerocopy
and handle errors'. Once upstream has corrected these warnings, the
changes to scripts/gcc-wrapper.py, in this commit, can be reverted.

* commit 'v3.10.67' (915 commits)
  Linux 3.10.67
  md/raid5: fetch_block must fetch all the blocks handle_stripe_dirtying wants.
  ext4: fix warning in ext4_da_update_reserve_space()
  quota: provide interface for readding allocated space into reserved space
  crypto: add missing crypto module aliases
  crypto: include crypto- module prefix in template
  crypto: prefix module autoloading with "crypto-"
  drbd: merge_bvec_fn: properly remap bvm->bi_bdev
  Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
  ipvs: uninitialized data with IP_VS_IPV6
  KEYS: close race between key lookup and freeing
  sata_dwc_460ex: fix resource leak on error path
  x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs
  x86, tls: Interpret an all-zero struct user_desc as "no segment"
  x86, tls, ldt: Stop checking lm in LDT_empty
  x86/tsc: Change Fast TSC calibration failed from error to info
  x86, hyperv: Mark the Hyper-V clocksource as being continuous
  clocksource: exynos_mct: Fix bitmask regression for exynos4_mct_write
  can: dev: fix crtlmode_supported check
  bus: mvebu-mbus: fix support of MBus window 13
  ARM: dts: imx25: Fix PWM "per" clocks
  time: adjtimex: Validate the ADJ_FREQUENCY values
  time: settimeofday: Validate the values of tv from user
  dm cache: share cache-metadata object across inactive and active DM tables
  ipr: wait for aborted command responses
  drm/i915: Fix mutex->owner inspection race under DEBUG_MUTEXES
  scripts/recordmcount.pl: There is no -m32 gcc option on Super-H anymore
  ALSA: usb-audio: Add mic volume fix quirk for Logitech Webcam C210
  libata: prevent HSM state change race between ISR and PIO
  pinctrl: Fix two deadlocks
  gpio: sysfs: fix gpio device-attribute leak
  gpio: sysfs: fix gpio-chip device-attribute leak
  Linux 3.10.66
  s390/3215: fix tty output containing tabs
  s390/3215: fix hanging console issue
  fsnotify: next_i is freed during fsnotify_unmount_inodes.
  netfilter: ipset: small potential read beyond the end of buffer
  mmc: sdhci: Fix sleep in atomic after inserting SD card
  LOCKD: Fix a race when initialising nlmsvc_timeout
  x86, um: actually mark system call tables readonly
  um: Skip futex_atomic_cmpxchg_inatomic() test
  decompress_bunzip2: off by one in get_next_block()
  ARM: shmobile: sh73a0 legacy: Set .control_parent for all irqpin instances
  ARM: omap5/dra7xx: Fix frequency typos
  ARM: clk-imx6q: fix video divider for rev T0 1.0
  ARM: imx6q: drop unnecessary semicolon
  ARM: dts: imx25: Fix the SPI1 clocks
  Input: I8042 - add Acer Aspire 7738 to the nomux list
  Input: i8042 - reset keyboard to fix Elantech touchpad detection
  can: kvaser_usb: Don't send a RESET_CHIP for non-existing channels
  can: kvaser_usb: Reset all URB tx contexts upon channel close
  can: kvaser_usb: Don't free packets when tight on URBs
  USB: keyspan: fix null-deref at probe
  USB: cp210x: add IDs for CEL USB sticks and MeshWorks devices
  USB: cp210x: fix ID for production CEL MeshConnect USB Stick
  usb: dwc3: gadget: Stop TRB preparation after limit is reached
  usb: dwc3: gadget: Fix TRB preparation during SG
  OHCI: add a quirk for ULi M5237 blocking on reset
  gpiolib: of: Correct error handling in of_get_named_gpiod_flags
  NFSv4.1: Fix client id trunking on Linux
  ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing
  vfio-pci: Fix the check on pci device type in vfio_pci_probe()
  uvcvideo: Fix destruction order in uvc_delete()
  smiapp: Take mutex during PLL update in sensor initialisation
  af9005: fix kernel panic on init if compiled without IR
  smiapp-pll: Correct clock debug prints
  video/logo: prevent use of logos after they have been freed
  storvsc: ring buffer failures may result in I/O freeze
  iscsi-target: Fail connection on short sendmsg writes
  hp_accel: Add support for HP ZBook 15
  cfg80211: Fix 160 MHz channels with 80+80 and 160 MHz drivers
  ARC: [nsimosci] move peripherals to match model to FPGA
  drm/i915: Force the CS stall for invalidate flushes
  drm/i915: Invalidate media caches on gen7
  drm/radeon: properly filter DP1.2 4k modes on non-DP1.2 hw
  drm/radeon: check the right ring in radeon_evict_flags()
  drm/vmwgfx: Fix fence event code
  enic: fix rx skb checksum
  alx: fix alx_poll()
  tcp: Do not apply TSO segment limit to non-TSO packets
  tg3: tg3_disable_ints using uninitialized mailbox value to disable interrupts
  netlink: Don't reorder loads/stores before marking mmap netlink frame as available
  netlink: Always copy on mmap TX.
  Linux 3.10.65
  mm: Don't count the stack guard page towards RLIMIT_STACK
  mm: propagate error from stack expansion even for guard page
  mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
  perf session: Do not fail on processing out of order event
  perf: Fix events installation during moving group
  perf/x86/intel/uncore: Make sure only uncore events are collected
  Btrfs: don't delay inode ref updates during log replay
  ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/375/38x/XP
  scripts/kernel-doc: don't eat struct members with __aligned
  nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
  nfsd4: fix xdr4 inclusion of escaped char
  fs: nfsd: Fix signedness bug in compare_blob
  serial: samsung: wait for transfer completion before clock disable
  writeback: fix a subtle race condition in I_DIRTY clearing
  cdc-acm: memory leak in error case
  genhd: check for int overflow in disk_expand_part_tbl()
  USB: cdc-acm: check for valid interfaces
  ALSA: hda - Fix wrong gpio_dir & gpio_mask hint setups for IDT/STAC codecs
  ALSA: hda - using uninitialized data
  ALSA: usb-audio: extend KEF X300A FU 10 tweak to Arcam rPAC
  driver core: Fix unbalanced device reference in drivers_probe
  x86, vdso: Use asm volatile in __getcpu
  x86_64, vdso: Fix the vdso address randomization algorithm
  HID: Add a new id 0x501a for Genius MousePen i608X
  HID: add battery quirk for USB_DEVICE_ID_APPLE_ALU_WIRELESS_2011_ISO keyboard
  HID: roccat: potential out of bounds in pyra_sysfs_write_settings()
  HID: i2c-hid: prevent buffer overflow in early IRQ
  HID: i2c-hid: fix race condition reading reports
  iommu/vt-d: Fix an off-by-one bug in __domain_mapping()
  UBI: Fix double free after do_sync_erase()
  UBI: Fix invalid vfree()
  pstore-ram: Allow optional mapping with pgprot_noncached
  pstore-ram: Fix hangs by using write-combine mappings
  PCI: Restore detection of read-only BARs
  ASoC: dwc: Ensure FIFOs are flushed to prevent channel swap
  ASoC: max98090: Fix ill-defined sidetone route
  ASoC: sigmadsp: Refuse to load firmware files with a non-supported version
  ath5k: fix hardware queue index assignment
  swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single
  can: peak_usb: fix memset() usage
  can: peak_usb: fix cleanup sequence order in case of error during init
  ath9k: fix BE/BK queue order
  ath9k_hw: fix hardware queue allocation
  ocfs2: fix journal commit deadlock
  Linux 3.10.64
  Btrfs: fix fs corruption on transaction abort if device supports discard
  Btrfs: do not move em to modified list when unpinning
  eCryptfs: Remove buggy and unnecessary write in file name decode routine
  eCryptfs: Force RO mount when encrypted view is enabled
  udf: Verify symlink size before loading it
  exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
  ncpfs: return proper error from NCP_IOC_SETROOT ioctl
  crypto: af_alg - fix backlog handling
  userns: Unbreak the unprivileged remount tests
  userns: Allow setting gid_maps without privilege when setgroups is disabled
  userns: Add a knob to disable setgroups on a per user namespace basis
  userns: Rename id_map_mutex to userns_state_mutex
  userns: Only allow the creator of the userns unprivileged mappings
  userns: Check euid no fsuid when establishing an unprivileged uid mapping
  userns: Don't allow unprivileged creation of gid mappings
  userns: Don't allow setgroups until a gid mapping has been setablished
  userns: Document what the invariant required for safe unprivileged mappings.
  groups: Consolidate the setgroups permission checks
  umount: Disallow unprivileged mount force
  mnt: Update unprivileged remount test
  mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
  mac80211: free management frame keys when removing station
  mac80211: fix multicast LED blinking and counter
  KEYS: Fix stale key registration at error path
  isofs: Fix unchecked printing of ER records
  x86/tls: Don't validate lm in set_thread_area() after all
  dm space map metadata: fix sm_bootstrap_get_nr_blocks()
  dm bufio: fix memleak when using a dm_buffer's inline bio
  nfs41: fix nfs4_proc_layoutget error handling
  megaraid_sas: corrected return of wait_event from abort frame path
  mmc: block: add newline to sysfs display of force_ro
  mfd: tc6393xb: Fail ohci suspend if full state restore is required
  md/bitmap: always wait for writes on unplug.
  x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit
  x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  x86/tls: Disallow unusual TLS segments
  x86/tls: Validate TLS entries to protect espfix
  isofs: Fix infinite looping over CE entries
  Linux 3.10.63
  ALSA: usb-audio: Don't resubmit pending URBs at MIDI error recovery
  powerpc: 32 bit getcpu VDSO function uses 64 bit instructions
  ARM: sched_clock: Load cycle count after epoch stabilizes
  igb: bring link up when PHY is powered up
  ext2: Fix oops in ext2_get_block() called from ext2_quota_write()
  nEPT: Nested INVEPT
  net: sctp: use MAX_HEADER for headroom reserve in output path
  net: mvneta: fix Tx interrupt delay
  rtnetlink: release net refcnt on error in do_setlink()
  net/mlx4_core: Limit count field to 24 bits in qp_alloc_res
  tg3: fix ring init when there are more TX than RX channels
  ipv6: gre: fix wrong skb->protocol in WCCP
  sata_fsl: fix error handling of irq_of_parse_and_map
  ahci: disable MSI on SAMSUNG 0xa800 SSD
  AHCI: Add DeviceIDs for Sunrise Point-LP SATA controller
  media: smiapp: Only some selection targets are settable
  drm/i915: Unlock panel even when LVDS is disabled
  drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6
  i2c: davinci: generate STP always when NACK is received
  i2c: omap: fix i207 errata handling
  i2c: omap: fix NACK and Arbitration Lost irq handling
  xen-netfront: Remove BUGs on paged skb data which crosses a page boundary
  mm: fix swapoff hang after page migration and fork
  mm: frontswap: invalidate expired data on a dup-store failure
  Linux 3.10.62
  nfsd: Fix ACL null pointer deref
  powerpc/powernv: Honor the generic "no_64bit_msi" flag
  bnx2fc: do not add shared skbs to the fcoe_rx_list
  nfsd4: fix leak of inode reference on delegation failure
  nfsd: Fix slot wake up race in the nfsv4.1 callback code
  rt2x00: do not align payload on modern H/W
  can: dev: avoid calling kfree_skb() from interrupt context
  spi: dw: Fix dynamic speed change.
  iser-target: Handle DEVICE_REMOVAL event on network portal listener correctly
  target: Don't call TFO->write_pending if data_length == 0
  srp-target: Retry when QP creation fails with ENOMEM
  Input: xpad - use proper endpoint type
  ARM: 8222/1: mvebu: enable strex backoff delay
  ARM: 8216/1: xscale: correct auxiliary register in suspend/resume
  ALSA: usb-audio: Add ctrl message delay quirk for Marantz/Denon devices
  can: esd_usb2: fix memory leak on disconnect
  USB: xhci: don't start a halted endpoint before its new dequeue is set
  usb-quirks: Add reset-resume quirk for MS Wireless Laser Mouse 6000
  usb: serial: ftdi_sio: add PIDs for Matrix Orbital products
  USB: serial: cp210x: add IDs for CEL MeshConnect USB Stick
  USB: keyspan: fix tty line-status reporting
  USB: keyspan: fix overrun-error reporting
  USB: ssu100: fix overrun-error reporting
  iio: Fix IIO_EVENT_CODE_EXTRACT_DIR bit mask
  powerpc/pseries: Fix endiannes issue in RTAS call from xmon
  powerpc/pseries: Honor the generic "no_64bit_msi" flag
  of/base: Fix PowerPC address parsing hack
  ASoC: wm_adsp: Avoid attempt to free buffers that might still be in use
  ASoC: sgtl5000: Fix SMALL_POP bit definition
  PCI/MSI: Add device flag indicating that 64-bit MSIs don't work
  ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
  pptp: fix stack info leak in pptp_getname()
  qmi_wwan: Add support for HP lt4112 LTE/HSPA+ Gobi 4G Modem
  ieee802154: fix error handling in ieee802154fake_probe()
  ipv4: Fix incorrect error code when adding an unreachable route
  inetdevice: fixed signed integer overflow
  sparc64: Fix constraints on swab helpers.
  uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME
  x86, mm: Set NX across entire PMD at boot
  x86: Require exact match for 'noxsave' command line option
  x86_64, traps: Rework bad_iret
  x86_64, traps: Stop using IST for #SS
  x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
  MIPS: Loongson: Make platform serial setup always built-in.
  MIPS: oprofile: Fix backtrace on 64-bit kernel
  Linux 3.10.61
  mm: memcg: handle non-error OOM situations more gracefully
  mm: memcg: do not trap chargers with full callstack on OOM
  mm: memcg: rework and document OOM waiting and wakeup
  mm: memcg: enable memcg OOM killer only for user faults
  x86: finish user fault error path with fatal signal
  arch: mm: pass userspace fault flag to generic fault handler
  arch: mm: do not invoke OOM killer on kernel fault OOM
  arch: mm: remove obsolete init OOM protection
  mm: invoke oom-killer from remaining unconverted page fault handlers
  net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
  net: sctp: fix panic on duplicate ASCONF chunks
  net: sctp: fix remote memory pressure from excessive queueing
  KVM: x86: Don't report guest userspace emulation error to userspace
  SCSI: hpsa: fix a race in cmd_free/scsi_done
  net/mlx4_en: Fix BlueFlame race
  ARM: Correct BUG() assembly to ensure it is endian-agnostic
  perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
  mei: bus: fix possible boundaries violation
  perf: Handle compat ioctl
  MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches
  dell-wmi: Fix access out of memory
  ARM: probes: fix instruction fetch order with <asm/opcodes.h>
  br: fix use of ->rx_handler_data in code executed on non-rx_handler path
  netfilter: nf_nat: fix oops on netns removal
  netfilter: xt_bpf: add mising opaque struct sk_filter definition
  netfilter: nf_log: release skbuff on nlmsg put failure
  netfilter: nfnetlink_log: fix maximum packet length logged to userspace
  netfilter: nf_log: account for size of NLMSG_DONE attribute
  ipc: always handle a new value of auto_msgmni
  clocksource: Remove "weak" from clocksource_default_clock() declaration
  kgdb: Remove "weak" from kgdb_arch_pc() declaration
  media: ttusb-dec: buffer overflow in ioctl
  NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
  nfs: Fix use of uninitialized variable in nfs_getattr()
  NFS: Don't try to reclaim delegation open state if recovery failed
  NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
  Input: alps - allow up to 2 invalid packets without resetting device
  Input: alps - ignore potential bare packets when device is out of sync
  dm raid: ensure superblock's size matches device's logical block size
  dm btree: fix a recursion depth bug in btree walking code
  block: Fix computation of merged request priority
  parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls
  scsi: only re-lock door after EH on devices that were reset
  nfs: fix pnfs direct write memory leak
  firewire: cdev: prevent kernel stack leaking into ioctl arguments
  arm64: __clear_user: handle exceptions on strb
  ARM: 8198/1: make kuser helpers depend on MMU
  drm/radeon: add missing crtc unlock when setting up the MC
  mac80211: fix use-after-free in defragmentation
  macvtap: Fix csum_start when VLAN tags are present
  iwlwifi: configure the LTR
  libceph: do not crash on large auth tickets
  xtensa: re-wire umount syscall to sys_oldumount
  ALSA: usb-audio: Fix memory leak in FTU quirk
  ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks
  ahci: Add Device IDs for Intel Sunrise Point PCH
  audit: keep inode pinned
  x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
  sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks
  sparc64: Do irq_{enter,exit}() around generic_smp_call_function*().
  sparc64: Fix crashes in schizo_pcierr_intr_other().
  sunvdc: don't call VD_OP_GET_VTOC
  vio: fix reuse of vio_dring slot
  sunvdc: limit each sg segment to a page
  sunvdc: compute vdisk geometry from capacity
  sunvdc: add cdrom and v1.1 protocol support
  net: sctp: fix memory leak in auth key management
  net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
  gre6: Move the setting of dev->iflink into the ndo_init functions.
  ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.
  Linux 3.10.60
  libceph: ceph-msgr workqueue needs a resque worker
  Btrfs: fix kfree on list_head in btrfs_lookup_csums_range error cleanup
  of: Fix overflow bug in string property parsing functions
  sysfs: driver core: Fix glue dir race condition by gdp_mutex
  i2c: at91: don't account as iowait
  acer-wmi: Add acpi_backlight=video quirk for the Acer KAV80
  rbd: Fix error recovery in rbd_obj_read_sync()
  drm/radeon: remove invalid pci id
  usb: gadget: udc: core: fix kernel oops with soft-connect
  usb: gadget: function: acm: make f_acm pass USB20CV Chapter9
  usb: dwc3: gadget: fix set_halt() bug with pending transfers
  crypto: algif - avoid excessive use of socket buffer in skcipher
  mm: Remove false WARN_ON from pagecache_isize_extended()
  x86, apic: Handle a bad TSC more gracefully
  posix-timers: Fix stack info leak in timer_create()
  mac80211: fix typo in starting baserate for rts_cts_rate_idx
  PM / Sleep: fix recovery during resuming from hibernation
  tty: Fix high cpu load if tty is unreleaseable
  quota: Properly return errors from dquot_writeback_dquots()
  ext3: Don't check quota format when there are no quota files
  nfsd4: fix crash on unknown operation number
  cpc925_edac: Report UE events properly
  e7xxx_edac: Report CE events properly
  i3200_edac: Report CE events properly
  i82860_edac: Report CE events properly
  scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
  lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
  cgroup/kmemleak: add kmemleak_free() for cgroup deallocations.
  usb: Do not allow usb_alloc_streams on unconfigured devices
  USB: opticon: fix non-atomic allocation in write path
  usb-storage: handle a skipped data phase
  spi: pxa2xx: toggle clocks on suspend if not disabled by runtime PM
  spi: pl022: Fix incorrect dma_unmap_sg
  usb: dwc3: gadget: Properly initialize LINK TRB
  wireless: rt2x00: add new rt2800usb device
  USB: option: add Haier CE81B CDMA modem
  usb: option: add support for Telit LE910
  USB: cdc-acm: only raise DTR on transitions from B0
  USB: cdc-acm: add device id for GW Instek AFG-2225
  usb: serial: ftdi_sio: add "bricked" FTDI device PID
  usb: serial: ftdi_sio: add Awinda Station and Dongle products
  USB: serial: cp210x: add Silicon Labs 358x VID and PID
  serial: Fix divide-by-zero fault in uart_get_divisor()
  staging:iio:ade7758: Remove "raw" from channel name
  staging:iio:ade7758: Fix check if channels are enabled in prenable
  staging:iio:ade7758: Fix NULL pointer deref when enabling buffer
  staging:iio:ad5933: Drop "raw" from channel names
  staging:iio:ad5933: Fix NULL pointer deref when enabling buffer
  OOM, PM: OOM killed task shouldn't escape PM suspend
  freezer: Do not freeze tasks killed by OOM killer
  ext4: fix oops when loading block bitmap failed
  cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy
  ext4: fix overflow when updating superblock backups after resize
  ext4: check s_chksum_driver when looking for bg csum presence
  ext4: fix reservation overflow in ext4_da_write_begin
  ext4: add ext4_iget_normal() which is to be used for dir tree lookups
  ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT
  ext4: don't check quota format when there are no quota files
  ext4: check EA value offset when loading
  jbd2: free bh when descriptor block checksum fails
  MIPS: tlbex: Properly fix HUGE TLB Refill exception handler
  target: Fix APTPL metadata handling for dynamic MappedLUNs
  target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE
  qla_target: don't delete changed nacls
  ARC: Update order of registers in KGDB to match GDB 7.5
  ARC: [nsimosci] Allow "headless" models to boot
  KVM: x86: Emulator fixes for eip canonical checks on near branches
  KVM: x86: Fix wrong masking on relative jump/call
  kvm: x86: don't kill guest on unknown exit reason
  KVM: x86: Check non-canonical addresses upon WRMSR
  KVM: x86: Improve thread safety in pit
  KVM: x86: Prevent host from panicking on shared MSR writes.
  kvm: fix excessive pages un-pinning in kvm_iommu_map error path.
  media: tda7432: Fix setting TDA7432_MUTE bit for TDA7432_RF register
  media: ds3000: fix LNB supply voltage on Tevii S480 on initialization
  media: em28xx-v4l: give back all active video buffers to the vb2 core properly on streaming stop
  media: v4l2-common: fix overflow in v4l_bound_align_image()
  drm/nouveau/bios: memset dcb struct to zero before parsing
  drm/tilcdc: Fix the error path in tilcdc_load()
  drm/ast: Fix HW cursor image
  Input: i8042 - quirks for Fujitsu Lifebook A544 and Lifebook AH544
  Input: i8042 - add noloop quirk for Asus X750LN
  framebuffer: fix border color
  modules, lock around setting of MODULE_STATE_UNFORMED
  dm log userspace: fix memory leak in dm_ulog_tfr_init failure path
  block: fix alignment_offset math that assumes io_min is a power-of-2
  drbd: compute the end before rb_insert_augmented()
  dm bufio: update last_accessed when relinking a buffer
  virtio_pci: fix virtio spec compliance on restore
  selinux: fix inode security list corruption
  pstore: Fix duplicate {console,ftrace}-efi entries
  mfd: rtsx_pcr: Fix MSI enable error handling
  mnt: Prevent pivot_root from creating a loop in the mount tree
  UBI: add missing kmem_cache_free() in process_pool_aeb error path
  random: add and use memzero_explicit() for clearing data
  crypto: more robust crypto_memneq
  fix misuses of f_count() in ppp and netlink
  kill wbuf_queued/wbuf_dwork_lock
  ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode
  evm: check xattr value length and type in evm_inode_setxattr()
  x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
  x86_64, entry: Fix out of bounds read on sysenter
  x86_64, entry: Filter RFLAGS.NT on entry from userspace
  x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
  x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
  x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
  x86: Reject x32 executables if x32 ABI not supported
  vfs: fix data corruption when blocksize < pagesize for mmaped data
  UBIFS: fix free log space calculation
  UBIFS: fix a race condition
  UBIFS: remove mst_mutex
  fs: Fix theoretical division by 0 in super_cache_scan().
  fs: make cont_expand_zero interruptible
  mmc: rtsx_pci_sdmmc: fix incorrect last byte in R2 response
  libata-sff: Fix controllers with no ctl port
  pata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE Controller
  Revert "percpu: free percpu allocation info for uniprocessor system"
  lockd: Try to reconnect if statd has moved
  drivers/net: macvtap and tun depend on INET
  ipv4: dst_entry leak in ip_send_unicast_reply()
  ax88179_178a: fix bonding failure
  ipv4: fix nexthop attlen check in fib_nh_match
  tracing/syscalls: Ignore numbers outside NR_syscalls' range
  Linux 3.10.59
  ecryptfs: avoid to access NULL pointer when write metadata in xattr
  ARM: at91/PMC: don't forget to write PMC_PCDR register to disable clocks
  ALSA: usb-audio: Add support for Steinberg UR22 USB interface
  ALSA: emu10k1: Fix deadlock in synth voice lookup
  ALSA: pcm: use the same dma mmap codepath both for arm and arm64
  arm64: compat: fix compat types affecting struct compat_elf_prpsinfo
  spi: dw-mid: terminate ongoing transfers at exit
  kernel: add support for gcc 5
  fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
  mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  Bluetooth: Fix issue with USB suspend in btusb driver
  Bluetooth: Fix HCI H5 corrupted ack value
  rt2800: correct BBP1_TX_POWER_CTRL mask
  PCI: Generate uppercase hex for modalias interface class
  PCI: Increase IBM ipr SAS Crocodile BARs to at least system page size
  iwlwifi: Add missing PCI IDs for the 7260 series
  NFSv4.1: Fix an NFSv4.1 state renewal regression
  NFSv4: fix open/lock state recovery error handling
  NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
  lzo: check for length overrun in variable length encoding.
  Revert "lzo: properly check for overruns"
  Documentation: lzo: document part of the encoding
  m68k: Disable/restore interrupts in hwreg_present()/hwreg_write()
  Drivers: hv: vmbus: Fix a bug in vmbus_open()
  Drivers: hv: vmbus: Cleanup vmbus_establish_gpadl()
  Drivers: hv: vmbus: Cleanup vmbus_teardown_gpadl()
  Drivers: hv: vmbus: Cleanup vmbus_post_msg()
  firmware_class: make sure fw requests contain a name
  qla2xxx: Use correct offset to req-q-out for reserve calculation
  mptfusion: enable no_write_same for vmware scsi disks
  be2iscsi: check ip buffer before copying
  regmap: fix NULL pointer dereference in _regmap_write/read
  regmap: debugfs: fix possbile NULL pointer dereference
  spi: dw-mid: check that DMA was inited before exit
  spi: dw-mid: respect 8 bit mode
  x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
  kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
  KVM: s390: unintended fallthrough for external call
  kvm: x86: fix stale mmio cache bug
  fs: Add a missing permission check to do_umount
  Btrfs: fix race in WAIT_SYNC ioctl
  Btrfs: fix build_backref_tree issue with multiple shared blocks
  Btrfs: try not to ENOSPC on log replay
  Linux 3.10.58
  USB: cp210x: add support for Seluxit USB dongle
  USB: serial: cp210x: added Ketra N1 wireless interface support
  USB: Add device quirk for ASUS T100 Base Station keyboard
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  tcp: fixing TLP's FIN recovery
  sctp: handle association restarts when the socket is closed.
  ip6_gre: fix flowi6_proto value in xmit path
  hyperv: Fix a bug in netvsc_start_xmit()
  tg3: Allow for recieve of full-size 8021AD frames
  tg3: Work around HW/FW limitations with vlan encapsulated frames
  l2tp: fix race while getting PMTU on PPP pseudo-wire
  openvswitch: fix panic with multiple vlan headers
  packet: handle too big packets for PACKET_V3
  tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()
  sit: Fix ipip6_tunnel_lookup device matching criteria
  myri10ge: check for DMA mapping errors
  Linux 3.10.57
  cpufreq: ondemand: Change the calculation of target frequency
  cpufreq: Fix wrong time unit conversion
  nl80211: clear skb cb before passing to netlink
  drbd: fix regression 'out of mem, failed to invoke fence-peer helper'
  jiffies: Fix timeval conversion to jiffies
  md/raid5: disable 'DISCARD' by default due to safety concerns.
  media: vb2: fix VBI/poll regression
  mm: numa: Do not mark PTEs pte_numa when splitting huge pages
  mm, thp: move invariant bug check out of loop in __split_huge_page_map
  ring-buffer: Fix infinite spin in reading buffer
  init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu
  perf: fix perf bug in fork()
  udf: Avoid infinite loop when processing indirect ICBs
  Linux 3.10.56
  vm_is_stack: use for_each_thread() rather then buggy while_each_thread()
  oom_kill: add rcu_read_lock() into find_lock_task_mm()
  oom_kill: has_intersects_mems_allowed() needs rcu_read_lock()
  oom_kill: change oom_kill.c to use for_each_thread()
  introduce for_each_thread() to replace the buggy while_each_thread()
  kernel/fork.c:copy_process(): unify CLONE_THREAD-or-thread_group_leader code
  arm: multi_v7_defconfig: Enable Zynq UART driver
  ext2: Fix fs corruption in ext2_get_xip_mem()
  serial: 8250_dma: check the result of TX buffer mapping
  ARM: 7748/1: oabi: handle faults when loading swi instruction from userspace
  netfilter: nf_conntrack: avoid large timeout for mid-stream pickup
  PM / sleep: Use valid_state() for platform-dependent sleep states only
  PM / sleep: Add state field to pm_states[] entries
  ipvs: fix ipv6 hook registration for local replies
  ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding
  ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack
  md/raid1: fix_read_error should act on all non-faulty devices.
  media: cx18: fix kernel oops with tda8290 tuner
  Fix nasty 32-bit overflow bug in buffer i/o code.
  perf kmem: Make it work again on non NUMA machines
  perf: Fix a race condition in perf_remove_from_context()
  alarmtimer: Lock k_itimer during timer callback
  alarmtimer: Do not signal SIGEV_NONE timers
  parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds
  powerpc/perf: Fix ABIv2 kernel backtraces
  sched: Fix unreleased llc_shared_mask bit during CPU hotplug
  ocfs2/dlm: do not get resource spinlock if lockres is new
  nilfs2: fix data loss with mmap()
  fs/notify: don't show f_handle if exportfs_encode_inode_fh failed
  fsnotify/fdinfo: use named constants instead of hardcoded values
  kcmp: fix standard comparison bug
  Revert "mac80211: disable uAPSD if all ACs are under ACM"
  usb: dwc3: core: fix ordering for PHY suspend
  usb: dwc3: core: fix order of PM runtime calls
  usb: host: xhci: fix compliance mode workaround
  genhd: fix leftover might_sleep() in blk_free_devt()
  lockd: fix rpcbind crash on lockd startup failure
  rtlwifi: rtl8192cu: Add new ID
  percpu: perform tlb flush after pcpu_map_pages() failure
  percpu: fix pcpu_alloc_pages() failure path
  percpu: free percpu allocation info for uniprocessor system
  ata_piix: Add Device IDs for Intel 9 Series PCH
  Input: i8042 - add nomux quirk for Avatar AVIU-145A6
  Input: i8042 - add Fujitsu U574 to no_timeout dmi table
  Input: atkbd - do not try 'deactivate' keyboard on any LG laptops
  Input: elantech - fix detection of touchpad on ASUS s301l
  Input: synaptics - add support for ForcePads
  Input: serport - add compat handling for SPIOCSTYPE ioctl
  dm crypt: fix access beyond the end of allocated space
  block: Fix dev_t minor allocation lifetime
  workqueue: apply __WQ_ORDERED to create_singlethread_workqueue()
  Revert "iwlwifi: dvm: don't enable CTS to self"
  SCSI: libiscsi: fix potential buffer overrun in __iscsi_conn_send_pdu
  NFC: microread: Potential overflows in microread_target_discovered()
  iscsi-target: Fix memory corruption in iscsit_logout_post_handler_diffcid
  iscsi-target: avoid NULL pointer in iscsi_copy_param_list failure
  Target/iser: Don't put isert_conn inside disconnected handler
  Target/iser: Get isert_conn reference once got to connected_handler
  iio:inkern: fix overwritten -EPROBE_DEFER in of_iio_channel_get_by_name
  iio:magnetometer: bugfix magnetometers gain values
  iio: adc: ad_sigma_delta: Fix indio_dev->trig assignment
  iio: st_sensors: Fix indio_dev->trig assignment
  iio: meter: ade7758: Fix indio_dev->trig assignment
  iio: inv_mpu6050: Fix indio_dev->trig assignment
  iio: gyro: itg3200: Fix indio_dev->trig assignment
  iio:trigger: modify return value for iio_trigger_get
  CIFS: Fix SMB2 readdir error handling
  CIFS: Fix directory rename error
  ASoC: davinci-mcasp: Correct rx format unit configuration
  shmem: fix nlink for rename overwrite directory
  x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
  KVM: x86: handle idiv overflow at kvm_write_tsc
  regmap: Fix handling of volatile registers for format_write() chips
  ACPICA: Update to GPIO region handler interface.
  MIPS: mcount: Adjust stack pointer for static trace in MIPS32
  MIPS: ZBOOT: add missing <linux/string.h> include
  ARM: 8165/1: alignment: don't break misaligned NEON load/store
  ARM: 7897/1: kexec: Use the right ISA for relocate_new_kernel
  ARM: 8133/1: use irq_set_affinity with force=false when migrating irqs
  ARM: 8128/1: abort: don't clear the exclusive monitors
  NFSv4: Fix another bug in the close/open_downgrade code
  NFSv4: nfs4_state_manager() vs. nfs_server_remove_lists()
  usb:hub set hub->change_bits when over-current happens
  usb: dwc3: omap: fix ordering for runtime pm calls
  USB: EHCI: unlink QHs even after the controller has stopped
  USB: storage: Add quirks for Entrega/Xircom USB to SCSI converters
  USB: storage: Add quirk for Ariston Technologies iConnect USB to SCSI adapter
  USB: storage: Add quirk for Adaptec USBConnect 2000 USB-to-SCSI Adapter
  storage: Add single-LUN quirk for Jaz USB Adapter
  usb: hub: take hub->hdev reference when processing from eventlist
  xhci: fix oops when xhci resumes from hibernate with hw lpm capable devices
  xhci: Fix null pointer dereference if xhci initialization fails
  USB: zte_ev: fix removed PIDs
  USB: ftdi_sio: add support for NOVITUS Bono E thermal printer
  USB: sierra: add 1199:68AA device ID
  USB: sierra: avoid CDC class functions on "68A3" devices
  USB: zte_ev: remove duplicate Qualcom PID
  USB: zte_ev: remove duplicate Gobi PID
  Revert "USB: option,zte_ev: move most ZTE CDMA devices to zte_ev"
  USB: option: add VIA Telecom CDS7 chipset device id
  USB: option: reduce interrupt-urb logging verbosity
  USB: serial: fix potential heap buffer overflow
  USB: sisusb: add device id for Magic Control USB video
  USB: serial: fix potential stack buffer overflow
  USB: serial: pl2303: add device id for ztek device
  xtensa: fix a6 and a7 handling in fast_syscall_xtensa
  xtensa: fix TLBTEMP_BASE_2 region handling in fast_second_level_miss
  xtensa: fix access to THREAD_RA/THREAD_SP/THREAD_DS
  xtensa: fix address checks in dma_{alloc,free}_coherent
  xtensa: replace IOCTL code definitions with constants
  drm/radeon: add connector quirk for fujitsu board
  drm/vmwgfx: Fix a potential infinite spin waiting for fifo idle
  drm/ast: AST2000 cannot be detected correctly
  drm/i915: Wait for vblank before enabling the TV encoder
  drm/i915: Remove bogus __init annotation from DMI callbacks
  HID: logitech-dj: prevent false errors to be shown
  HID: magicmouse: sanity check report size in raw_event() callback
  HID: picolcd: sanity check report size in raw_event() callback
  cfq-iosched: Fix wrong children_weight calculation
  ALSA: pcm: fix fifo_size frame calculation
  ALSA: hda - Fix invalid pin powermap without jack detection
  ALSA: hda - Fix COEF setups for ALC1150 codec
  ALSA: core: fix buffer overflow in snd_info_get_line()
  arm64: ptrace: fix compat hardware watchpoint reporting
  trace: Fix epoll hang when we race with new entries
  i2c: at91: Fix a race condition during signal handling in at91_do_twi_xfer.
  i2c: at91: add bound checking on SMBus block length bytes
  arm64: flush TLS registers during exec
  ibmveth: Fix endian issues with rx_no_buffer statistic
  ahci: add pcid for Marvel 0x9182 controller
  ahci: Add Device IDs for Intel 9 Series PCH
  pata_scc: propagate return value of scc_wait_after_reset
  drm/i915: read HEAD register back in init_ring_common() to enforce ordering
  drm/radeon: load the lm63 driver for an lm64 thermal chip.
  drm/ttm: Choose a pool to shrink correctly in ttm_dma_pool_shrink_scan().
  drm/ttm: Fix possible division by 0 in ttm_dma_pool_shrink_scan().
  drm/tilcdc: fix double kfree
  drm/tilcdc: fix release order on exit
  drm/tilcdc: panel: fix leak when unloading the module
  drm/tilcdc: tfp410: fix dangling sysfs connector node
  drm/tilcdc: slave: fix dangling sysfs connector node
  drm/tilcdc: panel: fix dangling sysfs connector node
  carl9170: fix sending URBs with wrong type when using full-speed
  Linux 3.10.55
  libceph: gracefully handle large reply messages from the mon
  libceph: rename ceph_msg::front_max to front_alloc_len
  tpm: Provide a generic means to override the chip returned timeouts
  vfs: fix bad hashing of dentries
  dcache.c: get rid of pointless macros
  IB/srp: Fix deadlock between host removal and multipathd
  blkcg: don't call into policy draining if root_blkg is already gone
  mtd: nand: omap: Fix 1-bit Hamming code scheme, omap_calculate_ecc()
  mtd/ftl: fix the double free of the buffers allocated in build_maps()
  CIFS: Fix wrong restart readdir for SMB1
  CIFS: Fix wrong filename length for SMB2
  CIFS: Fix wrong directory attributes after rename
  CIFS: Possible null ptr deref in SMB2_tcon
  CIFS: Fix async reading on reconnects
  CIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2
  libceph: do not hard code max auth ticket len
  libceph: add process_one_ticket() helper
  libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly
  md/raid1,raid10: always abort recover on write error.
  xfs: don't zero partial page cache pages during O_DIRECT writes
  xfs: don't zero partial page cache pages during O_DIRECT writes
  xfs: don't dirty buffers beyond EOF
  xfs: quotacheck leaves dquot buffers without verifiers
  RDMA/iwcm: Use a default listen backlog if needed
  md/raid10: Fix memory leak when raid10 reshape completes.
  md/raid10: fix memory leak when reshaping a RAID10.
  md/raid6: avoid data corruption during recovery of double-degraded RAID6
  Bluetooth: Avoid use of session socket after the session gets freed
  Bluetooth: never linger on process exit
  mnt: Add tests for unprivileged remount cases that have found to be faulty
  mnt: Change the default remount atime from relatime to the existing value
  mnt: Correct permission checks in do_remount
  mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
  mnt: Only change user settable mount flags in remount
  ring-buffer: Up rb_iter_peek() loop count to 3
  ring-buffer: Always reset iterator to reader page
  ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock
  ACPI: Run fixed event device notifications in process context
  ACPICA: Utilities: Fix memory leak in acpi_ut_copy_iobject_to_iobject
  bfa: Fix undefined bit shift on big-endian architectures with 32-bit DMA address
  ASoC: pxa-ssp: drop SNDRV_PCM_FMTBIT_S24_LE
  ASoC: max98090: Fix missing free_irq
  ASoC: samsung: Correct I2S DAI suspend/resume ops
  ASoC: wm_adsp: Add missing MODULE_LICENSE
  ASoC: pcm: fix dpcm_path_put in dpcm runtime update
  openrisc: Rework signal handling
  MIPS: Fix accessing to per-cpu data when flushing the cache
  MIPS: OCTEON: make get_system_type() thread-safe
  MIPS: asm: thread_info: Add _TIF_SECCOMP flag
  MIPS: Cleanup flags in syscall flags handlers.
  MIPS: asm/reg.h: Make 32- and 64-bit definitions available at the same time
  MIPS: Remove BUG_ON(!is_fpu_owner()) in do_ade()
  MIPS: tlbex: Fix a missing statement for HUGETLB
  MIPS: Prevent user from setting FCSR cause bits
  MIPS: GIC: Prevent array overrun
  drivers: scsi: storvsc: Correctly handle TEST_UNIT_READY failure
  Drivers: scsi: storvsc: Implement a eh_timed_out handler
  powerpc/pseries: Failure on removing device node
  powerpc/mm: Use read barrier when creating real_pte
  powerpc/mm/numa: Fix break placement
  regulator: arizona-ldo1: remove bypass functionality
  mfd: omap-usb-host: Fix improper mask use.
  kernel/smp.c:on_each_cpu_cond(): fix warning in fallback path
  CAPABILITIES: remove undefined caps from all processes
  tpm: missing tpm_chip_put in tpm_get_random()
  firmware: Do not use WARN_ON(!spin_is_locked())
  spi: omap2-mcspi: Configure hardware when slave driver changes mode
  spi: orion: fix incorrect handling of cell-index DT property
  iommu/amd: Fix cleanup_domain for mass device removal
  media: media-device: Remove duplicated memset() in media_enum_entities()
  media: au0828: Only alt setting logic when needed
  media: xc4000: Fix get_frequency()
  media: xc5000: Fix get_frequency()
  Linux 3.10.54
  USB: fix build error with CONFIG_PM_RUNTIME disabled
  NFSv4: Fix problems with close in the presence of a delegation
  NFSv3: Fix another acl regression
  svcrdma: Select NFSv4.1 backchannel transport based on forward channel
  NFSD: Decrease nfsd_users in nfsd_startup_generic fail
  usb: hub: Prevent hub autosuspend if usbcore.autosuspend is -1
  USB: whiteheat: Added bounds checking for bulk command response
  USB: ftdi_sio: Added PID for new ekey device
  USB: ftdi_sio: add Basic Micro ATOM Nano USB2Serial PID
  ARM: OMAP2+: hwmod: Rearm wake-up interrupts for DT when MUSB is idled
  usb: xhci: amd chipset also needs short TX quirk
  xhci: Treat not finding the event_seg on COMP_STOP the same as COMP_STOP_INVAL
  Staging: speakup: Update __speakup_paste_selection() tty (ab)usage to match vt
  jbd2: fix infinite loop when recovering corrupt journal blocks
  mei: nfc: fix memory leak in error path
  mei: reset client state on queued connect request
  Btrfs: fix csum tree corruption, duplicate and outdated checksums
  hpsa: fix bad -ENOMEM return value in hpsa_big_passthru_ioctl
  x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub
  x86_64/vsyscall: Fix warn_bad_vsyscall log output
  x86: don't exclude low BIOS area when allocating address space for non-PCI cards
  drm/radeon: add additional SI pci ids
  ext4: fix BUG_ON in mb_free_blocks()
  kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601)
  Revert "KVM: x86: Increase the number of fixed MTRR regs to 10"
  KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
  KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table
  KVM: x86: Inter-privilege level ret emulation is not implemeneted
  crypto: ux500 - make interrupt mode plausible
  serial: core: Preserve termios c_cflag for console resume
  ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct
  drivers/i2c/busses: use correct type for dma_map/unmap
  hwmon: (dme1737) Prevent overflow problem when writing large limits
  hwmon: (ads1015) Fix out-of-bounds array access
  hwmon: (lm85) Fix various errors on attribute writes
  hwmon: (ads1015) Fix off-by-one for valid channel index checking
  hwmon: (gpio-fan) Prevent overflow problem when writing large limits
  hwmon: (lm78) Fix overflow problems seen when writing large temperature limits
  hwmon: (sis5595) Prevent overflow problem when writing large limits
  drm: omapdrm: fix compiler errors
  ARM: OMAP3: Fix choice of omap3_restore_es function in OMAP34XX rev3.1.2 case.
  mei: start disconnect request timer consistently
  ALSA: hda/realtek - Avoid setting wrong COEF on ALC269 & co
  ALSA: hda/ca0132 - Don't try loading firmware at resume when already failed
  ALSA: virtuoso: add Xonar Essence STX II support
  ALSA: hda - fix an external mic jack problem on a HP machine
  USB: Fix persist resume of some SS USB devices
  USB: ehci-pci: USB host controller support for Intel Quark X1000
  USB: serial: ftdi_sio: Add support for new Xsens devices
  USB: serial: ftdi_sio: Annotate the current Xsens PID assignments
  USB: OHCI: don't lose track of EDs when a controller dies
  isofs: Fix unbounded recursion when processing relocated directories
  HID: fix a couple of off-by-ones
  HID: logitech: perform bounds checking on device_id early enough
  stable_kernel_rules: Add pointer to netdev-FAQ for network patches
  Linux 3.10.53
  arch/sparc/math-emu/math_32.c: drop stray break operator
  sparc64: ldc_connect() should not return EINVAL when handshake is in progress.
  sunsab: Fix detection of BREAK on sunsab serial console
  bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000
  sparc64: Guard against flushing openfirmware mappings.
  sparc64: Do not insert non-valid PTEs into the TSB hash table.
  sparc64: Add membar to Niagara2 memcpy code.
  sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.
  sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault addresses.
  sparc64: Fix top-level fault handling bugs.
  sparc64: Handle 32-bit tasks properly in compute_effective_address().
  sparc64: Make itc_sync_lock raw
  sparc64: Fix argument sign extension for compat_sys_futex().
  sctp: fix possible seqlock seadlock in sctp_packet_transmit()
  iovec: make sure the caller actually wants anything in memcpy_fromiovecend
  net: Correctly set segment mac_len in skb_segment().
  macvlan: Initialize vlan_features to turn on offload support.
  net: sctp: inherit auth_capable on INIT collisions
  tcp: Fix integer-overflow in TCP vegas
  tcp: Fix integer-overflows in TCP veno
  net: sendmsg: fix NULL pointer dereference
  ip: make IP identifiers less predictable
  inetpeer: get rid of ip_id_count
  bnx2x: fix crash during TSO tunneling
  Linux 3.10.52
  x86/espfix/xen: Fix allocation of pages for paravirt page tables
  lib/btree.c: fix leak of whole btree nodes
  net/l2tp: don't fall back on UDP [get|set]sockopt
  net: mvneta: replace Tx timer with a real interrupt
  net: mvneta: add missing bit descriptions for interrupt masks and causes
  net: mvneta: do not schedule in mvneta_tx_timeout
  net: mvneta: use per_cpu stats to fix an SMP lock up
  net: mvneta: increase the 64-bit rx/tx stats out of the hot path
  Revert "mac80211: move "bufferable MMPDU" check to fix AP mode scan"
  staging: vt6655: Fix Warning on boot handle_irq_event_percpu.
  x86_64/entry/xen: Do not invoke espfix64 on Xen
  x86, espfix: Make it possible to disable 16-bit support
  x86, espfix: Make espfix64 a Kconfig option, fix UML
  x86, espfix: Fix broken header guard
  x86, espfix: Move espfix definitions into a separate header file
  x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
  Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"
  timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
  printk: rename printk_sched to printk_deferred
  iio: buffer: Fix demux table creation
  staging: vt6655: Fix disassociated messages every 10 seconds
  mm, thp: do not allow thp faults to avoid cpuset restrictions
  scsi: handle flush errors properly
  rapidio/tsi721_dma: fix failure to obtain transaction descriptor
  cfg80211: fix mic_failure tracing
  ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout
  crypto: af_alg - properly label AF_ALG socket
  Linux 3.10.51
  core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
  x86/efi: Include a .bss section within the PE/COFF headers
  s390/ptrace: fix PSW mask check
  Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
  mm: hugetlb: fix copy_hugetlb_page_range()
  x86_32, entry: Store badsys error code in %eax
  hwmon: (smsc47m192) Fix temperature limit and vrm write operations
  parisc: Remove SA_RESTORER define
  coredump: fix the setting of PF_DUMPCORE
  Input: fix defuzzing logic
  slab_common: fix the check for duplicate slab names
  slab_common: Do not check for duplicate slab names
  tracing: Fix wraparound problems in "uptime" trace clock
  blkcg: don't call into policy draining if root_blkg is already gone
  ahci: add support for the Promise FastTrak TX8660 SATA HBA (ahci mode)
  libata: introduce ata_host->n_tags to avoid oops on SAS controllers
  libata: support the ata host which implements a queue depth less than 32
  block: don't assume last put of shared tags is for the host
  block: provide compat ioctl for BLKZEROOUT
  media: tda10071: force modulation to QPSK on DVB-S
  media: hdpvr: fix two audio bugs
  Linux 3.10.50
  ARC: Implement ptrace(PTRACE_GET_THREAD_AREA)
  sched: Fix possible divide by zero in avg_atom() calculation
  locking/mutex: Disable optimistic spinning on some architectures
  PM / sleep: Fix request_firmware() error at resume
  dm cache metadata: do not allow the data block size to change
  dm thin metadata: do not allow the data block size to change
  alarmtimer: Fix bug where relative alarm timers were treated as absolute
  drm/radeon: avoid leaking edid data
  drm/qxl: return IRQ_NONE if it was not our irq
  drm/radeon: set default bl level to something reasonable
  irqchip: gic: Fix core ID calculation when topology is read from DT
  irqchip: gic: Add support for cortex a7 compatible string
  ring-buffer: Fix polling on trace_pipe
  mwifiex: fix Tx timeout issue
  perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
  ipv4: fix buffer overflow in ip_options_compile()
  dns_resolver: Null-terminate the right string
  dns_resolver: assure that dns_query() result is null-terminated
  sunvnet: clean up objects created in vnet_new() on vnet_exit()
  net: pppoe: use correct channel MTU when using Multilink PPP
  net: sctp: fix information leaks in ulpevent layer
  tipc: clear 'next'-pointer of message fragments before reassembly
  be2net: set EQ DB clear-intr bit in be_open()
  netlink: Fix handling of error from netlink_dump().
  net: mvneta: Fix big endian issue in mvneta_txq_desc_csum()
  net: mvneta: fix operation in 10 Mbit/s mode
  appletalk: Fix socket referencing in skb
  tcp: fix false undo corner cases
  igmp: fix the problem when mc leave group
  net: qmi_wwan: add two Sierra Wireless/Netgear devices
  net: qmi_wwan: Add ID for Telewell TW-LTE 4G v2
  ipv4: icmp: Fix pMTU handling for rare case
  tcp: Fix divide by zero when pushing during tcp-repair
  bnx2x: fix possible panic under memory stress
  net: fix sparse warning in sk_dst_set()
  ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix
  ipv4: fix dst race in sk_dst_get()
  8021q: fix a potential memory leak
  net: sctp: check proc_dointvec result in proc_sctp_do_auth
  tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb
  ip_tunnel: fix ip_tunnel_lookup
  shmem: fix splicing from a hole while it's punched
  shmem: fix faulting into a hole, not taking i_mutex
  shmem: fix faulting into a hole while it's punched
  iwlwifi: dvm: don't enable CTS to self
  igb: do a reset on SR-IOV re-init if device is down
  hwmon: (adt7470) Fix writes to temperature limit registers
  hwmon: (da9052) Don't use dash in the name attribute
  hwmon: (da9055) Don't use dash in the name attribute
  tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs
  tracing: Fix graph tracer with stack tracer on other archs
  fuse: handle large user and group ID
  Bluetooth: Ignore H5 non-link packets in non-active state
  Drivers: hv: util: Fix a bug in the KVP code
  media: gspca_pac7302: Add new usb-id for Genius i-Look 317
  usb: Check if port status is equal to RxDetect

Signed-off-by: Ian Maund <imaund@codeaurora.org>
2015-04-24 18:04:40 -07:00
Oleg Nesterov 12279351a4 exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
commit 24c037ebf5723d4d9ab0996433cee4f96c292a4d upstream.

alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it
fails because disable_pid_allocation() was called by the exiting
child_reaper.

We could simply move get_pid_ns() down to successful return, but this fix
tries to be as trivial as possible.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Sterling Alexander <stalexan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:17 -08:00
Ian Maund f1b32d4e47 Merge upstream linux-stable v3.10.28 into msm-3.10
The following commits have been reverted from this merge, as they are
known to introduce new bugs and are currently incompatible with our
audio implementation. Investigation of these commits is ongoing, and
they are expected to be brought in at a later time:

86e6de7 ALSA: compress: fix drain calls blocking other compress functions (v6)
16442d4 ALSA: compress: fix drain calls blocking other compress functions

This merge commit also includes a change in block, necessary for
compilation. Upstream has modified elevator_init_fn to prevent race
conditions, requring updates to row_init_queue and test_init_queue.

* commit 'v3.10.28': (1964 commits)
  Linux 3.10.28
  ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling
  drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()
  serial: amba-pl011: use port lock to guard control register access
  mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUAL
  md/raid5: Fix possible confusion when multiple write errors occur.
  md/raid10: fix two bugs in handling of known-bad-blocks.
  md/raid10: fix bug when raid10 recovery fails to recover a block.
  md: fix problem when adding device to read-only array with bitmap.
  drm/i915: fix DDI PLLs HW state readout code
  nilfs2: fix segctor bug that causes file system corruption
  thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only
  ftrace/x86: Load ftrace_ops in parameter not the variable holding it
  SELinux: Fix possible NULL pointer dereference in selinux_inode_permission()
  writeback: Fix data corruption on NFS
  hwmon: (coretemp) Fix truncated name of alarm attributes
  vfs: In d_path don't call d_dname on a mount point
  staging: comedi: adl_pci9111: fix incorrect irq passed to request_irq()
  staging: comedi: addi_apci_1032: fix subdevice type/flags bug
  mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
  GFS2: Increase i_writecount during gfs2_setattr_chown
  perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h
  perf scripting perl: Fix build error on Fedora 12
  ARM: 7815/1: kexec: offline non panic CPUs on Kdump panic
  Linux 3.10.27
  sched: Guarantee new group-entities always have weight
  sched: Fix hrtimer_cancel()/rq->lock deadlock
  sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
  sched: Fix race on toggling cfs_bandwidth_used
  x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround
  netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper
  SCSI: sd: Reduce buffer size for vpd request
  intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters.
  mac80211: move "bufferable MMPDU" check to fix AP mode scan
  ACPI / Battery: Add a _BIX quirk for NEC LZ750/LS
  ACPI / TPM: fix memory leak when walking ACPI namespace
  mfd: rtsx_pcr: Disable interrupts before cancelling delayed works
  clk: exynos5250: fix sysmmu_mfc{l,r} gate clocks
  clk: samsung: exynos5250: Add CLK_IGNORE_UNUSED flag for the sysreg clock
  clk: samsung: exynos4: Correct SRC_MFC register
  clk: clk-divider: fix divisor > 255 bug
  ahci: add PCI ID for Marvell 88SE9170 SATA controller
  parisc: Ensure full cache coherency for kmap/kunmap
  drm/nouveau/bios: make jump conditional
  ARM: shmobile: mackerel: Fix coherent DMA mask
  ARM: shmobile: armadillo: Fix coherent DMA mask
  ARM: shmobile: kzm9g: Fix coherent DMA mask
  ARM: dts: exynos5250: Fix MDMA0 clock number
  ARM: fix "bad mode in ... handler" message for undefined instructions
  ARM: fix footbridge clockevent device
  net: Loosen constraints for recalculating checksum in skb_segment()
  bridge: use spin_lock_bh() in br_multicast_set_hash_max
  netpoll: Fix missing TXQ unlock and and OOPS.
  net: llc: fix use after free in llc_ui_recvmsg
  virtio-net: fix refill races during restore
  virtio_net: don't leak memory or block when too many frags
  virtio-net: make all RX paths handle errors consistently
  virtio_net: fix error handling for mergeable buffers
  vlan: Fix header ops passthru when doing TX VLAN offload.
  net: rose: restore old recvmsg behavior
  rds: prevent dereference of a NULL device
  ipv6: always set the new created dst's from in ip6_rt_copy
  net: fec: fix potential use after free
  hamradio/yam: fix info leak in ioctl
  drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl()
  net: inet_diag: zero out uninitialized idiag_{src,dst} fields
  ip_gre: fix msg_name parsing for recvfrom/recvmsg
  net: unix: allow bind to fail on mutex lock
  ipv6: fix illegal mac_header comparison on 32bit
  netvsc: don't flush peers notifying work during setting mtu
  tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0
  net: unix: allow set_peek_off to fail
  net: drop_monitor: fix the value of maxattr
  ipv6: don't count addrconf generated routes against gc limit
  packet: fix send path when running with proto == 0
  virtio: delete napi structures from netdev before releasing memory
  macvtap: signal truncated packets
  tun: update file current position
  macvtap: update file current position
  macvtap: Do not double-count received packets
  rds: prevent BUG_ON triggered on congestion update to loopback
  net: do not pretend FRAGLIST support
  IPv6: Fixed support for blackhole and prohibit routes
  HID: Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""
  gpio-rcar: R-Car GPIO IRQ share interrupt
  clocksource: em_sti: Set cpu_possible_mask to fix SMP broadcast
  irqchip: renesas-irqc: Fix irqc_probe error handling
  Linux 3.10.26
  sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
  ext4: fix bigalloc regression
  arm64: Use Normal NonCacheable memory for writecombine
  arm64: Do not flush the D-cache for anonymous pages
  arm64: Avoid cache flushing in flush_dcache_page()
  ARM: KVM: arch_timers: zero CNTVOFF upon return to host
  ARM: hyp: initialize CNTVOFF to zero
  clocksource: arch_timer: use virtual counters
  arm64: Remove unused cpu_name ascii in arch/arm64/mm/proc.S
  arm64: dts: Reserve the memory used for secondary CPU release address
  arm64: check for number of arguments in syscall_get/set_arguments()
  arm64: fix possible invalid FPSIMD initialization state
  ...

Change-Id: Ia0e5d71b536ab49ec3a1179d59238c05bdd03106
Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-03-24 14:28:34 -07:00
Eric W. Biederman 5a48788ca4 pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
commit a606488513543312805fab2b93070cefe6a3016c upstream.

Serge Hallyn <serge.hallyn@ubuntu.com> writes:

> Since commit af4b8a83ad it's been
> possible to get into a situation where a pidns reaper is
> <defunct>, reparented to host pid 1, but never reaped.  How to
> reproduce this is documented at
>
> https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526
> (and see
> https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526/comments/13)
> In short, run repeated starts of a container whose init is
>
> Process.exit(0);
>
> sysrq-t when such a task is playing zombie shows:
>
> [  131.132978] init            x ffff88011fc14580     0  2084   2039 0x00000000
> [  131.132978]  ffff880116e89ea8 0000000000000002 ffff880116e89fd8 0000000000014580
> [  131.132978]  ffff880116e89fd8 0000000000014580 ffff8801172a0000 ffff8801172a0000
> [  131.132978]  ffff8801172a0630 ffff88011729fff0 ffff880116e14650 ffff88011729fff0
> [  131.132978] Call Trace:
> [  131.132978]  [<ffffffff816f6159>] schedule+0x29/0x70
> [  131.132978]  [<ffffffff81064591>] do_exit+0x6e1/0xa40
> [  131.132978]  [<ffffffff81071eae>] ? signal_wake_up_state+0x1e/0x30
> [  131.132978]  [<ffffffff8106496f>] do_group_exit+0x3f/0xa0
> [  131.132978]  [<ffffffff810649e4>] SyS_exit_group+0x14/0x20
> [  131.132978]  [<ffffffff8170102f>] tracesys+0xe1/0xe6
>
> Further debugging showed that every time this happened, zap_pid_ns_processes()
> started with nr_hashed being 3, while we were expecting it to drop to 2.
> Any time it didn't happen, nr_hashed was 1 or 2.  So the reaper was
> waiting for nr_hashed to become 2, but free_pid() only wakes the reaper
> if nr_hashed hits 1.

The issue is that when the task group leader of an init process exits
before other tasks of the init process when the init process finally
exits it will be a secondary task sleeping in zap_pid_ns_processes and
waiting to wake up when the number of hashed pids drops to two.  This
case waits forever as free_pid only sends a wake up when the number of
hashed pids drops to 1.

To correct this the simple strategy of sending a possibly unncessary
wake up when the number of hashed pids drops to 2 is adopted.

Sending one extraneous wake up is relatively harmless, at worst we
waste a little cpu time in the rare case when a pid namespace
appropaches exiting.

We can detect the case when the pid namespace drops to just two pids
hashed race free in free_pid.

Dereferencing pid_ns->child_reaper with the pidmap_lock held is safe
without out the tasklist_lock because it is guaranteed that the
detach_pid will be called on the child_reaper before it is freed and
detach_pid calls __change_pid which calls free_pid which takes the
pidmap_lock.  __change_pid only calls free_pid if this is the
last use of the pid.  For a thread that is not the thread group leader
the threads pid will only ever have one user because a threads pid
is not allowed to be the pid of a process, of a process group or
a session.  For a thread that is a thread group leader all of
the other threads of that process will be reaped before it is allowed
for the thread group leader to be reaped ensuring there will only
be one user of the threads pid as a process pid.  Furthermore
because the thread is the init process of a pid namespace all of the
other processes in the pid namespace will have also been already freed
leading to the fact that the pid will not be used as a session pid or
a process group pid for any other running process.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26 17:18:27 -07:00
Jordan Crouse 21ddf13166 kernel: Mark find_task_by_vpid with EXPORT_SYMBOL_GPL
make find_task_by_vpid() visible to modules.

Change-Id: Ic0dedbad21a0d908a08dd64479a7acc6b3197d5f
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
2013-08-22 18:07:54 -07:00
Linus Torvalds 20b4fb4852 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
2013-05-01 17:51:54 -07:00
David Howells 0bb80f2405 proc: Split the namespace stuff out into linux/proc_ns.h
Split the proc namespace stuff out into linux/proc_ns.h.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: netdev@vger.kernel.org
cc: Serge E. Hallyn <serge.hallyn@ubuntu.com>
cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:39 -04:00
Raphael S.Carvalho 5cc5445164 pid_namespace.c/.h: simplify defines
Move BITS_PER_PAGE from pid_namespace.c to pid_namespace.h, since we can
simplify the define PID_MAP_ENTRIES by using the BITS_PER_PAGE.

[akpm@linux-foundation.org: kernel/pid.c:54:1: warning: "BITS_PER_PAGE" redefined]
Signed-off-by: Raphael S.Carvalho <raphael.scarv@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:07 -07:00
Raphael S. Carvalho 8db049b3d6 kernel/pid.c: improve flow of a loop inside alloc_pidmap.
find_next_offset() searches for an available "cleaned bit" in the
respective pid bitmap (page), so returns the offset if found, otherwise
it returns a value equals to BITS_PER_PAGE.

For example, suppose find_next_offset didn't find any available bit, so
there's no purpose to call mk_pid (Wasteful Cpu Cycles).

Therefore, I found it could be better to call mk_pid after the checking
(offset < BITS_PER_PAGE) returned sucessfully! Another point: If (offset
< BITS_PER_PAGE) results in a "failure", then mk_pid would be called
again afterwards.

[akpm@linux-foundation.org: simplify code]
Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:07 -07:00
Sasha Levin b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Eric W. Biederman 6e6668845f kernel/pid.c: reenable interrupts when alloc_pid() fails because init has exited
We're forgetting to reenable local interrupts on an error path.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Josh Boyer <jwboyer@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-12 14:34:00 -08:00
Eric W. Biederman c876ad7682 pidns: Stop pid allocation when init dies
Oleg pointed out that in a pid namespace the sequence.
- pid 1 becomes a zombie
- setns(thepidns), fork,...
- reaping pid 1.
- The injected processes exiting.

Can lead to processes attempting access their child reaper and
instead following a stale pointer.

That waitpid for init can return before all of the processes in
the pid namespace have exited is also unfortunate.

Avoid these problems by disabling the allocation of new pids in a pid
namespace when init dies, instead of when the last process in a pid
namespace is reaped.

Pointed-out-by:  Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-12-25 16:10:05 -08:00
Linus Torvalds 848b81415c Merge branch 'akpm' (Andrew's patch-bomb)
Merge misc patches from Andrew Morton:
 "Incoming:

   - lots of misc stuff

   - backlight tree updates

   - lib/ updates

   - Oleg's percpu-rwsem changes

   - checkpatch

   - rtc

   - aoe

   - more checkpoint/restart support

  I still have a pile of MM stuff pending - Pekka should be merging
  later today after which that is good to go.  A number of other things
  are twiddling thumbs awaiting maintainer merges."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (180 commits)
  scatterlist: don't BUG when we can trivially return a proper error.
  docs: update documentation about /proc/<pid>/fdinfo/<fd> fanotify output
  fs, fanotify: add @mflags field to fanotify output
  docs: add documentation about /proc/<pid>/fdinfo/<fd> output
  fs, notify: add procfs fdinfo helper
  fs, exportfs: add exportfs_encode_inode_fh() helper
  fs, exportfs: escape nil dereference if no s_export_op present
  fs, epoll: add procfs fdinfo helper
  fs, eventfd: add procfs fdinfo helper
  procfs: add ability to plug in auxiliary fdinfo providers
  tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test
  breakpoint selftests: print failure status instead of cause make error
  kcmp selftests: print fail status instead of cause make error
  kcmp selftests: make run_tests fix
  mem-hotplug selftests: print failure status instead of cause make error
  cpu-hotplug selftests: print failure status instead of cause make error
  mqueue selftests: print failure status instead of cause make error
  vm selftests: print failure status instead of cause make error
  ubifs: use prandom_bytes
  mtd: nandsim: use prandom_bytes
  ...
2012-12-17 20:58:12 -08:00
Gao feng a5ba911ec3 pidns: remove unused is_container_init()
Since commit 1cdcbec1a3 ("CRED: Neuter sys_capset()")
is_container_init() has no callers.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:23 -08:00
Linus Torvalds 6a2b60b17b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace changes from Eric Biederman:
 "While small this set of changes is very significant with respect to
  containers in general and user namespaces in particular.  The user
  space interface is now complete.

  This set of changes adds support for unprivileged users to create user
  namespaces and as a user namespace root to create other namespaces.
  The tyranny of supporting suid root preventing unprivileged users from
  using cool new kernel features is broken.

  This set of changes completes the work on setns, adding support for
  the pid, user, mount namespaces.

  This set of changes includes a bunch of basic pid namespace
  cleanups/simplifications.  Of particular significance is the rework of
  the pid namespace cleanup so it no longer requires sending out
  tendrils into all kinds of unexpected cleanup paths for operation.  At
  least one case of broken error handling is fixed by this cleanup.

  The files under /proc/<pid>/ns/ have been converted from regular files
  to magic symlinks which prevents incorrect caching by the VFS,
  ensuring the files always refer to the namespace the process is
  currently using and ensuring that the ptrace_mayaccess permission
  checks are always applied.

  The files under /proc/<pid>/ns/ have been given stable inode numbers
  so it is now possible to see if different processes share the same
  namespaces.

  Through the David Miller's net tree are changes to relax many of the
  permission checks in the networking stack to allowing the user
  namespace root to usefully use the networking stack.  Similar changes
  for the mount namespace and the pid namespace are coming through my
  tree.

  Two small changes to add user namespace support were commited here adn
  in David Miller's -net tree so that I could complete the work on the
  /proc/<pid>/ns/ files in this tree.

  Work remains to make it safe to build user namespaces and 9p, afs,
  ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the
  Kconfig guard remains in place preventing that user namespaces from
  being built when any of those filesystems are enabled.

  Future design work remains to allow root users outside of the initial
  user namespace to mount more than just /proc and /sys."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits)
  proc: Usable inode numbers for the namespace file descriptors.
  proc: Fix the namespace inode permission checks.
  proc: Generalize proc inode allocation
  userns: Allow unprivilged mounts of proc and sysfs
  userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
  procfs: Print task uids and gids in the userns that opened the proc file
  userns: Implement unshare of the user namespace
  userns: Implent proc namespace operations
  userns: Kill task_user_ns
  userns: Make create_new_namespaces take a user_ns parameter
  userns: Allow unprivileged use of setns.
  userns: Allow unprivileged users to create new namespaces
  userns: Allow setting a userns mapping to your current uid.
  userns: Allow chown and setgid preservation
  userns: Allow unprivileged users to create user namespaces.
  userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
  userns: fix return value on mntns_install() failure
  vfs: Allow unprivileged manipulation of the mount namespace.
  vfs: Only support slave subtrees across different user namespaces
  vfs: Add a user namespace reference from struct mnt_namespace
  ...
2012-12-17 15:44:47 -08:00
Nadia Yvette Chambers 6d49e352ae propagate name change to comments in kernel source
I've legally changed my name with New York State, the US Social Security
Administration, et al. This patch propagates the name change and change
in initials and login to comments in the kernel source as well.

Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-06 10:39:54 +01:00
Eric W. Biederman 98f842e675 proc: Usable inode numbers for the namespace file descriptors.
Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.

A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.

This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.

We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowd by the design if it becomes important.

I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-20 04:19:49 -08:00
Eric W. Biederman af4b8a83ad pidns: Wait in zap_pid_ns_processes until pid_ns->nr_hashed == 1
Looking at pid_ns->nr_hashed is a bit simpler and it works for
disjoint process trees that an unshare or a join of a pid_namespace
may create.

Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19 05:59:12 -08:00
Eric W. Biederman 5e1182deb8 pidns: Don't allow new processes in a dead pid namespace.
Set nr_hashed to -1 just before we schedule the work to cleanup proc.
Test nr_hashed just before we hash a new pid and if nr_hashed is < 0
fail.

This guaranteees that processes never enter a pid namespaces after we
have cleaned up the state to support processes in a pid namespace.

Currently sending SIGKILL to all of the process in a pid namespace as
init exists gives us this guarantee but we need something a little
stronger to support unsharing and joining a pid namespace.

Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 05:59:11 -08:00
Eric W. Biederman 0a01f2cc39 pidns: Make the pidns proc mount/umount logic obvious.
Track the number of pids in the proc hash table.  When the number of
pids goes to 0 schedule work to unmount the kernel mount of proc.

Move the mount of proc into alloc_pid when we allocate the pid for
init.

Remove the surprising calls of pid_ns_release proc in fork and
proc_flush_task.  Those code paths really shouldn't know about proc
namespace implementation details and people have demonstrated several
times that finding and understanding those code paths is difficult and
non-obvious.

Because of the call path detach pid is alwasy called with the
rtnl_lock held free_pid is not allowed to sleep, so the work to
unmounting proc is moved to a work queue.  This has the side benefit
of not blocking the entire world waiting for the unnecessary
rcu_barrier in deactivate_locked_super.

In the process of making the code clear and obvious this fixes a bug
reported by Gao feng <gaofeng@cn.fujitsu.com> where we would leak a
mount of proc during clone(CLONE_NEWPID|CLONE_NEWNET) if copy_pid_ns
succeeded and copy_net_ns failed.

Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19 05:59:10 -08:00
Eric W. Biederman 17cf22c33e pidns: Use task_active_pid_ns where appropriate
The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
aka ns_of_pid(task_pid(tsk)) should have the same number of
cache line misses with the practical difference that
ns_of_pid(task_pid(tsk)) is released later in a processes life.

Furthermore by using task_active_pid_ns it becomes trivial
to write an unshare implementation for the the pid namespace.

So I have used task_active_pid_ns everywhere I can.

In fork since the pid has not yet been attached to the
process I use ns_of_pid, to achieve the same effect.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 05:59:09 -08:00
Eric W. Biederman 49f4d8b93c pidns: Capture the user namespace and filter ns_last_pid
- Capture the the user namespace that creates the pid namespace
- Use that user namespace to test if it is ok to write to
  /proc/sys/kernel/ns_last_pid.

Zhao Hongjiang <zhaohongjiang@huawei.com> noticed I was missing a put_user_ns
in when destroying a pid_ns.  I have foloded his patch into this one
so that bisects will work properly.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19 05:57:31 -08:00
Eric W. Biederman 4f82f45730 net ip6 flowlabel: Make owner a union of struct pid * and kuid_t
Correct a long standing omission and use struct pid in the owner
field of struct ip6_flowlabel when the share type is IPV6_FL_S_PROCESS.
This guarantees we don't have issues when pid wraparound occurs.

Use a kuid_t in the owner field of struct ip6_flowlabel when the
share type is IPV6_FL_S_USER to add user namespace support.

In /proc/net/ip6_flowlabel capture the current pid namespace when
opening the file and release the pid namespace when the file is
closed ensuring we print the pid owner value that is meaning to
the reader of the file.  Similarly use from_kuid_munged to print
uid values that are meaningful to the reader of the file.

This requires exporting pid_nr_ns so that ipv6 can continue to built
as a module.  Yoiks what silliness

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:25 -07:00
Tim Bird 31fe62b958 mm: add a low limit to alloc_large_system_hash
UDP stack needs a minimum hash size value for proper operation and also
uses alloc_large_system_hash() for proper NUMA distribution of its hash
tables and automatic sizing depending on available system memory.

On some low memory situations, udp_table_init() must ignore the
alloc_large_system_hash() result and reallocs a bigger memory area.

As we cannot easily free old hash table, we leak it and kmemleak can
issue a warning.

This patch adds a low limit parameter to alloc_large_system_hash() to
solve this problem.

We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table
allocation.

Reported-by: Mark Asselstine <mark.asselstine@windriver.com>
Reported-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-24 00:28:21 -04:00
Dimitri Sivanich 074b85175a vfs: fix panic in __d_lookup() with high dentry hashtable counts
When the number of dentry cache hash table entries gets too high
(2147483648 entries), as happens by default on a 16TB system, use of a
signed integer in the dcache_init() initialization loop prevents the
dentry_hashtable from getting initialized, causing a panic in
__d_lookup().  Fix this in dcache_init() and similar areas.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-02-13 20:45:38 -05:00
Pavel Emelyanov b8f566b04d sysctl: add the kernel.ns_last_pid control
The sysctl works on the current task's pid namespace, getting and setting
its last_pid field.

Writing is allowed for CAP_SYS_ADMIN-capable tasks thus making it possible
to create a task with desired pid value.  This ability is required badly
for the checkpoint/restore in userspace.

This approach suits all the parties for now.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:11 -08:00
Paul Gortmaker 9984de1a5a kernel: Map most files to use export.h instead of module.h
The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else.  Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

  -#include <linux/module.h>
  +#include <linux/export.h>

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 09:20:12 -04:00
Paul E. McKenney b3fbab0571 rcu: Restore checks for blocking in RCU read-side critical sections
Long ago, using TREE_RCU with PREEMPT would result in "scheduling
while atomic" diagnostics if you blocked in an RCU read-side critical
section.  However, PREEMPT now implies TREE_PREEMPT_RCU, which defeats
this diagnostic.  This commit therefore adds a replacement diagnostic
based on PROVE_RCU.

Because rcu_lockdep_assert() and lockdep_rcu_dereference() are now being
used for things that have nothing to do with rcu_dereference(), rename
lockdep_rcu_dereference() to lockdep_rcu_suspicious() and add a third
argument that is a string indicating what is suspicious.  This third
argument is passed in from a new third argument to rcu_lockdep_assert().
Update all calls to rcu_lockdep_assert() to add an informative third
argument.

Also, add a pair of rcu_lockdep_assert() calls from within
rcu_note_context_switch(), one complaining if a context switch occurs
in an RCU-bh read-side critical section and another complaining if a
context switch occurs in an RCU-sched read-side critical section.
These are present only if the PROVE_RCU kernel parameter is enabled.

Finally, fix some checkpatch whitespace complaints in lockdep.c.

Again, you must enable PROVE_RCU to see these new diagnostics.  But you
are enabling PROVE_RCU to check out new RCU uses in any case, aren't you?

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28 21:36:37 -07:00
Michal Hocko d8bf4ca9ca rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
Since ca5ecddf (rcu: define __rcu address space modifier for sparse)
rcu_dereference_check use rcu_read_lock_held as a part of condition
automatically so callers do not have to do that as well.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-07-08 22:21:58 +02:00
Linus Torvalds c78193e9c7 next_pidmap: fix overflow condition
next_pidmap() just quietly accepted whatever 'last' pid that was passed
in, which is not all that safe when one of the users is /proc.

Admittedly the proc code should do some sanity checking on the range
(and that will be the next commit), but that doesn't mean that the
helper functions should just do that pidmap pointer arithmetic without
checking the range of its arguments.

So clamp 'last' to PID_MAX_LIMIT.  The fact that we then do "last+1"
doesn't really matter, the for-loop does check against the end of the
pidmap array properly (it's only the actual pointer arithmetic overflow
case we need to worry about, and going one bit beyond isn't going to
overflow).

[ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]

Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com>
Analyzed-by: Robert Święcki <robert@swiecki.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-18 10:35:30 -07:00
Rik van Riel 77c100c83e export pid symbols needed for kvm_vcpu_on_spin
Export the symbols required for a race-free kvm_vcpu_on_spin.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:28 -03:00
Tetsuo Handa 4221a9918e Add RCU check for find_task_by_vpid().
find_task_by_vpid() says "Must be called under rcu_read_lock().". But due to
commit 3120438 "rcu: Disable lockdep checking in RCU list-traversal primitives",
we are currently unable to catch "find_task_by_vpid() with tasklist_lock held
but RCU lock not held" errors due to the RCU-lockdep checks being
suppressed in the RCU variants of the struct list_head traversals.
This commit therefore places an explicit check for being in an RCU
read-side critical section in find_task_by_pid_ns().

  ===================================================
  [ INFO: suspicious rcu_dereference_check() usage. ]
  ---------------------------------------------------
  kernel/pid.c:386 invoked rcu_dereference_check() without protection!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 1
  1 lock held by rc.sysinit/1102:
   #0:  (tasklist_lock){.+.+..}, at: [<c1048340>] sys_setpgid+0x40/0x160

  stack backtrace:
  Pid: 1102, comm: rc.sysinit Not tainted 2.6.35-rc3-dirty #1
  Call Trace:
   [<c105e714>] lockdep_rcu_dereference+0x94/0xb0
   [<c104b4cd>] find_task_by_pid_ns+0x6d/0x70
   [<c104b4e8>] find_task_by_vpid+0x18/0x20
   [<c1048347>] sys_setpgid+0x47/0x160
   [<c1002b50>] sysenter_do_call+0x12/0x36

Commit updated to use a new rcu_lockdep_assert() exported API rather than
the old internal __do_rcu_dereference().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2010-08-19 17:18:02 -07:00
Arnd Bergmann 67bdbffd69 rculist: avoid __rcu annotations
This avoids warnings from missing __rcu annotations
in the rculist implementation, making it possible to
use the same lists in both RCU and non-RCU cases.

We can add rculist annotations later, together with
lockdep support for rculist, which is missing as well,
but that may involve changing all the users.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2010-08-19 17:18:00 -07:00
Oleg Nesterov c52b0b91ba pids: alloc_pidmap: remove the unnecessary boundary checks
alloc_pidmap() calculates max_scan so that if the initial offset != 0 we
inspect the first map->page twice.  This is correct, we want to find the
unused bits < offset in this bitmap block.  Add the comment.

But it doesn't make any sense to stop the find_next_offset() loop when we
are looking into this map->page for the second time.  We have already
already checked the bits >= offset during the first attempt, it is fine to
do this again, no matter if we succeed this time or not.

Remove this hard-to-understand code.  It optimizes the very unlikely case
when we are going to fail, but slows down the more likely case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Salman Qazi <sqazi@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:20 -07:00
Salman 5fdee8c4a5 pids: fix a race in pid generation that causes pids to be reused immediately
A program that repeatedly forks and waits is susceptible to having the
same pid repeated, especially when it competes with another instance of
the same program.  This is really bad for bash implementation.
Furthermore, many shell scripts assume that pid numbers will not be used
for some length of time.

Race Description:

A                                    B

// pid == offset == n                // pid == offset == n + 1
test_and_set_bit(offset, map->page)
                                     test_and_set_bit(offset, map->page);
                                     pid_ns->last_pid = pid;
pid_ns->last_pid = pid;
                                     // pid == n + 1 is freed (wait())

                                     // Next fork()...
                                     last = pid_ns->last_pid; // == n
                                     pid = last + 1;

Code to reproduce it (Running multiple instances is more effective):

#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

// The distance mod 32768 between two pids, where the first pid is expected
// to be smaller than the second.
int PidDistance(pid_t first, pid_t second) {
  return (second + 32768 - first) % 32768;
}

int main(int argc, char* argv[]) {
  int failed = 0;
  pid_t last_pid = 0;
  int i;
  printf("%d\n", sizeof(pid_t));
  for (i = 0; i < 10000000; ++i) {
    if (i % 32786 == 0)
      printf("Iter: %d\n", i/32768);
    int child_exit_code = i % 256;
    pid_t pid = fork();
    if (pid == -1) {
      fprintf(stderr, "fork failed, iteration %d, errno=%d", i, errno);
      exit(1);
    }
    if (pid == 0) {
      // Child
      exit(child_exit_code);
    } else {
      // Parent
      if (i > 0) {
        int distance = PidDistance(last_pid, pid);
        if (distance == 0 || distance > 30000) {
          fprintf(stderr,
                  "Unexpected pid sequence: previous fork: pid=%d, "
                  "current fork: pid=%d for iteration=%d.\n",
                  last_pid, pid, i);
          failed = 1;
        }
      }
      last_pid = pid;
      int status;
      int reaped = wait(&status);
      if (reaped != pid) {
        fprintf(stderr,
                "Wait return value: expected pid=%d, "
                "got %d, iteration %d\n",
                pid, reaped, i);
        failed = 1;
      } else if (WEXITSTATUS(status) != child_exit_code) {
        fprintf(stderr,
                "Unexpected exit status %x, iteration %d\n",
                WEXITSTATUS(status), i);
        failed = 1;
      }
    }
  }
  exit(failed);
}

Thanks to Ted Tso for the key ideas of this implementation.

Signed-off-by: Salman Qazi <sqazi@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:20 -07:00
Hedi Berriche 72680a191b pids: increase pid_max based on num_possible_cpus
On a system with a substantial number of processors, the early default
pid_max of 32k will not be enough.  A system with 1664 CPU's, there are
25163 processes started before the login prompt.  It's estimated that with
2048 CPU's we will pass the 32k limit.  With 4096, we'll reach that limit
very early during the boot cycle, and processes would stall waiting for an
available pid.

This patch increases the early maximum number of pids available, and
increases the minimum number of pids that can be set during runtime.

[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Hedi Berriche <hedi@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg KH <gregkh@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: John Stoffel <john@stoffel.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:51 -07:00
Linus Torvalds 4e3eaddd14 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  locking: Make sparse work with inline spinlocks and rwlocks
  x86/mce: Fix RCU lockdep splats
  rcu: Increase RCU CPU stall timeouts if PROVE_RCU
  ftrace: Replace read_barrier_depends() with rcu_dereference_raw()
  rcu: Suppress RCU lockdep warnings during early boot
  rcu, ftrace: Fix RCU lockdep splat in ftrace_perf_buf_prepare()
  rcu: Suppress __mpol_dup() false positive from RCU lockdep
  rcu: Make rcu_read_lock_sched_held() handle !PREEMPT
  rcu: Add control variables to lockdep_rcu_dereference() diagnostics
  rcu, cgroup: Relax the check in task_subsys_state() as early boot is now handled by lockdep-RCU
  rcu: Use wrapper function instead of exporting tasklist_lock
  sched, rcu: Fix rcu_dereference() for RCU-lockdep
  rcu: Make task_subsys_state() RCU-lockdep checks handle boot-time use
  rcu: Fix holdoff for accelerated GPs for last non-dynticked CPU
  x86/gart: Unexport gart_iommu_aperture

Fix trivial conflicts in kernel/trace/ftrace.c
2010-03-13 14:43:01 -08:00
Tetsuo Handa 9728e5d6e6 kernel/pid.c: update comment on find_task_by_pid_ns
tasklist_lock does protect the task and its pid, it can't go away.  The
problem is that find_pid_ns() itself is unsafe without rcu lock, it can
race with copy_process()->free_pid(any_pid).

Protecting copy_process()->free_pid(any_pid) with tasklist_lock would make
it possible to call find_task_by_pid_ns() under tasklist safely, but we
don't do so because we are trying to get rid of the read_lock sites of
tasklist_lock.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:33 -08:00
Paul E. McKenney db1466b3e1 rcu: Use wrapper function instead of exporting tasklist_lock
Lockdep-RCU commit d11c563d exported tasklist_lock, which is not
a good thing.  This patch instead exports a function that uses
lockdep to check whether tasklist_lock is held.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: Christoph Hellwig <hch@lst.de>
LKML-Reference: <1267631219-8713-1-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-04 11:46:14 +01:00
Paul E. McKenney d11c563dd2 sched: Use lockdep-based checking on rcu_dereference()
Update the rcu_dereference() usages to take advantage of the new
lockdep-based checking.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1266887105-1528-6-git-send-email-paulmck@linux.vnet.ibm.com>
[ -v2: fix allmodconfig missing symbol export build failure on x86 ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-25 10:34:26 +01:00
André Goddard Rosa 417e315247 pid: reduce code size by using a pointer to iterate over array
It decreases code size by 16 bytes on my gcc 4.4.1 on Core 2:
  text    data     bss     dec     hex filename
  4314    2216       8    6538    198a kernel/pid.o-BEFORE
  4298    2216       8    6522    197a kernel/pid.o-AFTER

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:12 -08:00
André Goddard Rosa 7be6d991bc pid: tighten pidmap spinlock critical section by removing kfree()
Avoid calling kfree() under pidmap spinlock, calling it afterwards.

Normally kfree() is fast, but sometimes it can be slow, so avoid
calling it under the spinlock if we can do it.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:12 -08:00
Jan Beulich 2c85f51d22 mm: also use alloc_large_system_hash() for the PID hash table
This is being done by allowing boot time allocations to specify that they
may want a sub-page sized amount of memory.

Overall this seems more consistent with the other hash table allocations,
and allows making two supposedly mm-only variables really mm-only
(nr_{kernel,all}_pages).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:38 -07:00
Catalin Marinas 264ef8a904 kmemleak: Remove alloc_bootmem annotations introduced in the past
kmemleak_alloc() calls were added in some places where alloc_bootmem was
called. Since now kmemleak tracks bootmem allocations, these explicit
calls should be run.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-07-09 17:07:02 +01:00
Catalin Marinas 12de38b186 kmemleak: Inform kmemleak about pid_hash
Kmemleak does not track alloc_bootmem calls but the pid_hash allocated
in pidhash_init() would need to be scanned as it contains pointers to
struct pid objects.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2009-06-29 17:14:14 +01:00
Christoph Hellwig 17f98dcf60 pids: clean up find_task_by_pid variants
find_task_by_pid_type_ns is only used to implement find_task_by_vpid and
find_task_by_pid_ns, but both of them pass PIDTYPE_PID as first argument.
So just fold find_task_by_pid_type_ns into find_task_by_pid_ns and use
find_task_by_pid_ns to implement find_task_by_vpid.

While we're at it also remove the exports for find_task_by_pid_ns and
find_task_by_vpid - we don't have any modular callers left as the only
modular caller of he old pre pid namespace find_task_by_pid (gfs2) was
switched to pid_task which operates on a struct pid pointer instead of a
pid_t.  Given the confusion about pid_t values vs namespace that's
generally the better option anyway and I think we're better of restricting
modules to do it that way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:55 -07:00