android_kernel_samsung_msm8976

Commit Graph

Author	SHA1	Message	Date
Nick Desaulniers	61d1f020a0	cgroup: prefer %pK to %p Prevents leaking kernel pointers when using kptr_restrict. Bug: 30149174 Change-Id: I0fa3cd8d4a0d9ea76d085bba6020f1eda073c09b Git-repo: https://android.googlesource.com/kernel/msm.git Git-commit: 505e48f32f1321ed7cf80d49dd5f31b16da445a8 Signed-off-by: Srinivasa Rao Kuppala <srkupp@codeaurora.org>	2016-12-06 09:24:09 -08:00
Kaushal Kumar	4a36e44c45	This is the 3.10.84 stable release -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJVoAOcAAoJEDjbvchgkmk+UhcP/1EOwnsJDcZ/sZkkclNgRmrJ yLBCW65caLAI2E3SmIdKvHQwIx7lHzX5gmWRBrvx+fIl4KhaNKEQ0NCOf1ATaVuQ MkYMdkicXWpLiFNdKokezryevGS8T1RME+2QlPFv3++Rby1Gy90YD5tu7YlIrEn7 sPRJQHEPCzVAQ7Lqhd66yHICM6/QvdefXj4pjh7vV8IMb2YwnY4vqYt7RxnJCUfP tqljxrT274kzpA2awzALNh+o3B3/Y4W9ROmlDWviw3JBc9gEqFXYwbDf8KDwA5c0 sp9GPGed/dV5DFuqRcAHksJenFnE3E4gZjo/R5hluHQU27peBuRfXev2hZyBfZqG 796eUOky8fb0OiyxHfT2vhfGeD7CHI/asvIAORjDBVUqzJy9nkkby3XJ0U4tW+pz VkcilD2oHw1uRIFH3JoBWTJ9W6CYSNFG1qxw+brgfKT5otJG/dBiI8kBABx+aTq7 V+A2cvf11oVwDEb93dnVypMGsfCywqzJUwEIRli9fTFjK7Fg9CBSGX38nwVGUaRv M2/NeloTyWqUQE41Nd11gCu+hKQRtUU77nxpZcSeKn1XsbpO9/7dHTwcELRuKnTD 9XDksqPznXmC9KXGj7XMcRkLyWyB//JHjay0FCS6b4S6v7R5nrEIRjcpdB+H1WLd zMOXRH4ZlcOAS/Yt2QMd =8AB3 -----END PGP SIGNATURE----- Merge upstream tag 'v3.10.84' into LA.BR.1.3.3 This merge brings us up-to-date as of upstream tag v3.10.84 * tag 'v3.10.84' (317 commits): Linux 3.10.84 fs: Fix S_NOSEC handling KVM: x86: make vapics_in_nmi_mode atomic MIPS: Fix KVM guest fixmap address x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A powerpc/perf: Fix book3s kernel to userspace backtraces arm: KVM: force execution of HCPTR access on VM exit Revert "crypto: talitos - convert to use be16_add_cpu()" crypto: talitos - avoid memleak in talitos_alg_alloc() sctp: Fix race between OOTB responce and route removal packet: avoid out of bounds read in round robin fanout packet: read num_members once in packet_rcv_fanout() bridge: fix br_stp_set_bridge_priority race conditions bridge: fix multicast router rlist endless loop sparc: Use GFP_ATOMIC in ldc_alloc_exp_dring() as it can be called in softirq context Linux 3.10.83 bus: mvebu: pass the coherency availability information at init time KVM: nSVM: Check for NRIPS support before updating control field ARM: clk-imx6q: refine sata's parent d_walk() might skip too much ipv6: update ip6_rt_last_gc every time GC is run ipv6: prevent fib6_run_gc() contention xfrm: Increase the garbage collector threshold Btrfs: make xattr replace operations atomic x86/microcode/intel: Guard against stack overflow in the loader fs: take i_mutex during prepare_binprm for set[ug]id executables hpsa: add missing pci_set_master in kdump path hpsa: refine the pci enable/disable handling sb_edac: Fix erroneous bytes->gigabytes conversion ACPICA: Utilities: Cleanup to remove useless ACPI_PRINTF/FORMAT_xxx helpers. ACPICA: Utilities: Cleanup to convert physical address printing formats. __ptrace_may_access() should not deny sub-threads include/linux/sched.h: don't use task->pid/tgid in same_thread_group/has_group_leader_pid netfilter: Zero the tuple in nfnl_cthelper_parse_tuple() netfilter: nfnetlink_cthelper: Remove 'const' and '&' to avoid warnings config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected get rid of s_files and files_lock fput: turn "list_head delayed_fput_list" into llist_head Linux 3.10.82 lpfc: Add iotag memory barrier pipe: iovec: Fix memory corruption when retrying atomic copy as non-atomic drm/mgag200: Reject non-character-cell-aligned mode widths tracing: Have filter check for balanced ops crypto: caam - fix RNG buffer cache alignment Linux 3.10.81 btrfs: cleanup orphans while looking up default subvolume btrfs: incorrect handling for fiemap_fill_next_extent return cfg80211: wext: clear sinfo struct before calling driver mm/memory_hotplug.c: set zone->wait_table to null after freeing it drm/i915: Fix DDC probe for passive adapters pata_octeon_cf: fix broken build ozwpan: unchecked signed subtraction leads to DoS ozwpan: divide-by-zero leading to panic ozwpan: Use proper check to prevent heap overflow MIPS: Fix enabling of DEBUG_STACKOVERFLOW ring-buffer-benchmark: Fix the wrong sched_priority of producer USB: serial: ftdi_sio: Add support for a Motion Tracker Development Board USB: cp210x: add ID for HubZ dual ZigBee and Z-Wave dongle block: fix ext_dev_lock lockdep report Input: elantech - fix detection of touchpads where the revision matches a known rate ALSA: usb-audio: add MAYA44 USB+ mixer control names ALSA: usb-audio: Add mic volume fix quirk for Logitech Quickcam Fusion ALSA: hda/realtek - Add a fixup for another Acer Aspire 9420 iio: adis16400: Compute the scan mask from channel indices iio: adis16400: Use != channel indices for the two voltage channels iio: adis16400: Report pressure channel scale xen: netback: read hotplug script once at start of day. udp: fix behavior of wrong checksums net_sched: invoke ->attach() after setting dev->qdisc unix/caif: sk_socket can disappear when state is unlocked net: dp83640: fix broken calibration routine. bridge: fix parsing of MLDv2 reports ipv4: Avoid crashing in ip_error net: phy: Allow EEE for all RGMII variants Linux 3.10.80 fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappings vfs: read file_handle only once in handle_to_path ACPI / init: Fix the ordering of acpi_reserve_resources() Input: elantech - fix semi-mt protocol for v3 HW rtlwifi: rtl8192cu: Fix kernel deadlock md/raid5: don't record new size if resize_stripes fails. svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures ARM: fix missing syscall trace exit ARM: dts: imx27: only map 4 Kbyte for fec registers crypto: s390/ghash - Fix incorrect ghash icv buffer handling. rt2x00: add new rt2800usb device DWA 130 libata: Ignore spurious PHY event on LPM policy change libata: Add helper to determine when PHY events should be ignored ext4: check for zero length extent explicitly ext4: convert write_begin methods to stable_page_writes semantics mmc: atmel-mci: fix bad variable type for clkdiv powerpc: Align TOC to 256 bytes usb: gadget: configfs: Fix interfaces array NULL-termination usb-storage: Add NO_WP_DETECT quirk for Lacie 059f:0651 devices USB: cp210x: add ID for KCF Technologies PRN device USB: pl2303: Remove support for Samsung I330 USB: visor: Match I330 phone more precisely xhci: gracefully handle xhci_irq dead device xhci: Solve full event ring by increasing TRBS_PER_SEGMENT to 256 xhci: fix isoc endpoint dequeue from advancing too far on transaction error target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST ASoC: wm8994: correct BCLK DIV 348 to 384 ASoC: wm8960: fix "RINPUT3" audio route error ASoC: mc13783: Fix wrong mask value used in mc13xxx_reg_rmw() calls ALSA: hda - Add headphone quirk for Lifebook E752 ALSA: hda - Add Conexant codecs CX20721, CX20722, CX20723 and CX20724 d_walk() might skip too much lib: Fix strnlen_user() to not touch memory after specified maximum hwmon: (ntc_thermistor) Ensure iio channel is of type IIO_VOLTAGE libceph: request a new osdmap if lingering request maps to no osd lguest: fix out-by-one error in address checking. fs, omfs: add NULL terminator in the end up the token list KVM: MMU: fix CR4.SMEP=1, CR0.WP=0 with shadow pages net: socket: Fix the wrong returns for recvmsg and sendmsg kernel: use the gnu89 standard explicitly staging, rtl8192e, LLVMLinux: Remove unused inline prototype staging: rtl8712, rtl8712: avoid lots of build warnings staging, rtl8192e, LLVMLinux: Change extern inline to static inline drm/i915: Fix declaration of intel_gmbus_{is_forced_bit/is_port_falid} staging: wlags49_h2: fix extern inline functions Linux 3.10.79 ACPICA: Utilities: Cleanup to enforce ACPI_PHYSADDR_TO_PTR()/ACPI_PTR_TO_PHYSADDR(). ACPICA: Tables: Change acpi_find_root_pointer() to use acpi_physical_address. revert "softirq: Add support for triggering softirq work on softirqs" sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND) mmc: card: Don't access RPMB partitions for normal read/write pinctrl: Don't just pretend to protect pinctrl_maps, do it for real drm/i915: Add missing MacBook Pro models with dual channel LVDS ARM: mvebu: armada-xp-openblocks-ax3-4: Disable internal RTC ARM: dts: imx23-olinuxino: Fix dr_mode of usb0 ARM: dts: imx28: Fix AUART4 TX-DMA interrupt name ARM: dts: imx25: Add #pwm-cells to pwm4 gpio: sysfs: fix memory leaks and device hotplug gpio: unregister gpiochip device before removing it xen/console: Update console event channel on resume mm/memory-failure: call shake_page() when error hits thp tail page nilfs2: fix sanity check of btree level in nilfs_btree_root_broken() ocfs2: dlm: fix race between purge and get lock resource Linux 3.10.78 ARC: signal handling robustify UBI: fix soft lockup in ubi_check_volume() Drivers: hv: vmbus: Don't wait after requesting offers ARM: dts: dove: Fix uart[23] reg property staging: panel: fix lcd type usb: gadget: printer: enqueue printer's response for setup request usb: host: oxu210hp: use new USB_RESUME_TIMEOUT 3w-sas: fix command completion race 3w-9xxx: fix command completion race 3w-xxxx: fix command completion race ext4: fix data corruption caused by unwritten and delayed extents rbd: end I/O the entire obj_request on error serial: of-serial: Remove device_type = "serial" registration ALSA: hda - Fix mute-LED fixed mode ALSA: emu10k1: Emu10k2 32 bit DMA mode ALSA: emu10k1: Fix card shortname string buffer overflow ALSA: emux: Fix mutex deadlock in OSS emulation ALSA: emux: Fix mutex deadlock at unloading ipv4: Missing sk_nulls_node_init() in ping_unhash(). Linux 3.10.77 s390: Fix build error nosave: consolidate __nosave_{begin,end} in <asm/sections.h> memstick: mspro_block: add missing curly braces C6x: time: Ensure consistency in __init wl18xx: show rx_frames_per_rates as an array as it really is lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR e1000: add dummy allocator to fix race condition between mtu change and netpoll ksoftirqd: Enable IRQs and call cond_resched() before poking RCU RCU pathwalk breakage when running into a symlink overmounting something drm/i915: cope with large i2c transfers drm/radeon: fix doublescan modes (v2) i2c: core: Export bus recovery functions IB/mlx4: Fix WQE LSO segment calculation IB/core: don't disallow registering region starting at 0x0 IB/core: disallow registering 0-sized memory region stk1160: Make sure current buffer is released mvsas: fix panic on expander attached SATA devices Drivers: hv: vmbus: Fix a bug in the error path in vmbus_open() xtensa: provide __NR_sync_file_range2 instead of __NR_sync_file_range xtensa: xtfpga: fix hardware lockup caused by LCD driver ACPICA: Utilities: split IO address types from data type models. drivers: parport: Kconfig: exclude arm64 for PARPORT_PC scsi: storvsc: Fix a bug in copy_from_bounce_buffer() UBI: fix check for "too many bytes" UBI: initialize LEB number variable UBI: fix out of bounds write UBI: account for bitflips in both the VID header and data tools/power turbostat: Use $(CURDIR) instead of $(PWD) and add support for O= option in Makefile powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH ext4: make fsync to sync parent dir in no-journal for real this time arm64: kernel: compiling issue, need delete read_current_timer() video: vgacon: Don't build on arm64 console: Disable VGA text console support on cris drivers: parport: Kconfig: exclude h8300 for PARPORT_PC parport: disable PC-style parallel port support on cris rtlwifi: rtl8192cu: Add new device ID rtlwifi: rtl8192cu: Add new USB ID ptrace: fix race between ptrace_resume() and wait_task_stopped() fs/binfmt_elf.c: fix bug in loading of PIE binaries Input: elantech - fix absolute mode setting on some ASUS laptops ALSA: emu10k1: don't deadlock in proc-functions usb: core: hub: use new USB_RESUME_TIMEOUT usb: host: sl811: use new USB_RESUME_TIMEOUT usb: host: xhci: use new USB_RESUME_TIMEOUT usb: host: isp116x: use new USB_RESUME_TIMEOUT usb: host: r8a66597: use new USB_RESUME_TIMEOUT usb: define a generic USB_RESUME_TIMEOUT macro usb: phy: Find the right match in devm_usb_phy_match ARM: S3C64XX: Use fixed IRQ bases to avoid conflicts on Cragganmore ARM: 8320/1: fix integer overflow in ELF_ET_DYN_BASE power_supply: lp8788-charger: Fix leaked power supply on probe fail ring-buffer: Replace this_cpu_() with __this_cpu_() spi: spidev: fix possible arithmetic overflow for multi-transfer message cdc-wdm: fix endianness bug in debug statements MIPS: Hibernate: flush TLB entries earlier KVM: use slowpath for cross page cached accesses s390/hibernate: fix save and restore of kernel text section KVM: s390: Zero out current VMDB of STSI before including level3 data. usb: gadget: composite: enable BESL support Btrfs: fix inode eviction infinite loop after cloning into it Btrfs: fix log tree corruption when fs mounted with -o discard tcp: avoid looping in tcp_send_fin() tcp: fix possible deadlock in tcp_send_fin() ip_forward: Drop frames with attached skb->sk Linux 3.10.76 dcache: Fix locking bugs in backported "deal with deadlock in d_walk()" arc: mm: Fix build failure sb_edac: avoid INTERNAL ERROR message in EDAC with unspecified channel x86: mm: move mmap_sem unlock from mm_fault_error() to caller vm: make stack guard page errors return VM_FAULT_SIGSEGV rather than SIGBUS vm: add VM_FAULT_SIGSEGV handling support deal with deadlock in d_walk() move d_rcu from overlapping d_child to overlapping d_alias kconfig: Fix warning "‘jump’ may be used uninitialized" KVM: x86: SYSENTER emulation is broken netfilter: conntrack: disable generic tracking for known protocols Bluetooth: Ignore isochronous endpoints for Intel USB bootloader Bluetooth: Add support for Intel bootloader devices Bluetooth: btusb: Add IMC Networks (Broadcom based) Bluetooth: Add firmware update for Atheros 0cf3:311f Bluetooth: Enable Atheros 0cf3:311e for firmware upload mm: Fix NULL pointer dereference in madvise(MADV_WILLNEED) support splice: Apply generic position and size checks to each write jfs: fix readdir regression serial: 8250_dw: Fix deadlock in LCR workaround benet: Call dev_kfree_skby_any instead of kfree_skb. ixgb: Call dev_kfree_skby_any instead of dev_kfree_skb. tg3: Call dev_kfree_skby_any instead of dev_kfree_skb. bnx2: Call dev_kfree_skby_any instead of dev_kfree_skb. r8169: Call dev_kfree_skby_any instead of dev_kfree_skb. 8139too: Call dev_kfree_skby_any instead of dev_kfree_skb. 8139cp: Call dev_kfree_skby_any instead of kfree_skb. tcp: tcp_make_synack() should clear skb->tstamp tcp: fix FRTO undo on cumulative ACK of SACKed range ipv6: Don't reduce hop limit for an interface tcp: prevent fetching dst twice in early demux code remove extra definitions of U32_MAX conditionally define U32_MAX Linux 3.10.75 pagemap: do not leak physical addresses to non-privileged userspace console: Fix console name size mismatch IB/mlx4: Saturate RoCE port PMA counters in case of overflow kernel.h: define u8, s8, u32, etc. limits net: llc: use correct size for sysctl timeout entries net: rds: use correct size for max unacked packets and bytes ipc: fix compat msgrcv with negative msgtyp core, nfqueue, openvswitch: fix compilation warning media: s5p-mfc: fix mmap support for 64bit arch iscsi target: fix oops when adding reject pdu ocfs2: _really_ sync the right range be2iscsi: Fix kernel panic when device initialization fails cifs: fix use-after-free bug in find_writable_file usb: xhci: apply XHCI_AVOID_BEI quirk to all Intel xHCI controllers cpuidle: ACPI: do not overwrite name and description of C0 dmaengine: omap-dma: Fix memory leak when terminating running transfer iio: imu: Use iio_trigger_get for indio_dev->trig assignment iio: inv_mpu6050: Clear timestamps fifo while resetting hardware fifo Defer processing of REQ_PREEMPT requests for blocked devices USB: ftdi_sio: Use jtag quirk for SNAP Connect E10 USB: ftdi_sio: Added custom PID for Synapse Wireless product radeon: Do not directly dereference pointers to BIOS area. writeback: fix possible underflow in write bandwidth calculation writeback: add missing INITIAL_JIFFIES init in global_update_bandwidth() mm/memory hotplug: postpone the reset of obsolete pgdat nbd: fix possible memory leak iwlwifi: dvm: run INIT firmware again upon .start() IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic IB/core: Avoid leakage from kernel to user space tcp: Fix crash in TCP Fast Open selinux: fix sel_write_enforce broken return value ALSA: hda - Fix headphone pin config for Lifebook T731 ALSA: usb - Creative USB X-Fi Pro SB1095 volume knob support ALSA: hda - Add one more node in the EAPD supporting candidate list Linux 3.10.74 net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5} powerpc/mpc85xx: Add ranges to etsec2 nodes hfsplus: fix B-tree corruption after insertion at position 0 dm: hold suspend_lock while suspending device during device deletion vt6655: RFbSetPower fix missing rate RATE_12M perf: Fix irq_work 'tail' recursion Revert "iwlwifi: mvm: fix failure path when power_update fails in add_interface" mac80211: drop unencrypted frames in mesh fwding mac80211: disable u-APSD queues by default nl80211: ignore HT/VHT capabilities without QoS/WMM tcm_qla2xxx: Fix incorrect use of __transport_register_session tcm_fc: missing curly braces in ft_invl_hw_context() ASoC: wm8955: Fix wrong value references for boolean kctl ASoC: adav80x: Fix wrong value references for boolean kctl ASoC: ak4641: Fix wrong value references for boolean kctl ASoC: wm8904: Fix wrong value references for boolean kctl ASoC: wm8903: Fix wrong value references for boolean kctl ASoC: wm2000: Fix wrong value references for boolean kctl ASoC: wm8731: Fix wrong value references for boolean kctl ASoC: tas5086: Fix wrong value references for boolean kctl ASoC: wm8960: Fix wrong value references for boolean kctl ASoC: cs4271: Fix wrong value references for boolean kctl ASoC: sgtl5000: remove useless register write clearing CHRGPUMP_POWERUP Change-Id: Ib7976ee2c7224e39074157e28db4158db40b00db Signed-off-by: Kaushal Kumar <kaushalk@codeaurora.org>	2015-09-30 13:25:40 +05:30
Al Viro	6637ecd306	move d_rcu from overlapping d_child to overlapping d_alias commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Ben Hutchings <ben@decadent.org.uk> [hujianyang: Backported to 3.10 refer to the work of Ben Hutchings in 3.2: - Apply name changes in all the different places we use d_alias and d_child - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()] Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-04-29 10:34:00 +02:00
Rom Lemarchand	a2ddc4658e	cgroup: refactor allow_attach function into common code move cpu_cgroup_allow_attach to a common subsys_cgroup_allow_attach. This allows any process with CAP_SYS_NICE to move tasks across cgroups if they use this function as their allow_attach handler. Bug: 18260435 Change-Id: I6bb4933d07e889d0dc39e33b4e71320c34a2c90f Signed-off-by: Rom Lemarchand <romlem@android.com> Git-commit: 57114e95e8c4f5035c993fc74bbe94cd9573f1bb Git-repo: https://android.googlesource.com/kernel/common.git Signed-off-by: Ian Maund <imaund@codeaurora.org>	2015-03-19 14:59:17 -07:00
Ian Maund	f1b32d4e47	Merge upstream linux-stable v3.10.28 into msm-3.10 The following commits have been reverted from this merge, as they are known to introduce new bugs and are currently incompatible with our audio implementation. Investigation of these commits is ongoing, and they are expected to be brought in at a later time: `86e6de7` ALSA: compress: fix drain calls blocking other compress functions (v6) `16442d4` ALSA: compress: fix drain calls blocking other compress functions This merge commit also includes a change in block, necessary for compilation. Upstream has modified elevator_init_fn to prevent race conditions, requring updates to row_init_queue and test_init_queue. * commit 'v3.10.28': (1964 commits) Linux 3.10.28 ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init() serial: amba-pl011: use port lock to guard control register access mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUAL md/raid5: Fix possible confusion when multiple write errors occur. md/raid10: fix two bugs in handling of known-bad-blocks. md/raid10: fix bug when raid10 recovery fails to recover a block. md: fix problem when adding device to read-only array with bitmap. drm/i915: fix DDI PLLs HW state readout code nilfs2: fix segctor bug that causes file system corruption thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only ftrace/x86: Load ftrace_ops in parameter not the variable holding it SELinux: Fix possible NULL pointer dereference in selinux_inode_permission() writeback: Fix data corruption on NFS hwmon: (coretemp) Fix truncated name of alarm attributes vfs: In d_path don't call d_dname on a mount point staging: comedi: adl_pci9111: fix incorrect irq passed to request_irq() staging: comedi: addi_apci_1032: fix subdevice type/flags bug mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully GFS2: Increase i_writecount during gfs2_setattr_chown perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h perf scripting perl: Fix build error on Fedora 12 ARM: 7815/1: kexec: offline non panic CPUs on Kdump panic Linux 3.10.27 sched: Guarantee new group-entities always have weight sched: Fix hrtimer_cancel()/rq->lock deadlock sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining sched: Fix race on toggling cfs_bandwidth_used x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper SCSI: sd: Reduce buffer size for vpd request intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters. mac80211: move "bufferable MMPDU" check to fix AP mode scan ACPI / Battery: Add a _BIX quirk for NEC LZ750/LS ACPI / TPM: fix memory leak when walking ACPI namespace mfd: rtsx_pcr: Disable interrupts before cancelling delayed works clk: exynos5250: fix sysmmu_mfc{l,r} gate clocks clk: samsung: exynos5250: Add CLK_IGNORE_UNUSED flag for the sysreg clock clk: samsung: exynos4: Correct SRC_MFC register clk: clk-divider: fix divisor > 255 bug ahci: add PCI ID for Marvell 88SE9170 SATA controller parisc: Ensure full cache coherency for kmap/kunmap drm/nouveau/bios: make jump conditional ARM: shmobile: mackerel: Fix coherent DMA mask ARM: shmobile: armadillo: Fix coherent DMA mask ARM: shmobile: kzm9g: Fix coherent DMA mask ARM: dts: exynos5250: Fix MDMA0 clock number ARM: fix "bad mode in ... handler" message for undefined instructions ARM: fix footbridge clockevent device net: Loosen constraints for recalculating checksum in skb_segment() bridge: use spin_lock_bh() in br_multicast_set_hash_max netpoll: Fix missing TXQ unlock and and OOPS. net: llc: fix use after free in llc_ui_recvmsg virtio-net: fix refill races during restore virtio_net: don't leak memory or block when too many frags virtio-net: make all RX paths handle errors consistently virtio_net: fix error handling for mergeable buffers vlan: Fix header ops passthru when doing TX VLAN offload. net: rose: restore old recvmsg behavior rds: prevent dereference of a NULL device ipv6: always set the new created dst's from in ip6_rt_copy net: fec: fix potential use after free hamradio/yam: fix info leak in ioctl drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl() net: inet_diag: zero out uninitialized idiag_{src,dst} fields ip_gre: fix msg_name parsing for recvfrom/recvmsg net: unix: allow bind to fail on mutex lock ipv6: fix illegal mac_header comparison on 32bit netvsc: don't flush peers notifying work during setting mtu tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0 net: unix: allow set_peek_off to fail net: drop_monitor: fix the value of maxattr ipv6: don't count addrconf generated routes against gc limit packet: fix send path when running with proto == 0 virtio: delete napi structures from netdev before releasing memory macvtap: signal truncated packets tun: update file current position macvtap: update file current position macvtap: Do not double-count received packets rds: prevent BUG_ON triggered on congestion update to loopback net: do not pretend FRAGLIST support IPv6: Fixed support for blackhole and prohibit routes HID: Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue"" gpio-rcar: R-Car GPIO IRQ share interrupt clocksource: em_sti: Set cpu_possible_mask to fix SMP broadcast irqchip: renesas-irqc: Fix irqc_probe error handling Linux 3.10.26 sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c ext4: fix bigalloc regression arm64: Use Normal NonCacheable memory for writecombine arm64: Do not flush the D-cache for anonymous pages arm64: Avoid cache flushing in flush_dcache_page() ARM: KVM: arch_timers: zero CNTVOFF upon return to host ARM: hyp: initialize CNTVOFF to zero clocksource: arch_timer: use virtual counters arm64: Remove unused cpu_name ascii in arch/arm64/mm/proc.S arm64: dts: Reserve the memory used for secondary CPU release address arm64: check for number of arguments in syscall_get/set_arguments() arm64: fix possible invalid FPSIMD initialization state ... Change-Id: Ia0e5d71b536ab49ec3a1179d59238c05bdd03106 Signed-off-by: Ian Maund <imaund@codeaurora.org>	2014-03-24 14:28:34 -07:00
Tejun Heo	a6647e9e4b	cgroup: use a dedicated workqueue for cgroup destruction commit e5fca243abae1445afbfceebda5f08462ef869d3 upstream. Since `be44562613` ("cgroup: remove synchronize_rcu() from cgroup_diput()"), cgroup destruction path makes use of workqueue. css freeing is performed from a work item from that point on and a later commit, ea15f8ccdb430 ("cgroup: split cgroup destruction into two steps"), moves css offlining to workqueue too. As cgroup destruction isn't depended upon for memory reclaim, the destruction work items were put on the system_wq; unfortunately, some controller may block in the destruction path for considerable duration while holding cgroup_mutex. As large part of destruction path is synchronized through cgroup_mutex, when combined with high rate of cgroup removals, this has potential to fill up system_wq's max_active of 256. Also, it turns out that memcg's css destruction path ends up queueing and waiting for work items on system_wq through work_on_cpu(). If such operation happens while system_wq is fully occupied by cgroup destruction work items, work_on_cpu() can't make forward progress because system_wq is full and other destruction work items on system_wq can't make forward progress because the work item waiting for work_on_cpu() is holding cgroup_mutex, leading to deadlock. This can be fixed by queueing destruction work items on a separate workqueue. This patch creates a dedicated workqueue - cgroup_destroy_wq - for this purpose. As these work items shouldn't have inter-dependencies and mostly serialized by cgroup_mutex anyway, giving high concurrency level doesn't buy anything and the workqueue's @max_active is set to 1 so that destruction work items are executed one by one on each CPU. Hugh Dickins: Because cgroup_init() is run before init_workqueues(), cgroup_destroy_wq can't be allocated from cgroup_init(). Do it from a separate core_initcall(). In the future, we probably want to reorder so that workqueue init happens before cgroup_init(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Hugh Dickins <hughd@google.com> Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com> Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-12-04 10:57:20 -08:00
Anjana V Kumar	0b6d292136	cgroup: fix to break the while loop in cgroup_attach_task() correctly Both Anjana and Eunki reported a stall in the while_each_thread loop in cgroup_attach_task(). It's because, when we attach a single thread to a cgroup, if the cgroup is exiting or is already in that cgroup, we won't break the loop. If the task is already in the cgroup, the bug can lead to another thread being attached to the cgroup unexpectedly: # echo 5207 > tasks # cat tasks 5207 # echo 5207 > tasks # cat tasks 5207 5215 What's worse, if the task to be attached isn't the leader of the thread group, we might never exit the loop, hence cpu stall. Thanks for Oleg's analysis. This bug was introduced by commit `081aa458c3` ("cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc()") [ lizf: - fixed the first continue, pointed out by Oleg, - rewrote changelog. ] Change-Id: Ic2d5ac051880a4ff9fb358f38a452f3f3709c84e Cc: <stable@vger.kernel.org> # 3.9+ Reported-by: Eunki Kim <eunki_kim@samsung.com> Reported-by: Anjana V Kumar <anjanavk12@gmail.com> Signed-off-by: Anjana V Kumar <anjanavk12@gmail.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Git-commit: ea84753c98a7ac6b74e530b64c444a912b3835ca Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Signed-off-by: Osvaldo Banuelos <osvaldob@codeaurora.org>	2013-11-25 16:25:00 -08:00
Anjana V Kumar	5be794dc4b	cgroup: fix to break the while loop in cgroup_attach_task() correctly commit ea84753c98a7ac6b74e530b64c444a912b3835ca upstream. Both Anjana and Eunki reported a stall in the while_each_thread loop in cgroup_attach_task(). It's because, when we attach a single thread to a cgroup, if the cgroup is exiting or is already in that cgroup, we won't break the loop. If the task is already in the cgroup, the bug can lead to another thread being attached to the cgroup unexpectedly: # echo 5207 > tasks # cat tasks 5207 # echo 5207 > tasks # cat tasks 5207 5215 What's worse, if the task to be attached isn't the leader of the thread group, we might never exit the loop, hence cpu stall. Thanks for Oleg's analysis. This bug was introduced by commit `081aa458c3` ("cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc()") [ lizf: - fixed the first continue, pointed out by Oleg, - rewrote changelog. ] Reported-by: Eunki Kim <eunki_kim@samsung.com> Reported-by: Anjana V Kumar <anjanavk12@gmail.com> Signed-off-by: Anjana V Kumar <anjanavk12@gmail.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-11-13 12:05:30 +09:00
Li Zefan	a09a1b8700	cgroup: fix umount vs cgroup_cfts_commit() race commit 084457f284abf6789d90509ee11dae383842b23b upstream. cgroup_cfts_commit() uses dget() to keep cgroup alive after cgroup_mutex is dropped, but dget() won't prevent cgroupfs from being umounted. When the race happens, vfs will see some dentries with non-zero refcnt while umount is in process. Keep running this: mount -t cgroup -o blkio xxx /cgroup umount /cgroup And this: modprobe cfq-iosched rmmod cfs-iosched After a while, the BUG() in shrink_dcache_for_umount_subtree() may be triggered: BUG: Dentry xxx{i=0,n=blkio.yyy} still in use (1) [umount of cgroup cgroup] Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-08-11 18:35:24 -07:00
Li Zefan	83d0eb7975	cgroup: fix umount vs cgroup_event_remove() race commit 1c8158eeae0f37d0eee9f1fbe68080df6a408df2 upstream. commit `5db9a4d99b` Author: Tejun Heo <tj@kernel.org> Date: Sat Jul 7 16:08:18 2012 -0700 cgroup: fix cgroup hierarchy umount race This commit fixed a race caused by the dput() in css_dput_fn(), but the dput() in cgroup_event_remove() can also lead to the same BUG(). Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-07-21 18:21:25 -07:00
Colin Cross	bebadf46e9	cgroup: Add generic cgroup subsystem permission checks Rather than using explicit euid == 0 checks when trying to move tasks into a cgroup via CFS, move permission checks into each specific cgroup subsystem. If a subsystem does not specify a 'allow_attach' handler, then we fall back to doing our checks the old way. Use the 'allow_attach' handler for the 'cpu' cgroup to allow non-root processes to add arbitrary processes to a 'cpu' cgroup if it has the CAP_SYS_NICE capability set. This version of the patch adds a 'allow_attach' handler instead of reusing the 'can_attach' handler. If the 'can_attach' handler is reused, a new cgroup that implements 'can_attach' but not the permission checks could end up with no permission checks at all. Change-Id: Icfa950aa9321d1ceba362061d32dc7dfa2c64f0c Original-Author: San Mehat <san@google.com> Signed-off-by: Colin Cross <ccross@android.com>	2013-07-01 13:38:49 -07:00
Jeff Liu	2a0ff3fbe3	cgroup: warn about mismatching options of a new mount of an existing hierarchy With the new __DEVEL__sane_behavior mount option was introduced, if the root cgroup is alive with no xattr function, to mount a new cgroup with xattr will be rejected in terms of design which just fine. However, if the root cgroup does not mounted with __DEVEL__sane_hehavior, to create a new cgroup with xattr option will succeed although after that the EA function does not works as expected but will get ENOTSUPP for setting up attributes under either cgroup. e.g. setfattr: /cgroup2/test: Operation not supported Instead of keeping silence in this case, it's better to drop a log entry in warning level. That would be helpful to understand the reason behind the scene from the user's perspective, and this is essentially an improvement does not break the backward compatibilities. With this fix, above mount attemption will keep up works as usual but the following line cound be found at the system log: [ ...] cgroup: new mount options do not match the existing superblock tj: minor formatting / message updates. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org	2013-05-29 07:59:39 +09:00
Tejun Heo	7805d000db	cgroup: fix a subtle bug in descendant pre-order walk When cgroup_next_descendant_pre() initiates a walk, it checks whether the subtree root doesn't have any children and if not returns NULL. Later code assumes that the subtree isn't empty. This is broken because the subtree may become empty inbetween, which can lead to the traversal escaping the subtree by walking to the sibling of the subtree root. There's no reason to have the early exit path. Remove it along with the later assumption that the subtree isn't empty. This simplifies the code a bit and fixes the subtle bug. While at it, fix the comment of cgroup_for_each_descendant_pre() which was incorrectly referring to ->css_offline() instead of ->css_online(). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: stable@vger.kernel.org	2013-05-24 10:50:24 +09:00
Li Zefan	d6cbf35dac	cgroup: initialize xattr before calling d_instantiate() cgroup_create_file() calls d_instantiate(), which may decide to look at the xattrs on the file. Smack always does this and SELinux can be configured to do so. But cgroup_add_file() didn't initialize xattrs before calling cgroup_create_file(), which finally leads to dereferencing NULL dentry->d_fsdata. This bug has been there since cgroup xattr was introduced. Cc: <stable@vger.kernel.org> # 3.8.x Reported-by: Ivan Bulatovic <combuster@archlinux.us> Reported-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-05-14 08:24:15 -07:00
Linus Torvalds	20b4fb4852	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...	2013-05-01 17:51:54 -07:00
Al Viro	8d8b97ba49	take cgroup_open() and cpuset_open() to fs/proc/base.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:46 -04:00
Linus Torvalds	16fa94b532	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this development cycle were: - full dynticks preparatory work by Frederic Weisbecker - factor out the cpu time accounting code better, by Li Zefan - multi-CPU load balancer cleanups and improvements by Joonsoo Kim - various smaller fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) sched: Fix init NOHZ_IDLE flag sched: Prevent to re-select dst-cpu in load_balance() sched: Rename load_balance_tmpmask to load_balance_mask sched: Move up affinity check to mitigate useless redoing overhead sched: Don't consider other cpus in our group in case of NEWLY_IDLE sched: Explicitly cpu_idle_type checking in rebalance_domains() sched: Change position of resched_cpu() in load_balance() sched: Fix wrong rq's runnable_avg update with rt tasks sched: Document task_struct::personality field sched/cpuacct/UML: Fix header file dependency bug on the UML build cgroup: Kill subsys.active flag sched/cpuacct: No need to check subsys active state sched/cpuacct: Initialize cpuacct subsystem earlier sched/cpuacct: Initialize root cpuacct earlier sched/cpuacct: Allocate per_cpu cpuusage for root cpuacct statically sched/cpuacct: Clean up cpuacct.h sched/cpuacct: Remove redundant NULL checks in cpuacct_acount_field() sched/cpuacct: Remove redundant NULL checks in cpuacct_charge() sched/cpuacct: Add cpuacct_acount_field() sched/cpuacct: Add cpuacct_init() ...	2013-04-30 07:43:28 -07:00
Linus Torvalds	191a712090	Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Fixes and a lot of cleanups. Locking cleanup is finally complete. cgroup_mutex is no longer exposed to individual controlelrs which used to cause nasty deadlock issues. Li fixed and cleaned up quite a bit including long standing ones like racy cgroup_path(). - device cgroup now supports proper hierarchy thanks to Aristeu. - perf_event cgroup now supports proper hierarchy. - A new mount option "__DEVEL__sane_behavior" is added. As indicated by the name, this option is to be used for development only at this point and generates a warning message when used. Unfortunately, cgroup interface currently has too many brekages and inconsistencies to implement a consistent and unified hierarchy on top. The new flag is used to collect the behavior changes which are necessary to implement consistent unified hierarchy. It's likely that this flag won't be used verbatim when it becomes ready but will be enabled implicitly along with unified hierarchy. The option currently disables some of broken behaviors in cgroup core and also .use_hierarchy switch in memcg (will be routed through -mm), which can be used to make very unusual hierarchy where nesting is partially honored. It will also be used to implement hierarchy support for blk-throttle which would be impossible otherwise without introducing a full separate set of control knobs. This is essentially versioning of interface which isn't very nice but at this point I can't see any other options which would allow keeping the interface the same while moving towards hierarchy behavior which is at least somewhat sane. The planned unified hierarchy is likely to require some level of adaptation from userland anyway, so I think it'd be best to take the chance and update the interface such that it's supportable in the long term. Maintaining the existing interface does complicate cgroup core but shouldn't put too much strain on individual controllers and I think it'd be manageable for the foreseeable future. Maybe we'll be able to drop it in a decade. Fix up conflicts (including a semantic one adding a new #include to ppc that was uncovered by header the file changes) as per Tejun. * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (45 commits) cpuset: fix compile warning when CONFIG_SMP=n cpuset: fix cpu hotplug vs rebuild_sched_domains() race cpuset: use rebuild_sched_domains() in cpuset_hotplug_workfn() cgroup: restore the call to eventfd->poll() cgroup: fix use-after-free when umounting cgroupfs cgroup: fix broken file xattrs devcg: remove parent_cgroup. memcg: force use_hierarchy if sane_behavior cgroup: remove cgrp->top_cgroup cgroup: introduce sane_behavior mount option move cgroupfs_root to include/linux/cgroup.h cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix cgroup: make cgroup_path() not print double slashes Revert "cgroup: remove bind() method from cgroup_subsys." perf: make perf_event cgroup hierarchical cgroup: implement cgroup_is_descendant() cgroup: make sure parent won't be destroyed before its children cgroup: remove bind() method from cgroup_subsys. devcg: remove broken_hierarchy tag cgroup: remove cgroup_lock_is_held() ...	2013-04-29 19:14:20 -07:00
Linus Torvalds	46d9be3e5e	Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "A lot of activities on workqueue side this time. The changes achieve the followings. - WQ_UNBOUND workqueues - the workqueues which are per-cpu - are updated to be able to interface with multiple backend worker pools. This involved a lot of churning but the end result seems actually neater as unbound workqueues are now a lot closer to per-cpu ones. - The ability to interface with multiple backend worker pools are used to implement unbound workqueues with custom attributes. Currently the supported attributes are the nice level and CPU affinity. It may be expanded to include cgroup association in future. The attributes can be specified either by calling apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if the workqueue in question is exported through sysfs. The backend worker pools are keyed by the actual attributes and shared by any workqueues which share the same attributes. When attributes of a workqueue are changed, the workqueue binds to the worker pool with the specified attributes while leaving the work items which are already executing in its previous worker pools alone. This allows converting custom worker pool implementations which want worker attribute tuning to use workqueues. The writeback pool is already converted in block tree and there are a couple others are likely to follow including btrfs io workers. - WQ_UNBOUND's ability to bind to multiple worker pools is also used to make it NUMA-aware. Because there's no association between work item issuer and the specific worker assigned to execute it, before this change, using unbound workqueue led to unnecessary cross-node bouncing and it couldn't be helped by autonuma as it requires tasks to have implicit node affinity and workers are assigned randomly. After these changes, an unbound workqueue now binds to multiple NUMA-affine worker pools so that queued work items are executed in the same node. This is turned on by default but can be disabled system-wide or for individual workqueues. Crypto was requesting NUMA affinity as encrypting data across different nodes can contribute noticeable overhead and doing it per-cpu was too limiting for certain cases and IO throughput could be bottlenecked by one CPU being fully occupied while others have idle cycles. While the new features required a lot of changes including restructuring locking, it didn't complicate the execution paths much. The unbound workqueue handling is now closer to per-cpu ones and the new features are implemented by simply associating a workqueue with different sets of backend worker pools without changing queue, execution or flush paths. As such, even though the amount of change is very high, I feel relatively safe in that it isn't likely to cause subtle issues with basic correctness of work item execution and handling. If something is wrong, it's likely to show up as being associated with worker pools with the wrong attributes or OOPS while workqueue attributes are being changed or during CPU hotplug. While this creates more backend worker pools, it doesn't add too many more workers unless, of course, there are many workqueues with unique combinations of attributes. Assuming everything else is the same, NUMA awareness costs an extra worker pool per NUMA node with online CPUs. There are also a couple things which are being routed outside the workqueue tree. - block tree pulled in workqueue for-3.10 so that writeback worker pool can be converted to unbound workqueue with sysfs control exposed. This simplifies the code, makes writeback workers NUMA-aware and allows tuning nice level and CPU affinity via sysfs. - The conversion to workqueue means that there's no 1:1 association between a specific worker, which makes writeback folks unhappy as they want to be able to tell which filesystem caused a problem from backtrace on systems with many filesystems mounted. This is resolved by allowing work items to set debug info string which is printed when the task is dumped. As this change involves unifying implementations of dump_stack() and friends in arch codes, it's being routed through Andrew's -mm tree." * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits) workqueue: use kmem_cache_free() instead of kfree() workqueue: avoid false negative WARN_ON() in destroy_workqueue() workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity workqueue: implement NUMA affinity for unbound workqueues workqueue: introduce put_pwq_unlocked() workqueue: introduce numa_pwq_tbl_install() workqueue: use NUMA-aware allocation for pool_workqueues workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq() workqueue: map an unbound workqueues to multiple per-node pool_workqueues workqueue: move hot fields of workqueue_struct to the end workqueue: make workqueue->name[] fixed len workqueue: add workqueue->unbound_attrs workqueue: determine NUMA node of workers accourding to the allowed cpumask workqueue: drop 'H' from kworker names of unbound worker pools workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[] workqueue: move pwq_pool_locking outside of get/put_unbound_pool() workqueue: fix memory leak in apply_workqueue_attrs() workqueue: fix unbound workqueue attrs hashing / comparison workqueue: fix race condition in unbound workqueue free path workqueue: remove pwq_lock which is no longer used ...	2013-04-29 19:07:40 -07:00
Michal Hocko	6d2488f64a	cgroup: remove css_get_next Now that we have generic and well ordered cgroup tree walkers there is no need to keep css_get_next in the place. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Ying Han <yinghan@google.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:33 -07:00
Li Zefan	7ef70e4873	cgroup: restore the call to eventfd->poll() I mistakenly removed the call to eventfd->poll() while I was actually intending to remove the return value... Calling evenfd->poll() will hook cgroup_event_wake() to the poll waitqueue, which will be called to unregister eventfd when rmdir a cgroup or close eventfd. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-04-26 11:58:03 -07:00
Li Zefan	cc20e01cd6	cgroup: fix use-after-free when umounting cgroupfs Try: # mount -t cgroup xxx /cgroup # mkdir /cgroup/sub && rmdir /cgroup/sub && umount /cgroup And you might see this: ida_remove called for id=1 which is not allocated. It's because cgroup_kill_sb() is called to destroy root->cgroup_ida and free cgrp->root before ida_simple_removed() is called. What's worse is we're accessing cgrp->root while it has been freed. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-04-26 11:58:02 -07:00
Li Zefan	712317ad97	cgroup: fix broken file xattrs We should store file xattrs in struct cfent instead of struct cftype, because cftype is a type while cfent is object instance of cftype. For example each cgroup has a tasks file, and each tasks file is associated with a uniq cfent, but all those files share the same struct cftype. Alexey Kodanev reported a crash, which can be reproduced: # mount -t cgroup -o xattr /sys/fs/cgroup # mkdir /sys/fs/cgroup/test # setfattr -n trusted.value -v test_value /sys/fs/cgroup/tasks # rmdir /sys/fs/cgroup/test # umount /sys/fs/cgroup oops! In this case, simple_xattrs_free() will free the same struct simple_xattrs twice. tj: Dropped unused local variable @cft from cgroup_diput(). Cc: <stable@vger.kernel.org> # 3.8.x Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-04-18 23:11:40 -07:00
Li Zefan	05fb22ec54	cgroup: remove cgrp->top_cgroup It's not used, and it can be retrieved via cgrp->root->top_cgroup. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-04-14 23:26:10 -07:00
Tejun Heo	873fe09ea5	cgroup: introduce sane_behavior mount option It's a sad fact that at this point various cgroup controllers are carrying so many idiosyncrasies and pure insanities that it simply isn't possible to reach any sort of sane consistent behavior while maintaining staying fully compatible with what already has been exposed to userland. As we can't break exposed userland interface, transitioning to sane behaviors can only be done in steps while maintaining backwards compatibility. This patch introduces a new mount option - __DEVEL__sane_behavior - which disables crazy features and enforces consistent behaviors in cgroup core proper and various controllers. As exactly which behaviors it changes are still being determined, the mount option, at this point, is useful only for development of the new behaviors. As such, the mount option is prefixed with __DEVEL__ and generates a warning message when used. Eventually, once we get to the point where all controller's behaviors are consistent enough to implement unified hierarchy, the __DEVEL__ prefix will be dropped, and more importantly, unified-hierarchy will enforce sane_behavior by default. Maybe we'll able to completely drop the crazy stuff after a while, maybe not, but we at least have a strategy to move on to saner behaviors. This patch introduces the mount option and changes the following behaviors in cgroup core. * Mount options "noprefix" and "clone_children" are disallowed. Also, cgroupfs file cgroup.clone_children is not created. * When mounting an existing superblock, mount options should match. This is currently pretty crazy. If one mounts a cgroup, creates a subdirectory, unmounts it and then mount it again with different option, it looks like the new options are applied but they aren't. * Remount is disallowed. The behaviors changes are documented in the comment above CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different controllers are converted and planned improvements progress. v2: Dropped unnecessary explicit file permission setting sane_behavior cftype entry as suggested by Li Zefan. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vivek Goyal <vgoyal@redhat.com>	2013-04-14 20:15:26 -07:00
Tejun Heo	25a7e6848d	move cgroupfs_root to include/linux/cgroup.h While controllers shouldn't be accessing cgroupfs_root directly, it being hidden inside kern/cgroup.c makes somethings pretty silly. This makes routing hierarchy-wide settings which need to be visible to controllers cumbersome. We're gonna add another hierarchy-wide setting which needs to be accessed from controllers. Move cgroupfs_root and its flags to the header file so that we can access root settings with inline helpers. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Li Zefan <lizefan@huawei.com>	2013-04-14 20:15:25 -07:00
Tejun Heo	9343862945	cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix There's no reason to be using bitops, which tends to be more cumbersome, to handle root flags. Convert them to masks. Also, as they'll be moved to include/linux/cgroup.h and it's generally a good idea, add CGRP_ prefix. Note that flags are assigned from (1 << 1). The first bit will be used by a flag which will be added soon. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Li Zefan <lizefan@huawei.com>	2013-04-14 20:15:25 -07:00
Tejun Heo	da1f296fd2	cgroup: make cgroup_path() not print double slashes While reimplementing cgroup_path(), `65dff759d2` ("cgroup: fix cgroup_path() vs rename() race") introduced a bug where the path of a non-root cgroup would have two slahses at the beginning, which is caused by treating the root cgroup which has the name '/' like non-root cgroups. $ grep systemd /proc/self/cgroup 1:name=systemd://user/root/1 Fix it by special casing root cgroup case and not looping over it in the normal path. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com>	2013-04-14 10:47:02 -07:00
Tejun Heo	26d5bbe5ba	Revert "cgroup: remove bind() method from cgroup_subsys." This reverts commit `84cfb6ab48`. There are scheduled changes which make use of the removed callback. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Rami Rosen <ramirose@gmail.com> Cc: Li Zefan <lizefan@huawei.com>	2013-04-12 10:29:04 -07:00
Li Zefan	78574cf981	cgroup: implement cgroup_is_descendant() A couple controllers want to determine whether two cgroups are in ancestor/descendant relationship. As it's more likely that the descendant is the primary subject of interest and there are other operations focusing on the descendants, let's ask is_descendent rather than is_ancestor. Implementation is trivial as the previous patch guarantees that all ancestors of a cgroup stay accessible as long as the cgroup is accessible. tj: Removed depth optimization, renamed from cgroup_is_ancestor(), rewrote descriptions. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-04-10 11:07:08 -07:00
Li Zefan	415cf07a1c	cgroup: make sure parent won't be destroyed before its children Suppose we rmdir a cgroup and there're still css refs, this cgroup won't be freed. Then we rmdir the parent cgroup, and the parent is freed immediately due to css ref draining to 0. Now it would be a disaster if the still-alive child cgroup tries to access its parent. Make sure this won't happen. Signed-off-by: Li Zefan <lizefan@huawei.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-04-10 11:07:00 -07:00
Rami Rosen	84cfb6ab48	cgroup: remove bind() method from cgroup_subsys. The bind() method of cgroup_subsys is not used in any of the controllers (cpuset, freezer, blkio, net_cls, memcg, net_prio, devices, perf, hugetlb, cpu and cpuacct) tj: Removed the entry on ->bind() from Documentation/cgroups/cgroups.txt. Also updated a couple paragraphs which were suggesting that dynamic re-binding may be implemented. It's not gonna. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-04-10 10:46:59 -07:00
Li Zefan	479f614110	cgroup: Kill subsys.active flag The only user was cpuacct. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5155385A.4040207@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-10 13:54:22 +02:00
Tejun Heo	2219449a65	cgroup: remove cgroup_lock_is_held() We don't want controllers to assume that the information is officially available and do funky things with it. The only user is task_subsys_state_check() which uses it to verify RCU access context. We can move cgroup_lock_is_held() inside CONFIG_PROVE_RCU but that doesn't add meaningful protection compared to conditionally exposing cgroup_mutex. Remove cgroup_lock_is_held(), export cgroup_mutex iff CONFIG_PROVE_RCU and use lockdep_is_held() directly on the mutex in task_subsys_state_check(). While at it, add parentheses around macro arguments in task_subsys_state_check(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-04-07 09:29:51 -07:00
Tejun Heo	47cfcd0922	cgroup: kill cgroup_[un]lock() Now that locking interface is unexported, there's no reason to keep around these thin wrappers. Kill them and use mutex operations directly. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-04-07 09:29:51 -07:00
Tejun Heo	b9777cf8d7	cgroup: unexport locking interface and cgroup_attach_task() Now that all external cgroup_lock() users are gone, we can finally unexport the locking interface and prevent future abuse of cgroup_mutex. Make cgroup_[un]lock() and cgroup_lock_live_group() static. Also, cgroup_attach_task() doesn't have any user left and can't be used without locking interface anyway. Make it static too. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-04-07 09:29:51 -07:00
Tejun Heo	7ae1bad99e	cgroup: relocate cgroup_lock_live_group() and cgroup_attach_task_all() cgroup_lock_live_group() and cgroup_attach_task() are scheduled to be made static. Relocate the former and cgroup_attach_task_all() so that we don't need forward declarations. This patch is pure relocation. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-04-07 09:29:51 -07:00
Tejun Heo	8cc9934520	cgroup, cpuset: replace move_member_tasks_to_cpuset() with cgroup_transfer_tasks() When a cpuset becomes empty (no CPU or memory), its tasks are transferred with the nearest ancestor with execution resources. This is implemented using cgroup_scan_tasks() with a callback which grabs cgroup_mutex and invokes cgroup_attach_task() on each task. Both cgroup_mutex and cgroup_attach_task() are scheduled to be unexported. Implement cgroup_transfer_tasks() in cgroup proper which is essentially the same as move_member_tasks_to_cpuset() except that it takes cgroups instead of cpusets and @to comes before @from like normal functions with those arguments, and replace move_member_tasks_to_cpuset() with it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-04-07 09:29:50 -07:00
Kevin Wilson	1e2ccd1c0f	cgroup: remove unused parameter in cgroup_task_migrate(). This patch removes unused parameter from cgroup_task_migrate(). Signed-off-by: Kevin Wilson <wkevils@gmail.com> Acked-by: Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-04-03 14:04:33 -07:00
Li Zefan	081aa458c3	cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc() These two functions share most of the code. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-20 07:50:25 -07:00
Li Zefan	3ac1707a13	cgroup: fix an off-by-one bug which may trigger BUG_ON() The 3rd parameter of flex_array_prealloc() is the number of elements, not the index of the last element. The effect of the bug is, when opening cgroup.procs, a flex array will be allocated and all elements of the array is allocated with GFP_KERNEL flag, but the last one is GFP_ATOMIC, and if we fail to allocate memory for it, it'll trigger a BUG_ON(). Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org	2013-03-20 07:50:04 -07:00
Tejun Heo	14a40ffccd	sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY PF_THREAD_BOUND was originally used to mark kernel threads which were bound to a specific CPU using kthread_bind() and a task with the flag set allows cpus_allowed modifications only to itself. Workqueue is currently abusing it to prevent userland from meddling with cpus_allowed of workqueue workers. What we need is a flag to prevent userland from messing with cpus_allowed of certain kernel tasks. In kernel, anyone can (incorrectly) squash the flag, and, for worker-type usages, restricting cpus_allowed modification to the task itself doesn't provide meaningful extra proection as other tasks can inject work items to the task anyway. This patch replaces PF_THREAD_BOUND with PF_NO_SETAFFINITY. sched_setaffinity() checks the flag and return -EINVAL if set. set_cpus_allowed_ptr() is no longer affected by the flag. This will allow simplifying workqueue worker CPU affinity management. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>	2013-03-19 13:45:20 -07:00
Li Zefan	80f36c2a1a	cgroup: remove useless code in cgroup_write_event_control() eventfd_poll() never returns POLLHUP. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 15:36:00 -07:00
Li Zefan	6ee211ad0a	cgroup: don't bother to resize pid array When we open cgroup.procs, we'll allocate an buffer and store all tasks' tgid in it, and then duplicate entries will be stripped. If that results in a much smaller pid list, we'll re-allocate a smaller buffer. But we've already sucessfully allocated memory and reading the procs file is a short period and the memory will be freed very soon, so why bother to re-allocate memory. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 15:36:00 -07:00
Li Zefan	d7eeac1913	cgroup: hold cgroup_mutex before calling css_offline() cpuset no longer nests cgroup_mutex inside cpu_hotplug lock, so we don't have to release cgroup_mutex before calling css_offline(). Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 15:35:59 -07:00
Li Zefan	6dc01181ea	cgroup: remove unused variables in cgroup_destroy_locked() Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 15:35:58 -07:00
Li Zefan	e7b2dcc52b	cgroup: remove cgroup_is_descendant() It was used by ns cgroup, and ns cgroup was removed long ago. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 15:35:58 -07:00
Li Zefan	7d8e0bf56a	cgroup: avoid accessing modular cgroup subsys structure without locking subsys[i] is set to NULL in cgroup_unload_subsys() at modular unload, and that's protected by cgroup_mutex, and then the memory *subsys[i] resides will be freed. So this is unsafe without any locking: if (!ss \|\| ss->module) ... v2: - add a comment for enum cgroup_subsys_id - simplify the comment in cgroup_exit() Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-05 09:33:25 -08:00
Li Zefan	f50daa704f	cgroup: no need to check css refs for release notification We no longer fail rmdir() when there're still css refs, so we don't need to check css refs in check_for_release(). This also voids a bug. cgroup_has_css_refs() accesses subsys[i] without cgroup_mutex, so it can race with cgroup_unload_subsys(). cgroup_has_css_refs() ... if (ss == NULL \|\| ss->root != cgrp->root) if ss pointers to net_cls_subsys, and cls_cgroup module is unloaded right after the former check but before the latter, the memory that net_cls_subsys resides has become invalid. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-04 10:04:54 -08:00
Li Zefan	65dff759d2	cgroup: fix cgroup_path() vs rename() race rename() will change dentry->d_name. The result of this race can be worse than seeing partially rewritten name, but we might access a stale pointer because rename() will re-allocate memory to hold a longer name. As accessing dentry->name must be protected by dentry->d_lock or parent inode's i_mutex, while on the other hand cgroup-path() can be called with some irq-safe spinlocks held, we can't generate cgroup path using dentry->d_name. Alternatively we make a copy of dentry->d_name and save it in cgrp->name when a cgroup is created, and update cgrp->name at rename(). v5: use flexible array instead of zero-size array. v4: - allocate root_cgroup_name and all root_cgroup->name points to it. - add cgroup_name() wrapper. v3: use kfree_rcu() instead of synchronize_rcu() in user-visible path. v2: make cgrp->name RCU safe. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-04 09:50:08 -08:00

1 2 3 4 5 ...

396 Commits