The commit introducing the call to cond_resched_rcu() is backported
from a recent kernel version, which has got some very good updates
to the RCU, including a new function cond_resched_rcu which is
doing not-so-complicated rescheduling stuff.
Kernel 3.10 hasn't got any of these and porting would be overkill.
On our current code base, the RCU management is pretty stupid
compared to newer kernels, so it's just ok to reschedule by just
unlocking the RCU and relocking it: this will allow to update its
status and the drivers will be happy.
If we set SEAL_WRITE on a file, we must make sure there cannot be any
ongoing write-operations on the file. For write() calls, we simply lock
the inode mutex, for mmap() we simply verify there're no writable
mappings. However, there might be pages pinned by AIO, Direct-IO and
similar operations via GUP. We must make sure those do not write to the
memfd file after we set SEAL_WRITE.
As there is no way to notify GUP users to drop pages or to wait for them
to be done, we implement the wait ourself: When setting SEAL_WRITE, we
check all pages for their ref-count. If it's bigger than 1, we know
there's some user of the page. We then mark the page and wait for up to
150ms for those ref-counts to be dropped. If the ref-counts are not
dropped in time, we refuse the seal operation.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If two processes share a common memory region, they usually want some
guarantees to allow safe access. This often includes:
- one side cannot overwrite data while the other reads it
- one side cannot shrink the buffer while the other accesses it
- one side cannot grow the buffer beyond previously set boundaries
If there is a trust-relationship between both parties, there is no need
for policy enforcement. However, if there's no trust relationship (eg.,
for general-purpose IPC) sharing memory-regions is highly fragile and
often not possible without local copies. Look at the following two
use-cases:
1) A graphics client wants to share its rendering-buffer with a
graphics-server. The memory-region is allocated by the client for
read/write access and a second FD is passed to the server. While
scanning out from the memory region, the server has no guarantee that
the client doesn't shrink the buffer at any time, requiring rather
cumbersome SIGBUS handling.
2) A process wants to perform an RPC on another process. To avoid huge
bandwidth consumption, zero-copy is preferred. After a message is
assembled in-memory and a FD is passed to the remote side, both sides
want to be sure that neither modifies this shared copy, anymore. The
source may have put sensible data into the message without a separate
copy and the target may want to parse the message inline, to avoid a
local copy.
While SIGBUS handling, POSIX mandatory locking and MAP_DENYWRITE provide
ways to achieve most of this, the first one is unproportionally ugly to
use in libraries and the latter two are broken/racy or even disabled due
to denial of service attacks.
This patch introduces the concept of SEALING. If you seal a file, a
specific set of operations is blocked on that file forever. Unlike locks,
seals can only be set, never removed. Hence, once you verified a specific
set of seals is set, you're guaranteed that no-one can perform the blocked
operations on this file, anymore.
An initial set of SEALS is introduced by this patch:
- SHRINK: If SEAL_SHRINK is set, the file in question cannot be reduced
in size. This affects ftruncate() and open(O_TRUNC).
- GROW: If SEAL_GROW is set, the file in question cannot be increased
in size. This affects ftruncate(), fallocate() and write().
- WRITE: If SEAL_WRITE is set, no write operations (besides resizing)
are possible. This affects fallocate(PUNCH_HOLE), mmap() and
write().
- SEAL: If SEAL_SEAL is set, no further seals can be added to a file.
This basically prevents the F_ADD_SEAL operation on a file and
can be set to prevent others from adding further seals that you
don't want.
The described use-cases can easily use these seals to provide safe use
without any trust-relationship:
1) The graphics server can verify that a passed file-descriptor has
SEAL_SHRINK set. This allows safe scanout, while the client is
allowed to increase buffer size for window-resizing on-the-fly.
Concurrent writes are explicitly allowed.
2) For general-purpose IPC, both processes can verify that SEAL_SHRINK,
SEAL_GROW and SEAL_WRITE are set. This guarantees that neither
process can modify the data while the other side parses it.
Furthermore, it guarantees that even with writable FDs passed to the
peer, it cannot increase the size to hit memory-limits of the source
process (in case the file-storage is accounted to the source).
The new API is an extension to fcntl(), adding two new commands:
F_GET_SEALS: Return a bitset describing the seals on the file. This
can be called on any FD if the underlying file supports
sealing.
F_ADD_SEALS: Change the seals of a given file. This requires WRITE
access to the file and F_SEAL_SEAL may not already be set.
Furthermore, the underlying file must support sealing and
there may not be any existing shared mapping of that file.
Otherwise, EBADF/EPERM is returned.
The given seals are _added_ to the existing set of seals
on the file. You cannot remove seals again.
The fcntl() handler is currently specific to shmem and disabled on all
files. A file needs to explicitly support sealing for this interface to
work. A separate syscall is added in a follow-up, which creates files that
support sealing. There is no intention to support this on other
file-systems. Semantics are unclear for non-volatile files and we lack any
use-case right now. Therefore, the implementation is specific to shmem.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Angelo G. Del Regno <kholk11@gmail.com>
memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor
that you can pass to mmap(). It can support sealing and avoids any
connection to user-visible mount-points. Thus, it's not subject to quotas
on mounted file-systems, but can be used like malloc()'ed memory, but with
a file-descriptor to it.
memfd_create() returns the raw shmem file, so calls like ftruncate() can
be used to modify the underlying inode. Also calls like fstat() will
return proper information and mark the file as regular file. If you want
sealing, you can specify MFD_ALLOW_SEALING. Otherwise, sealing is not
supported (like on all other regular files).
Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not
subject to a filesystem size limit. It is still properly accounted to
memcg limits, though, and to the same overcommit or no-overcommit
accounting as all user memory.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Angelo G. Del Regno <kholk11@gmail.com>
When we made the shmem_reserve_inode call in shmem_link conditional, we
forgot to update the declaration for ret so that it always has a known
value. Dan Carpenter pointed out this deficiency in the original patch.
Fixes: 1062af920c07 ("tmpfs: fix link accounting when a tmpfile is linked in")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Matej Kupljen <matej.kupljen@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 29b00e609960ae0fcff382f4c7079dd0874a5311)
Change-Id: I648253008c7977fabb9d04655c11f99a536f43ca
tmpfs has a peculiarity of accounting hard links as if they were
separate inodes: so that when the number of inodes is limited, as it is
by default, a user cannot soak up an unlimited amount of unreclaimable
dcache memory just by repeatedly linking a file.
But when v3.11 added O_TMPFILE, and the ability to use linkat() on the
fd, we missed accommodating this new case in tmpfs: "df -i" shows that
an extra "inode" remains accounted after the file is unlinked and the fd
closed and the actual inode evicted. If a user repeatedly links
tmpfiles into a tmpfs, the limit will be hit (ENOSPC) even after they
are deleted.
Just skip the extra reservation from shmem_link() in this case: there's
a sense in which this first link of a tmpfile is then cheaper than a
hard link of another file, but the accounting works out, and there's
still good limiting, so no need to do anything more complicated.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1902182134370.7035@eggly.anvils
Fixes: f4e0c30c191 ("allow the temp files created by open() to be linked to")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Matej Kupljen <matej.kupljen@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1062af920c07f5b54cf5060fde3339da6df0cf6b)
Change-Id: I194ccdf709b1a4a385e00e07f97085521bcabdc1
commit 7f556567036cb7f89aabe2f0954b08566b4efb53 upstream.
The well-spotted fallocate undo fix is good in most cases, but not when
fallocate failed on the very first page. index 0 then passes lend -1
to shmem_undo_range(), and that has two bad effects: (a) that it will
undo every fallocation throughout the file, unrestricted by the current
range; but more importantly (b) it can cause the undo to hang, because
lend -1 is treated as truncation, which makes it keep on retrying until
every page has gone, but those already fully instantiated will never go
away. Big thank you to xfstests generic/269 which demonstrates this.
Fixes: b9b4bb26af01 ("tmpfs: don't undo fallocate past its last page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit b9b4bb26af017dbe930cd4df7f9b2fc3a0497bfe upstream.
When fallocate is interrupted it will undo a range that extends one byte
past its range of allocated pages. This can corrupt an in-use page by
zeroing out its first byte. Instead, undo using the inclusive byte
range.
Fixes: 1635f6a741 ("tmpfs: undo fallocation on failure")
Link: http://lkml.kernel.org/r/1462713387-16724-1-git-send-email-anthony.romano@coreos.com
Signed-off-by: Anthony Romano <anthony.romano@coreos.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Brandon Philips <brandon@ifup.co>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJUyuGRAAoJEDjbvchgkmk+7EwQALYPOeh+AManQFB1MQvFuOgZ
/4ulpjhGXw/RPTKHMeyHo8vRfUhMOx8UPF62uql+g1l9b/Zt2bs6qXu4QcxRRsQc
trSTUpi+U14y1hkgqOVOcFYP2ZaTjNEBQgLJ4eGn46CliLqme+rfoyRYm2GXzcR4
6cbSAr3mufdFIpi9/8Dn62Gv0aws5lIv3qkHJXznyuux3tisPT5y6Ux2KJoivPn/
SqADtRpwo+7lTjl15fE++9AqNsGMorV6toT2OO/7nXP+824psInKLmREAT2qC99b
BG61vcYdxOuHtzmwrvCf1jSRjxhvZT0j2xhBr/vCKcxy08AT0vDv68zrV1r6TIuu
U7/CKXtFBY95cjfnkTLJuswBSuIA/+sQHV6DaddH0V8fcZ6rQMLrblQ9ZcFFFkmT
2SG6lmlXqZvcEKYGMnL/Dcow1rkRhB5stiGgTkYxjiRSRpzAHISRJ/GGpsT+rRqK
HpBs5p9JshvRl7RWKwAu+DNGaEK1X/WYxc4/jw6dZFWX7lEWSMIPlr9zXgZCZ39y
V6lV1VVlT9/CSs1swKHUyhHHehlFsnIlQ6Fkiycr/KkuqBLs92Hyb7WhpVa819yX
osXdxSm6J54skiOLKYpBWHpnY09Tc+p28VEfMpErTExgp2oE8F34K7kdhoQPQb97
2mHiXNa+J4CLUNQ+sRmw
=HDBo
-----END PGP SIGNATURE-----
Merge commit 'v3.10.67' into msm-3.10
This merge brings us up to date with upstream kernel.org tag v3.10.67.
It also contains changes to allow forbidden warnings introduced in
the commit 'core, nfqueue, openvswitch: Orphan frags in skb_zerocopy
and handle errors'. Once upstream has corrected these warnings, the
changes to scripts/gcc-wrapper.py, in this commit, can be reverted.
* commit 'v3.10.67' (915 commits)
Linux 3.10.67
md/raid5: fetch_block must fetch all the blocks handle_stripe_dirtying wants.
ext4: fix warning in ext4_da_update_reserve_space()
quota: provide interface for readding allocated space into reserved space
crypto: add missing crypto module aliases
crypto: include crypto- module prefix in template
crypto: prefix module autoloading with "crypto-"
drbd: merge_bvec_fn: properly remap bvm->bi_bdev
Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
ipvs: uninitialized data with IP_VS_IPV6
KEYS: close race between key lookup and freeing
sata_dwc_460ex: fix resource leak on error path
x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs
x86, tls: Interpret an all-zero struct user_desc as "no segment"
x86, tls, ldt: Stop checking lm in LDT_empty
x86/tsc: Change Fast TSC calibration failed from error to info
x86, hyperv: Mark the Hyper-V clocksource as being continuous
clocksource: exynos_mct: Fix bitmask regression for exynos4_mct_write
can: dev: fix crtlmode_supported check
bus: mvebu-mbus: fix support of MBus window 13
ARM: dts: imx25: Fix PWM "per" clocks
time: adjtimex: Validate the ADJ_FREQUENCY values
time: settimeofday: Validate the values of tv from user
dm cache: share cache-metadata object across inactive and active DM tables
ipr: wait for aborted command responses
drm/i915: Fix mutex->owner inspection race under DEBUG_MUTEXES
scripts/recordmcount.pl: There is no -m32 gcc option on Super-H anymore
ALSA: usb-audio: Add mic volume fix quirk for Logitech Webcam C210
libata: prevent HSM state change race between ISR and PIO
pinctrl: Fix two deadlocks
gpio: sysfs: fix gpio device-attribute leak
gpio: sysfs: fix gpio-chip device-attribute leak
Linux 3.10.66
s390/3215: fix tty output containing tabs
s390/3215: fix hanging console issue
fsnotify: next_i is freed during fsnotify_unmount_inodes.
netfilter: ipset: small potential read beyond the end of buffer
mmc: sdhci: Fix sleep in atomic after inserting SD card
LOCKD: Fix a race when initialising nlmsvc_timeout
x86, um: actually mark system call tables readonly
um: Skip futex_atomic_cmpxchg_inatomic() test
decompress_bunzip2: off by one in get_next_block()
ARM: shmobile: sh73a0 legacy: Set .control_parent for all irqpin instances
ARM: omap5/dra7xx: Fix frequency typos
ARM: clk-imx6q: fix video divider for rev T0 1.0
ARM: imx6q: drop unnecessary semicolon
ARM: dts: imx25: Fix the SPI1 clocks
Input: I8042 - add Acer Aspire 7738 to the nomux list
Input: i8042 - reset keyboard to fix Elantech touchpad detection
can: kvaser_usb: Don't send a RESET_CHIP for non-existing channels
can: kvaser_usb: Reset all URB tx contexts upon channel close
can: kvaser_usb: Don't free packets when tight on URBs
USB: keyspan: fix null-deref at probe
USB: cp210x: add IDs for CEL USB sticks and MeshWorks devices
USB: cp210x: fix ID for production CEL MeshConnect USB Stick
usb: dwc3: gadget: Stop TRB preparation after limit is reached
usb: dwc3: gadget: Fix TRB preparation during SG
OHCI: add a quirk for ULi M5237 blocking on reset
gpiolib: of: Correct error handling in of_get_named_gpiod_flags
NFSv4.1: Fix client id trunking on Linux
ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing
vfio-pci: Fix the check on pci device type in vfio_pci_probe()
uvcvideo: Fix destruction order in uvc_delete()
smiapp: Take mutex during PLL update in sensor initialisation
af9005: fix kernel panic on init if compiled without IR
smiapp-pll: Correct clock debug prints
video/logo: prevent use of logos after they have been freed
storvsc: ring buffer failures may result in I/O freeze
iscsi-target: Fail connection on short sendmsg writes
hp_accel: Add support for HP ZBook 15
cfg80211: Fix 160 MHz channels with 80+80 and 160 MHz drivers
ARC: [nsimosci] move peripherals to match model to FPGA
drm/i915: Force the CS stall for invalidate flushes
drm/i915: Invalidate media caches on gen7
drm/radeon: properly filter DP1.2 4k modes on non-DP1.2 hw
drm/radeon: check the right ring in radeon_evict_flags()
drm/vmwgfx: Fix fence event code
enic: fix rx skb checksum
alx: fix alx_poll()
tcp: Do not apply TSO segment limit to non-TSO packets
tg3: tg3_disable_ints using uninitialized mailbox value to disable interrupts
netlink: Don't reorder loads/stores before marking mmap netlink frame as available
netlink: Always copy on mmap TX.
Linux 3.10.65
mm: Don't count the stack guard page towards RLIMIT_STACK
mm: propagate error from stack expansion even for guard page
mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
perf session: Do not fail on processing out of order event
perf: Fix events installation during moving group
perf/x86/intel/uncore: Make sure only uncore events are collected
Btrfs: don't delay inode ref updates during log replay
ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/375/38x/XP
scripts/kernel-doc: don't eat struct members with __aligned
nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
nfsd4: fix xdr4 inclusion of escaped char
fs: nfsd: Fix signedness bug in compare_blob
serial: samsung: wait for transfer completion before clock disable
writeback: fix a subtle race condition in I_DIRTY clearing
cdc-acm: memory leak in error case
genhd: check for int overflow in disk_expand_part_tbl()
USB: cdc-acm: check for valid interfaces
ALSA: hda - Fix wrong gpio_dir & gpio_mask hint setups for IDT/STAC codecs
ALSA: hda - using uninitialized data
ALSA: usb-audio: extend KEF X300A FU 10 tweak to Arcam rPAC
driver core: Fix unbalanced device reference in drivers_probe
x86, vdso: Use asm volatile in __getcpu
x86_64, vdso: Fix the vdso address randomization algorithm
HID: Add a new id 0x501a for Genius MousePen i608X
HID: add battery quirk for USB_DEVICE_ID_APPLE_ALU_WIRELESS_2011_ISO keyboard
HID: roccat: potential out of bounds in pyra_sysfs_write_settings()
HID: i2c-hid: prevent buffer overflow in early IRQ
HID: i2c-hid: fix race condition reading reports
iommu/vt-d: Fix an off-by-one bug in __domain_mapping()
UBI: Fix double free after do_sync_erase()
UBI: Fix invalid vfree()
pstore-ram: Allow optional mapping with pgprot_noncached
pstore-ram: Fix hangs by using write-combine mappings
PCI: Restore detection of read-only BARs
ASoC: dwc: Ensure FIFOs are flushed to prevent channel swap
ASoC: max98090: Fix ill-defined sidetone route
ASoC: sigmadsp: Refuse to load firmware files with a non-supported version
ath5k: fix hardware queue index assignment
swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single
can: peak_usb: fix memset() usage
can: peak_usb: fix cleanup sequence order in case of error during init
ath9k: fix BE/BK queue order
ath9k_hw: fix hardware queue allocation
ocfs2: fix journal commit deadlock
Linux 3.10.64
Btrfs: fix fs corruption on transaction abort if device supports discard
Btrfs: do not move em to modified list when unpinning
eCryptfs: Remove buggy and unnecessary write in file name decode routine
eCryptfs: Force RO mount when encrypted view is enabled
udf: Verify symlink size before loading it
exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
ncpfs: return proper error from NCP_IOC_SETROOT ioctl
crypto: af_alg - fix backlog handling
userns: Unbreak the unprivileged remount tests
userns: Allow setting gid_maps without privilege when setgroups is disabled
userns: Add a knob to disable setgroups on a per user namespace basis
userns: Rename id_map_mutex to userns_state_mutex
userns: Only allow the creator of the userns unprivileged mappings
userns: Check euid no fsuid when establishing an unprivileged uid mapping
userns: Don't allow unprivileged creation of gid mappings
userns: Don't allow setgroups until a gid mapping has been setablished
userns: Document what the invariant required for safe unprivileged mappings.
groups: Consolidate the setgroups permission checks
umount: Disallow unprivileged mount force
mnt: Update unprivileged remount test
mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
mac80211: free management frame keys when removing station
mac80211: fix multicast LED blinking and counter
KEYS: Fix stale key registration at error path
isofs: Fix unchecked printing of ER records
x86/tls: Don't validate lm in set_thread_area() after all
dm space map metadata: fix sm_bootstrap_get_nr_blocks()
dm bufio: fix memleak when using a dm_buffer's inline bio
nfs41: fix nfs4_proc_layoutget error handling
megaraid_sas: corrected return of wait_event from abort frame path
mmc: block: add newline to sysfs display of force_ro
mfd: tc6393xb: Fail ohci suspend if full state restore is required
md/bitmap: always wait for writes on unplug.
x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit
x86_64, switch_to(): Load TLS descriptors before switching DS and ES
x86/tls: Disallow unusual TLS segments
x86/tls: Validate TLS entries to protect espfix
isofs: Fix infinite looping over CE entries
Linux 3.10.63
ALSA: usb-audio: Don't resubmit pending URBs at MIDI error recovery
powerpc: 32 bit getcpu VDSO function uses 64 bit instructions
ARM: sched_clock: Load cycle count after epoch stabilizes
igb: bring link up when PHY is powered up
ext2: Fix oops in ext2_get_block() called from ext2_quota_write()
nEPT: Nested INVEPT
net: sctp: use MAX_HEADER for headroom reserve in output path
net: mvneta: fix Tx interrupt delay
rtnetlink: release net refcnt on error in do_setlink()
net/mlx4_core: Limit count field to 24 bits in qp_alloc_res
tg3: fix ring init when there are more TX than RX channels
ipv6: gre: fix wrong skb->protocol in WCCP
sata_fsl: fix error handling of irq_of_parse_and_map
ahci: disable MSI on SAMSUNG 0xa800 SSD
AHCI: Add DeviceIDs for Sunrise Point-LP SATA controller
media: smiapp: Only some selection targets are settable
drm/i915: Unlock panel even when LVDS is disabled
drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6
i2c: davinci: generate STP always when NACK is received
i2c: omap: fix i207 errata handling
i2c: omap: fix NACK and Arbitration Lost irq handling
xen-netfront: Remove BUGs on paged skb data which crosses a page boundary
mm: fix swapoff hang after page migration and fork
mm: frontswap: invalidate expired data on a dup-store failure
Linux 3.10.62
nfsd: Fix ACL null pointer deref
powerpc/powernv: Honor the generic "no_64bit_msi" flag
bnx2fc: do not add shared skbs to the fcoe_rx_list
nfsd4: fix leak of inode reference on delegation failure
nfsd: Fix slot wake up race in the nfsv4.1 callback code
rt2x00: do not align payload on modern H/W
can: dev: avoid calling kfree_skb() from interrupt context
spi: dw: Fix dynamic speed change.
iser-target: Handle DEVICE_REMOVAL event on network portal listener correctly
target: Don't call TFO->write_pending if data_length == 0
srp-target: Retry when QP creation fails with ENOMEM
Input: xpad - use proper endpoint type
ARM: 8222/1: mvebu: enable strex backoff delay
ARM: 8216/1: xscale: correct auxiliary register in suspend/resume
ALSA: usb-audio: Add ctrl message delay quirk for Marantz/Denon devices
can: esd_usb2: fix memory leak on disconnect
USB: xhci: don't start a halted endpoint before its new dequeue is set
usb-quirks: Add reset-resume quirk for MS Wireless Laser Mouse 6000
usb: serial: ftdi_sio: add PIDs for Matrix Orbital products
USB: serial: cp210x: add IDs for CEL MeshConnect USB Stick
USB: keyspan: fix tty line-status reporting
USB: keyspan: fix overrun-error reporting
USB: ssu100: fix overrun-error reporting
iio: Fix IIO_EVENT_CODE_EXTRACT_DIR bit mask
powerpc/pseries: Fix endiannes issue in RTAS call from xmon
powerpc/pseries: Honor the generic "no_64bit_msi" flag
of/base: Fix PowerPC address parsing hack
ASoC: wm_adsp: Avoid attempt to free buffers that might still be in use
ASoC: sgtl5000: Fix SMALL_POP bit definition
PCI/MSI: Add device flag indicating that 64-bit MSIs don't work
ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
pptp: fix stack info leak in pptp_getname()
qmi_wwan: Add support for HP lt4112 LTE/HSPA+ Gobi 4G Modem
ieee802154: fix error handling in ieee802154fake_probe()
ipv4: Fix incorrect error code when adding an unreachable route
inetdevice: fixed signed integer overflow
sparc64: Fix constraints on swab helpers.
uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME
x86, mm: Set NX across entire PMD at boot
x86: Require exact match for 'noxsave' command line option
x86_64, traps: Rework bad_iret
x86_64, traps: Stop using IST for #SS
x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
MIPS: Loongson: Make platform serial setup always built-in.
MIPS: oprofile: Fix backtrace on 64-bit kernel
Linux 3.10.61
mm: memcg: handle non-error OOM situations more gracefully
mm: memcg: do not trap chargers with full callstack on OOM
mm: memcg: rework and document OOM waiting and wakeup
mm: memcg: enable memcg OOM killer only for user faults
x86: finish user fault error path with fatal signal
arch: mm: pass userspace fault flag to generic fault handler
arch: mm: do not invoke OOM killer on kernel fault OOM
arch: mm: remove obsolete init OOM protection
mm: invoke oom-killer from remaining unconverted page fault handlers
net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
net: sctp: fix panic on duplicate ASCONF chunks
net: sctp: fix remote memory pressure from excessive queueing
KVM: x86: Don't report guest userspace emulation error to userspace
SCSI: hpsa: fix a race in cmd_free/scsi_done
net/mlx4_en: Fix BlueFlame race
ARM: Correct BUG() assembly to ensure it is endian-agnostic
perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
mei: bus: fix possible boundaries violation
perf: Handle compat ioctl
MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches
dell-wmi: Fix access out of memory
ARM: probes: fix instruction fetch order with <asm/opcodes.h>
br: fix use of ->rx_handler_data in code executed on non-rx_handler path
netfilter: nf_nat: fix oops on netns removal
netfilter: xt_bpf: add mising opaque struct sk_filter definition
netfilter: nf_log: release skbuff on nlmsg put failure
netfilter: nfnetlink_log: fix maximum packet length logged to userspace
netfilter: nf_log: account for size of NLMSG_DONE attribute
ipc: always handle a new value of auto_msgmni
clocksource: Remove "weak" from clocksource_default_clock() declaration
kgdb: Remove "weak" from kgdb_arch_pc() declaration
media: ttusb-dec: buffer overflow in ioctl
NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
nfs: Fix use of uninitialized variable in nfs_getattr()
NFS: Don't try to reclaim delegation open state if recovery failed
NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
Input: alps - allow up to 2 invalid packets without resetting device
Input: alps - ignore potential bare packets when device is out of sync
dm raid: ensure superblock's size matches device's logical block size
dm btree: fix a recursion depth bug in btree walking code
block: Fix computation of merged request priority
parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls
scsi: only re-lock door after EH on devices that were reset
nfs: fix pnfs direct write memory leak
firewire: cdev: prevent kernel stack leaking into ioctl arguments
arm64: __clear_user: handle exceptions on strb
ARM: 8198/1: make kuser helpers depend on MMU
drm/radeon: add missing crtc unlock when setting up the MC
mac80211: fix use-after-free in defragmentation
macvtap: Fix csum_start when VLAN tags are present
iwlwifi: configure the LTR
libceph: do not crash on large auth tickets
xtensa: re-wire umount syscall to sys_oldumount
ALSA: usb-audio: Fix memory leak in FTU quirk
ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks
ahci: Add Device IDs for Intel Sunrise Point PCH
audit: keep inode pinned
x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks
sparc64: Do irq_{enter,exit}() around generic_smp_call_function*().
sparc64: Fix crashes in schizo_pcierr_intr_other().
sunvdc: don't call VD_OP_GET_VTOC
vio: fix reuse of vio_dring slot
sunvdc: limit each sg segment to a page
sunvdc: compute vdisk geometry from capacity
sunvdc: add cdrom and v1.1 protocol support
net: sctp: fix memory leak in auth key management
net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
gre6: Move the setting of dev->iflink into the ndo_init functions.
ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.
Linux 3.10.60
libceph: ceph-msgr workqueue needs a resque worker
Btrfs: fix kfree on list_head in btrfs_lookup_csums_range error cleanup
of: Fix overflow bug in string property parsing functions
sysfs: driver core: Fix glue dir race condition by gdp_mutex
i2c: at91: don't account as iowait
acer-wmi: Add acpi_backlight=video quirk for the Acer KAV80
rbd: Fix error recovery in rbd_obj_read_sync()
drm/radeon: remove invalid pci id
usb: gadget: udc: core: fix kernel oops with soft-connect
usb: gadget: function: acm: make f_acm pass USB20CV Chapter9
usb: dwc3: gadget: fix set_halt() bug with pending transfers
crypto: algif - avoid excessive use of socket buffer in skcipher
mm: Remove false WARN_ON from pagecache_isize_extended()
x86, apic: Handle a bad TSC more gracefully
posix-timers: Fix stack info leak in timer_create()
mac80211: fix typo in starting baserate for rts_cts_rate_idx
PM / Sleep: fix recovery during resuming from hibernation
tty: Fix high cpu load if tty is unreleaseable
quota: Properly return errors from dquot_writeback_dquots()
ext3: Don't check quota format when there are no quota files
nfsd4: fix crash on unknown operation number
cpc925_edac: Report UE events properly
e7xxx_edac: Report CE events properly
i3200_edac: Report CE events properly
i82860_edac: Report CE events properly
scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
cgroup/kmemleak: add kmemleak_free() for cgroup deallocations.
usb: Do not allow usb_alloc_streams on unconfigured devices
USB: opticon: fix non-atomic allocation in write path
usb-storage: handle a skipped data phase
spi: pxa2xx: toggle clocks on suspend if not disabled by runtime PM
spi: pl022: Fix incorrect dma_unmap_sg
usb: dwc3: gadget: Properly initialize LINK TRB
wireless: rt2x00: add new rt2800usb device
USB: option: add Haier CE81B CDMA modem
usb: option: add support for Telit LE910
USB: cdc-acm: only raise DTR on transitions from B0
USB: cdc-acm: add device id for GW Instek AFG-2225
usb: serial: ftdi_sio: add "bricked" FTDI device PID
usb: serial: ftdi_sio: add Awinda Station and Dongle products
USB: serial: cp210x: add Silicon Labs 358x VID and PID
serial: Fix divide-by-zero fault in uart_get_divisor()
staging:iio:ade7758: Remove "raw" from channel name
staging:iio:ade7758: Fix check if channels are enabled in prenable
staging:iio:ade7758: Fix NULL pointer deref when enabling buffer
staging:iio:ad5933: Drop "raw" from channel names
staging:iio:ad5933: Fix NULL pointer deref when enabling buffer
OOM, PM: OOM killed task shouldn't escape PM suspend
freezer: Do not freeze tasks killed by OOM killer
ext4: fix oops when loading block bitmap failed
cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy
ext4: fix overflow when updating superblock backups after resize
ext4: check s_chksum_driver when looking for bg csum presence
ext4: fix reservation overflow in ext4_da_write_begin
ext4: add ext4_iget_normal() which is to be used for dir tree lookups
ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT
ext4: don't check quota format when there are no quota files
ext4: check EA value offset when loading
jbd2: free bh when descriptor block checksum fails
MIPS: tlbex: Properly fix HUGE TLB Refill exception handler
target: Fix APTPL metadata handling for dynamic MappedLUNs
target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE
qla_target: don't delete changed nacls
ARC: Update order of registers in KGDB to match GDB 7.5
ARC: [nsimosci] Allow "headless" models to boot
KVM: x86: Emulator fixes for eip canonical checks on near branches
KVM: x86: Fix wrong masking on relative jump/call
kvm: x86: don't kill guest on unknown exit reason
KVM: x86: Check non-canonical addresses upon WRMSR
KVM: x86: Improve thread safety in pit
KVM: x86: Prevent host from panicking on shared MSR writes.
kvm: fix excessive pages un-pinning in kvm_iommu_map error path.
media: tda7432: Fix setting TDA7432_MUTE bit for TDA7432_RF register
media: ds3000: fix LNB supply voltage on Tevii S480 on initialization
media: em28xx-v4l: give back all active video buffers to the vb2 core properly on streaming stop
media: v4l2-common: fix overflow in v4l_bound_align_image()
drm/nouveau/bios: memset dcb struct to zero before parsing
drm/tilcdc: Fix the error path in tilcdc_load()
drm/ast: Fix HW cursor image
Input: i8042 - quirks for Fujitsu Lifebook A544 and Lifebook AH544
Input: i8042 - add noloop quirk for Asus X750LN
framebuffer: fix border color
modules, lock around setting of MODULE_STATE_UNFORMED
dm log userspace: fix memory leak in dm_ulog_tfr_init failure path
block: fix alignment_offset math that assumes io_min is a power-of-2
drbd: compute the end before rb_insert_augmented()
dm bufio: update last_accessed when relinking a buffer
virtio_pci: fix virtio spec compliance on restore
selinux: fix inode security list corruption
pstore: Fix duplicate {console,ftrace}-efi entries
mfd: rtsx_pcr: Fix MSI enable error handling
mnt: Prevent pivot_root from creating a loop in the mount tree
UBI: add missing kmem_cache_free() in process_pool_aeb error path
random: add and use memzero_explicit() for clearing data
crypto: more robust crypto_memneq
fix misuses of f_count() in ppp and netlink
kill wbuf_queued/wbuf_dwork_lock
ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode
evm: check xattr value length and type in evm_inode_setxattr()
x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
x86_64, entry: Fix out of bounds read on sysenter
x86_64, entry: Filter RFLAGS.NT on entry from userspace
x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
x86: Reject x32 executables if x32 ABI not supported
vfs: fix data corruption when blocksize < pagesize for mmaped data
UBIFS: fix free log space calculation
UBIFS: fix a race condition
UBIFS: remove mst_mutex
fs: Fix theoretical division by 0 in super_cache_scan().
fs: make cont_expand_zero interruptible
mmc: rtsx_pci_sdmmc: fix incorrect last byte in R2 response
libata-sff: Fix controllers with no ctl port
pata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE Controller
Revert "percpu: free percpu allocation info for uniprocessor system"
lockd: Try to reconnect if statd has moved
drivers/net: macvtap and tun depend on INET
ipv4: dst_entry leak in ip_send_unicast_reply()
ax88179_178a: fix bonding failure
ipv4: fix nexthop attlen check in fib_nh_match
tracing/syscalls: Ignore numbers outside NR_syscalls' range
Linux 3.10.59
ecryptfs: avoid to access NULL pointer when write metadata in xattr
ARM: at91/PMC: don't forget to write PMC_PCDR register to disable clocks
ALSA: usb-audio: Add support for Steinberg UR22 USB interface
ALSA: emu10k1: Fix deadlock in synth voice lookup
ALSA: pcm: use the same dma mmap codepath both for arm and arm64
arm64: compat: fix compat types affecting struct compat_elf_prpsinfo
spi: dw-mid: terminate ongoing transfers at exit
kernel: add support for gcc 5
fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
Bluetooth: Fix issue with USB suspend in btusb driver
Bluetooth: Fix HCI H5 corrupted ack value
rt2800: correct BBP1_TX_POWER_CTRL mask
PCI: Generate uppercase hex for modalias interface class
PCI: Increase IBM ipr SAS Crocodile BARs to at least system page size
iwlwifi: Add missing PCI IDs for the 7260 series
NFSv4.1: Fix an NFSv4.1 state renewal regression
NFSv4: fix open/lock state recovery error handling
NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
lzo: check for length overrun in variable length encoding.
Revert "lzo: properly check for overruns"
Documentation: lzo: document part of the encoding
m68k: Disable/restore interrupts in hwreg_present()/hwreg_write()
Drivers: hv: vmbus: Fix a bug in vmbus_open()
Drivers: hv: vmbus: Cleanup vmbus_establish_gpadl()
Drivers: hv: vmbus: Cleanup vmbus_teardown_gpadl()
Drivers: hv: vmbus: Cleanup vmbus_post_msg()
firmware_class: make sure fw requests contain a name
qla2xxx: Use correct offset to req-q-out for reserve calculation
mptfusion: enable no_write_same for vmware scsi disks
be2iscsi: check ip buffer before copying
regmap: fix NULL pointer dereference in _regmap_write/read
regmap: debugfs: fix possbile NULL pointer dereference
spi: dw-mid: check that DMA was inited before exit
spi: dw-mid: respect 8 bit mode
x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
KVM: s390: unintended fallthrough for external call
kvm: x86: fix stale mmio cache bug
fs: Add a missing permission check to do_umount
Btrfs: fix race in WAIT_SYNC ioctl
Btrfs: fix build_backref_tree issue with multiple shared blocks
Btrfs: try not to ENOSPC on log replay
Linux 3.10.58
USB: cp210x: add support for Seluxit USB dongle
USB: serial: cp210x: added Ketra N1 wireless interface support
USB: Add device quirk for ASUS T100 Base Station keyboard
ipv6: reallocate addrconf router for ipv6 address when lo device up
tcp: fixing TLP's FIN recovery
sctp: handle association restarts when the socket is closed.
ip6_gre: fix flowi6_proto value in xmit path
hyperv: Fix a bug in netvsc_start_xmit()
tg3: Allow for recieve of full-size 8021AD frames
tg3: Work around HW/FW limitations with vlan encapsulated frames
l2tp: fix race while getting PMTU on PPP pseudo-wire
openvswitch: fix panic with multiple vlan headers
packet: handle too big packets for PACKET_V3
tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()
sit: Fix ipip6_tunnel_lookup device matching criteria
myri10ge: check for DMA mapping errors
Linux 3.10.57
cpufreq: ondemand: Change the calculation of target frequency
cpufreq: Fix wrong time unit conversion
nl80211: clear skb cb before passing to netlink
drbd: fix regression 'out of mem, failed to invoke fence-peer helper'
jiffies: Fix timeval conversion to jiffies
md/raid5: disable 'DISCARD' by default due to safety concerns.
media: vb2: fix VBI/poll regression
mm: numa: Do not mark PTEs pte_numa when splitting huge pages
mm, thp: move invariant bug check out of loop in __split_huge_page_map
ring-buffer: Fix infinite spin in reading buffer
init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu
perf: fix perf bug in fork()
udf: Avoid infinite loop when processing indirect ICBs
Linux 3.10.56
vm_is_stack: use for_each_thread() rather then buggy while_each_thread()
oom_kill: add rcu_read_lock() into find_lock_task_mm()
oom_kill: has_intersects_mems_allowed() needs rcu_read_lock()
oom_kill: change oom_kill.c to use for_each_thread()
introduce for_each_thread() to replace the buggy while_each_thread()
kernel/fork.c:copy_process(): unify CLONE_THREAD-or-thread_group_leader code
arm: multi_v7_defconfig: Enable Zynq UART driver
ext2: Fix fs corruption in ext2_get_xip_mem()
serial: 8250_dma: check the result of TX buffer mapping
ARM: 7748/1: oabi: handle faults when loading swi instruction from userspace
netfilter: nf_conntrack: avoid large timeout for mid-stream pickup
PM / sleep: Use valid_state() for platform-dependent sleep states only
PM / sleep: Add state field to pm_states[] entries
ipvs: fix ipv6 hook registration for local replies
ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding
ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack
md/raid1: fix_read_error should act on all non-faulty devices.
media: cx18: fix kernel oops with tda8290 tuner
Fix nasty 32-bit overflow bug in buffer i/o code.
perf kmem: Make it work again on non NUMA machines
perf: Fix a race condition in perf_remove_from_context()
alarmtimer: Lock k_itimer during timer callback
alarmtimer: Do not signal SIGEV_NONE timers
parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds
powerpc/perf: Fix ABIv2 kernel backtraces
sched: Fix unreleased llc_shared_mask bit during CPU hotplug
ocfs2/dlm: do not get resource spinlock if lockres is new
nilfs2: fix data loss with mmap()
fs/notify: don't show f_handle if exportfs_encode_inode_fh failed
fsnotify/fdinfo: use named constants instead of hardcoded values
kcmp: fix standard comparison bug
Revert "mac80211: disable uAPSD if all ACs are under ACM"
usb: dwc3: core: fix ordering for PHY suspend
usb: dwc3: core: fix order of PM runtime calls
usb: host: xhci: fix compliance mode workaround
genhd: fix leftover might_sleep() in blk_free_devt()
lockd: fix rpcbind crash on lockd startup failure
rtlwifi: rtl8192cu: Add new ID
percpu: perform tlb flush after pcpu_map_pages() failure
percpu: fix pcpu_alloc_pages() failure path
percpu: free percpu allocation info for uniprocessor system
ata_piix: Add Device IDs for Intel 9 Series PCH
Input: i8042 - add nomux quirk for Avatar AVIU-145A6
Input: i8042 - add Fujitsu U574 to no_timeout dmi table
Input: atkbd - do not try 'deactivate' keyboard on any LG laptops
Input: elantech - fix detection of touchpad on ASUS s301l
Input: synaptics - add support for ForcePads
Input: serport - add compat handling for SPIOCSTYPE ioctl
dm crypt: fix access beyond the end of allocated space
block: Fix dev_t minor allocation lifetime
workqueue: apply __WQ_ORDERED to create_singlethread_workqueue()
Revert "iwlwifi: dvm: don't enable CTS to self"
SCSI: libiscsi: fix potential buffer overrun in __iscsi_conn_send_pdu
NFC: microread: Potential overflows in microread_target_discovered()
iscsi-target: Fix memory corruption in iscsit_logout_post_handler_diffcid
iscsi-target: avoid NULL pointer in iscsi_copy_param_list failure
Target/iser: Don't put isert_conn inside disconnected handler
Target/iser: Get isert_conn reference once got to connected_handler
iio:inkern: fix overwritten -EPROBE_DEFER in of_iio_channel_get_by_name
iio:magnetometer: bugfix magnetometers gain values
iio: adc: ad_sigma_delta: Fix indio_dev->trig assignment
iio: st_sensors: Fix indio_dev->trig assignment
iio: meter: ade7758: Fix indio_dev->trig assignment
iio: inv_mpu6050: Fix indio_dev->trig assignment
iio: gyro: itg3200: Fix indio_dev->trig assignment
iio:trigger: modify return value for iio_trigger_get
CIFS: Fix SMB2 readdir error handling
CIFS: Fix directory rename error
ASoC: davinci-mcasp: Correct rx format unit configuration
shmem: fix nlink for rename overwrite directory
x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
KVM: x86: handle idiv overflow at kvm_write_tsc
regmap: Fix handling of volatile registers for format_write() chips
ACPICA: Update to GPIO region handler interface.
MIPS: mcount: Adjust stack pointer for static trace in MIPS32
MIPS: ZBOOT: add missing <linux/string.h> include
ARM: 8165/1: alignment: don't break misaligned NEON load/store
ARM: 7897/1: kexec: Use the right ISA for relocate_new_kernel
ARM: 8133/1: use irq_set_affinity with force=false when migrating irqs
ARM: 8128/1: abort: don't clear the exclusive monitors
NFSv4: Fix another bug in the close/open_downgrade code
NFSv4: nfs4_state_manager() vs. nfs_server_remove_lists()
usb:hub set hub->change_bits when over-current happens
usb: dwc3: omap: fix ordering for runtime pm calls
USB: EHCI: unlink QHs even after the controller has stopped
USB: storage: Add quirks for Entrega/Xircom USB to SCSI converters
USB: storage: Add quirk for Ariston Technologies iConnect USB to SCSI adapter
USB: storage: Add quirk for Adaptec USBConnect 2000 USB-to-SCSI Adapter
storage: Add single-LUN quirk for Jaz USB Adapter
usb: hub: take hub->hdev reference when processing from eventlist
xhci: fix oops when xhci resumes from hibernate with hw lpm capable devices
xhci: Fix null pointer dereference if xhci initialization fails
USB: zte_ev: fix removed PIDs
USB: ftdi_sio: add support for NOVITUS Bono E thermal printer
USB: sierra: add 1199:68AA device ID
USB: sierra: avoid CDC class functions on "68A3" devices
USB: zte_ev: remove duplicate Qualcom PID
USB: zte_ev: remove duplicate Gobi PID
Revert "USB: option,zte_ev: move most ZTE CDMA devices to zte_ev"
USB: option: add VIA Telecom CDS7 chipset device id
USB: option: reduce interrupt-urb logging verbosity
USB: serial: fix potential heap buffer overflow
USB: sisusb: add device id for Magic Control USB video
USB: serial: fix potential stack buffer overflow
USB: serial: pl2303: add device id for ztek device
xtensa: fix a6 and a7 handling in fast_syscall_xtensa
xtensa: fix TLBTEMP_BASE_2 region handling in fast_second_level_miss
xtensa: fix access to THREAD_RA/THREAD_SP/THREAD_DS
xtensa: fix address checks in dma_{alloc,free}_coherent
xtensa: replace IOCTL code definitions with constants
drm/radeon: add connector quirk for fujitsu board
drm/vmwgfx: Fix a potential infinite spin waiting for fifo idle
drm/ast: AST2000 cannot be detected correctly
drm/i915: Wait for vblank before enabling the TV encoder
drm/i915: Remove bogus __init annotation from DMI callbacks
HID: logitech-dj: prevent false errors to be shown
HID: magicmouse: sanity check report size in raw_event() callback
HID: picolcd: sanity check report size in raw_event() callback
cfq-iosched: Fix wrong children_weight calculation
ALSA: pcm: fix fifo_size frame calculation
ALSA: hda - Fix invalid pin powermap without jack detection
ALSA: hda - Fix COEF setups for ALC1150 codec
ALSA: core: fix buffer overflow in snd_info_get_line()
arm64: ptrace: fix compat hardware watchpoint reporting
trace: Fix epoll hang when we race with new entries
i2c: at91: Fix a race condition during signal handling in at91_do_twi_xfer.
i2c: at91: add bound checking on SMBus block length bytes
arm64: flush TLS registers during exec
ibmveth: Fix endian issues with rx_no_buffer statistic
ahci: add pcid for Marvel 0x9182 controller
ahci: Add Device IDs for Intel 9 Series PCH
pata_scc: propagate return value of scc_wait_after_reset
drm/i915: read HEAD register back in init_ring_common() to enforce ordering
drm/radeon: load the lm63 driver for an lm64 thermal chip.
drm/ttm: Choose a pool to shrink correctly in ttm_dma_pool_shrink_scan().
drm/ttm: Fix possible division by 0 in ttm_dma_pool_shrink_scan().
drm/tilcdc: fix double kfree
drm/tilcdc: fix release order on exit
drm/tilcdc: panel: fix leak when unloading the module
drm/tilcdc: tfp410: fix dangling sysfs connector node
drm/tilcdc: slave: fix dangling sysfs connector node
drm/tilcdc: panel: fix dangling sysfs connector node
carl9170: fix sending URBs with wrong type when using full-speed
Linux 3.10.55
libceph: gracefully handle large reply messages from the mon
libceph: rename ceph_msg::front_max to front_alloc_len
tpm: Provide a generic means to override the chip returned timeouts
vfs: fix bad hashing of dentries
dcache.c: get rid of pointless macros
IB/srp: Fix deadlock between host removal and multipathd
blkcg: don't call into policy draining if root_blkg is already gone
mtd: nand: omap: Fix 1-bit Hamming code scheme, omap_calculate_ecc()
mtd/ftl: fix the double free of the buffers allocated in build_maps()
CIFS: Fix wrong restart readdir for SMB1
CIFS: Fix wrong filename length for SMB2
CIFS: Fix wrong directory attributes after rename
CIFS: Possible null ptr deref in SMB2_tcon
CIFS: Fix async reading on reconnects
CIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2
libceph: do not hard code max auth ticket len
libceph: add process_one_ticket() helper
libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly
md/raid1,raid10: always abort recover on write error.
xfs: don't zero partial page cache pages during O_DIRECT writes
xfs: don't zero partial page cache pages during O_DIRECT writes
xfs: don't dirty buffers beyond EOF
xfs: quotacheck leaves dquot buffers without verifiers
RDMA/iwcm: Use a default listen backlog if needed
md/raid10: Fix memory leak when raid10 reshape completes.
md/raid10: fix memory leak when reshaping a RAID10.
md/raid6: avoid data corruption during recovery of double-degraded RAID6
Bluetooth: Avoid use of session socket after the session gets freed
Bluetooth: never linger on process exit
mnt: Add tests for unprivileged remount cases that have found to be faulty
mnt: Change the default remount atime from relatime to the existing value
mnt: Correct permission checks in do_remount
mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
mnt: Only change user settable mount flags in remount
ring-buffer: Up rb_iter_peek() loop count to 3
ring-buffer: Always reset iterator to reader page
ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock
ACPI: Run fixed event device notifications in process context
ACPICA: Utilities: Fix memory leak in acpi_ut_copy_iobject_to_iobject
bfa: Fix undefined bit shift on big-endian architectures with 32-bit DMA address
ASoC: pxa-ssp: drop SNDRV_PCM_FMTBIT_S24_LE
ASoC: max98090: Fix missing free_irq
ASoC: samsung: Correct I2S DAI suspend/resume ops
ASoC: wm_adsp: Add missing MODULE_LICENSE
ASoC: pcm: fix dpcm_path_put in dpcm runtime update
openrisc: Rework signal handling
MIPS: Fix accessing to per-cpu data when flushing the cache
MIPS: OCTEON: make get_system_type() thread-safe
MIPS: asm: thread_info: Add _TIF_SECCOMP flag
MIPS: Cleanup flags in syscall flags handlers.
MIPS: asm/reg.h: Make 32- and 64-bit definitions available at the same time
MIPS: Remove BUG_ON(!is_fpu_owner()) in do_ade()
MIPS: tlbex: Fix a missing statement for HUGETLB
MIPS: Prevent user from setting FCSR cause bits
MIPS: GIC: Prevent array overrun
drivers: scsi: storvsc: Correctly handle TEST_UNIT_READY failure
Drivers: scsi: storvsc: Implement a eh_timed_out handler
powerpc/pseries: Failure on removing device node
powerpc/mm: Use read barrier when creating real_pte
powerpc/mm/numa: Fix break placement
regulator: arizona-ldo1: remove bypass functionality
mfd: omap-usb-host: Fix improper mask use.
kernel/smp.c:on_each_cpu_cond(): fix warning in fallback path
CAPABILITIES: remove undefined caps from all processes
tpm: missing tpm_chip_put in tpm_get_random()
firmware: Do not use WARN_ON(!spin_is_locked())
spi: omap2-mcspi: Configure hardware when slave driver changes mode
spi: orion: fix incorrect handling of cell-index DT property
iommu/amd: Fix cleanup_domain for mass device removal
media: media-device: Remove duplicated memset() in media_enum_entities()
media: au0828: Only alt setting logic when needed
media: xc4000: Fix get_frequency()
media: xc5000: Fix get_frequency()
Linux 3.10.54
USB: fix build error with CONFIG_PM_RUNTIME disabled
NFSv4: Fix problems with close in the presence of a delegation
NFSv3: Fix another acl regression
svcrdma: Select NFSv4.1 backchannel transport based on forward channel
NFSD: Decrease nfsd_users in nfsd_startup_generic fail
usb: hub: Prevent hub autosuspend if usbcore.autosuspend is -1
USB: whiteheat: Added bounds checking for bulk command response
USB: ftdi_sio: Added PID for new ekey device
USB: ftdi_sio: add Basic Micro ATOM Nano USB2Serial PID
ARM: OMAP2+: hwmod: Rearm wake-up interrupts for DT when MUSB is idled
usb: xhci: amd chipset also needs short TX quirk
xhci: Treat not finding the event_seg on COMP_STOP the same as COMP_STOP_INVAL
Staging: speakup: Update __speakup_paste_selection() tty (ab)usage to match vt
jbd2: fix infinite loop when recovering corrupt journal blocks
mei: nfc: fix memory leak in error path
mei: reset client state on queued connect request
Btrfs: fix csum tree corruption, duplicate and outdated checksums
hpsa: fix bad -ENOMEM return value in hpsa_big_passthru_ioctl
x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub
x86_64/vsyscall: Fix warn_bad_vsyscall log output
x86: don't exclude low BIOS area when allocating address space for non-PCI cards
drm/radeon: add additional SI pci ids
ext4: fix BUG_ON in mb_free_blocks()
kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601)
Revert "KVM: x86: Increase the number of fixed MTRR regs to 10"
KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table
KVM: x86: Inter-privilege level ret emulation is not implemeneted
crypto: ux500 - make interrupt mode plausible
serial: core: Preserve termios c_cflag for console resume
ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct
drivers/i2c/busses: use correct type for dma_map/unmap
hwmon: (dme1737) Prevent overflow problem when writing large limits
hwmon: (ads1015) Fix out-of-bounds array access
hwmon: (lm85) Fix various errors on attribute writes
hwmon: (ads1015) Fix off-by-one for valid channel index checking
hwmon: (gpio-fan) Prevent overflow problem when writing large limits
hwmon: (lm78) Fix overflow problems seen when writing large temperature limits
hwmon: (sis5595) Prevent overflow problem when writing large limits
drm: omapdrm: fix compiler errors
ARM: OMAP3: Fix choice of omap3_restore_es function in OMAP34XX rev3.1.2 case.
mei: start disconnect request timer consistently
ALSA: hda/realtek - Avoid setting wrong COEF on ALC269 & co
ALSA: hda/ca0132 - Don't try loading firmware at resume when already failed
ALSA: virtuoso: add Xonar Essence STX II support
ALSA: hda - fix an external mic jack problem on a HP machine
USB: Fix persist resume of some SS USB devices
USB: ehci-pci: USB host controller support for Intel Quark X1000
USB: serial: ftdi_sio: Add support for new Xsens devices
USB: serial: ftdi_sio: Annotate the current Xsens PID assignments
USB: OHCI: don't lose track of EDs when a controller dies
isofs: Fix unbounded recursion when processing relocated directories
HID: fix a couple of off-by-ones
HID: logitech: perform bounds checking on device_id early enough
stable_kernel_rules: Add pointer to netdev-FAQ for network patches
Linux 3.10.53
arch/sparc/math-emu/math_32.c: drop stray break operator
sparc64: ldc_connect() should not return EINVAL when handshake is in progress.
sunsab: Fix detection of BREAK on sunsab serial console
bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000
sparc64: Guard against flushing openfirmware mappings.
sparc64: Do not insert non-valid PTEs into the TSB hash table.
sparc64: Add membar to Niagara2 memcpy code.
sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.
sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault addresses.
sparc64: Fix top-level fault handling bugs.
sparc64: Handle 32-bit tasks properly in compute_effective_address().
sparc64: Make itc_sync_lock raw
sparc64: Fix argument sign extension for compat_sys_futex().
sctp: fix possible seqlock seadlock in sctp_packet_transmit()
iovec: make sure the caller actually wants anything in memcpy_fromiovecend
net: Correctly set segment mac_len in skb_segment().
macvlan: Initialize vlan_features to turn on offload support.
net: sctp: inherit auth_capable on INIT collisions
tcp: Fix integer-overflow in TCP vegas
tcp: Fix integer-overflows in TCP veno
net: sendmsg: fix NULL pointer dereference
ip: make IP identifiers less predictable
inetpeer: get rid of ip_id_count
bnx2x: fix crash during TSO tunneling
Linux 3.10.52
x86/espfix/xen: Fix allocation of pages for paravirt page tables
lib/btree.c: fix leak of whole btree nodes
net/l2tp: don't fall back on UDP [get|set]sockopt
net: mvneta: replace Tx timer with a real interrupt
net: mvneta: add missing bit descriptions for interrupt masks and causes
net: mvneta: do not schedule in mvneta_tx_timeout
net: mvneta: use per_cpu stats to fix an SMP lock up
net: mvneta: increase the 64-bit rx/tx stats out of the hot path
Revert "mac80211: move "bufferable MMPDU" check to fix AP mode scan"
staging: vt6655: Fix Warning on boot handle_irq_event_percpu.
x86_64/entry/xen: Do not invoke espfix64 on Xen
x86, espfix: Make it possible to disable 16-bit support
x86, espfix: Make espfix64 a Kconfig option, fix UML
x86, espfix: Fix broken header guard
x86, espfix: Move espfix definitions into a separate header file
x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"
timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
printk: rename printk_sched to printk_deferred
iio: buffer: Fix demux table creation
staging: vt6655: Fix disassociated messages every 10 seconds
mm, thp: do not allow thp faults to avoid cpuset restrictions
scsi: handle flush errors properly
rapidio/tsi721_dma: fix failure to obtain transaction descriptor
cfg80211: fix mic_failure tracing
ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout
crypto: af_alg - properly label AF_ALG socket
Linux 3.10.51
core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
x86/efi: Include a .bss section within the PE/COFF headers
s390/ptrace: fix PSW mask check
Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
mm: hugetlb: fix copy_hugetlb_page_range()
x86_32, entry: Store badsys error code in %eax
hwmon: (smsc47m192) Fix temperature limit and vrm write operations
parisc: Remove SA_RESTORER define
coredump: fix the setting of PF_DUMPCORE
Input: fix defuzzing logic
slab_common: fix the check for duplicate slab names
slab_common: Do not check for duplicate slab names
tracing: Fix wraparound problems in "uptime" trace clock
blkcg: don't call into policy draining if root_blkg is already gone
ahci: add support for the Promise FastTrak TX8660 SATA HBA (ahci mode)
libata: introduce ata_host->n_tags to avoid oops on SAS controllers
libata: support the ata host which implements a queue depth less than 32
block: don't assume last put of shared tags is for the host
block: provide compat ioctl for BLKZEROOUT
media: tda10071: force modulation to QPSK on DVB-S
media: hdpvr: fix two audio bugs
Linux 3.10.50
ARC: Implement ptrace(PTRACE_GET_THREAD_AREA)
sched: Fix possible divide by zero in avg_atom() calculation
locking/mutex: Disable optimistic spinning on some architectures
PM / sleep: Fix request_firmware() error at resume
dm cache metadata: do not allow the data block size to change
dm thin metadata: do not allow the data block size to change
alarmtimer: Fix bug where relative alarm timers were treated as absolute
drm/radeon: avoid leaking edid data
drm/qxl: return IRQ_NONE if it was not our irq
drm/radeon: set default bl level to something reasonable
irqchip: gic: Fix core ID calculation when topology is read from DT
irqchip: gic: Add support for cortex a7 compatible string
ring-buffer: Fix polling on trace_pipe
mwifiex: fix Tx timeout issue
perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
ipv4: fix buffer overflow in ip_options_compile()
dns_resolver: Null-terminate the right string
dns_resolver: assure that dns_query() result is null-terminated
sunvnet: clean up objects created in vnet_new() on vnet_exit()
net: pppoe: use correct channel MTU when using Multilink PPP
net: sctp: fix information leaks in ulpevent layer
tipc: clear 'next'-pointer of message fragments before reassembly
be2net: set EQ DB clear-intr bit in be_open()
netlink: Fix handling of error from netlink_dump().
net: mvneta: Fix big endian issue in mvneta_txq_desc_csum()
net: mvneta: fix operation in 10 Mbit/s mode
appletalk: Fix socket referencing in skb
tcp: fix false undo corner cases
igmp: fix the problem when mc leave group
net: qmi_wwan: add two Sierra Wireless/Netgear devices
net: qmi_wwan: Add ID for Telewell TW-LTE 4G v2
ipv4: icmp: Fix pMTU handling for rare case
tcp: Fix divide by zero when pushing during tcp-repair
bnx2x: fix possible panic under memory stress
net: fix sparse warning in sk_dst_set()
ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix
ipv4: fix dst race in sk_dst_get()
8021q: fix a potential memory leak
net: sctp: check proc_dointvec result in proc_sctp_do_auth
tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb
ip_tunnel: fix ip_tunnel_lookup
shmem: fix splicing from a hole while it's punched
shmem: fix faulting into a hole, not taking i_mutex
shmem: fix faulting into a hole while it's punched
iwlwifi: dvm: don't enable CTS to self
igb: do a reset on SR-IOV re-init if device is down
hwmon: (adt7470) Fix writes to temperature limit registers
hwmon: (da9052) Don't use dash in the name attribute
hwmon: (da9055) Don't use dash in the name attribute
tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs
tracing: Fix graph tracer with stack tracer on other archs
fuse: handle large user and group ID
Bluetooth: Ignore H5 non-link packets in non-active state
Drivers: hv: util: Fix a bug in the KVP code
media: gspca_pac7302: Add new usb-id for Genius i-Look 317
usb: Check if port status is equal to RxDetect
Signed-off-by: Ian Maund <imaund@codeaurora.org>
commit b928095b0a7cff7fb9fcf4c706348ceb8ab2c295 upstream.
If overwriting an empty directory with rename, then need to drop the extra
nlink.
Test prog:
#include <stdio.h>
#include <fcntl.h>
#include <err.h>
#include <sys/stat.h>
int main(void)
{
const char *test_dir1 = "test-dir1";
const char *test_dir2 = "test-dir2";
int res;
int fd;
struct stat statbuf;
res = mkdir(test_dir1, 0777);
if (res == -1)
err(1, "mkdir(\"%s\")", test_dir1);
res = mkdir(test_dir2, 0777);
if (res == -1)
err(1, "mkdir(\"%s\")", test_dir2);
fd = open(test_dir2, O_RDONLY);
if (fd == -1)
err(1, "open(\"%s\")", test_dir2);
res = rename(test_dir1, test_dir2);
if (res == -1)
err(1, "rename(\"%s\", \"%s\")", test_dir1, test_dir2);
res = fstat(fd, &statbuf);
if (res == -1)
err(1, "fstat(%i)", fd);
if (statbuf.st_nlink != 0) {
fprintf(stderr, "nlink is %lu, should be 0\n", statbuf.st_nlink);
return 1;
}
return 0;
}
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1a366500bd537b50c3aad26dc7df083ec03a448 upstream.
shmem_fault() is the actual culprit in trinity's hole-punch starvation,
and the most significant cause of such problems: since a page faulted is
one that then appears page_mapped(), needing unmap_mapping_range() and
i_mmap_mutex to be unmapped again.
But it is not the only way in which a page can be brought into a hole in
the radix_tree while that hole is being punched; and Vlastimil's testing
implies that if enough other processors are busy filling in the hole,
then shmem_undo_range() can be kept from completing indefinitely.
shmem_file_splice_read() is the main other user of SGP_CACHE, which can
instantiate shmem pagecache pages in the read-only case (without holding
i_mutex, so perhaps concurrently with a hole-punch). Probably it's
silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
ought to be safe, but might bring surprises - not a change to be rushed.
shmem_read_mapping_page_gfp() is an internal interface used by
drivers/gpu/drm GEM (and next by uprobes): it should be okay. And
shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
called internally by the kernel (perhaps for a stacking filesystem,
which might rely on holes to be reserved): it's unclear whether it could
be provoked to keep hole-punch busy or not.
We could apply the same umbrella as now used in shmem_fault() to
shmem_file_splice_read() and the others; but it looks ugly, and use over
a range raises questions - should it actually be per page? can these get
starved themselves?
The origin of this part of the problem is my v3.1 commit d0823576bf
("mm: pincer in truncate_inode_pages_range"), once it was duplicated
into shmem.c. It seemed like a nice idea at the time, to ensure
(barring RCU lookup fuzziness) that there's an instant when the entire
hole is empty; but the indefinitely repeated scans to ensure that make
it vulnerable.
Revert that "enhancement" to hole-punch from shmem_undo_range(), but
retain the unproblematic rescanning when it's truncating; add a couple
of comments there.
Remove the "indices[0] >= end" test: that is now handled satisfactorily
by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
to be worth avoiding here.
But if we do not always loop indefinitely, we do need to handle the case
of swap swizzled back to page before shmem_free_swap() gets it: add a
retry for that case, as suggested by Konstantin Khlebnikov; and for the
case of page swizzled back to swap, as suggested by Johannes Weiner.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8e205f779d1443a94b5ae81aa359cb535dd3021e upstream.
Commit f00cdc6df7d7 ("shmem: fix faulting into a hole while it's
punched") was buggy: Sasha sent a lockdep report to remind us that
grabbing i_mutex in the fault path is a no-no (write syscall may already
hold i_mutex while faulting user buffer).
We tried a completely different approach (see following patch) but that
proved inadequate: good enough for a rational workload, but not good
enough against trinity - which forks off so many mappings of the object
that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
into serious starvation when concurrent faults force the puncher to fall
back to single-page unmap_mapping_range() searches of the i_mmap tree.
So return to the original umbrella approach, but keep away from i_mutex
this time. We really don't want to bloat every shmem inode with a new
mutex or completion, just to protect this unlikely case from trinity.
So extend the original with wait_queue_head on stack at the hole-punch
end, and wait_queue item on the stack at the fault end.
This involves further use of i_lock to guard against the races: lockdep
has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
i_lock around wake_up_bit(), which is comparable to what we do here.
i_lock is more convenient, but we could switch to shmem's info->lock.
This issue has been tagged with CVE-2014-4171, which will require commit
f00cdc6df7d7 and this and the following patch to be backported: we
suggest to 3.1+, though in fact the trinity forkbomb effect might go
back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might
not, since much has changed, with i_mmap_mutex a spinlock before 3.0.
Anyone running trinity on 3.0 and earlier? I don't think we need care.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f00cdc6df7d7cfcabb5b740911e6788cb0802bdb upstream.
Trinity finds that mmap access to a hole while it's punched from shmem
can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE)
from completing, until the reader chooses to stop; with the puncher's
hold on i_mutex locking out all other writers until it can complete.
It appears that the tmpfs fault path is too light in comparison with its
hole-punching path, lacking an i_data_sem to obstruct it; but we don't
want to slow down the common case.
Extend shmem_fallocate()'s existing range notification mechanism, so
shmem_fault() can refrain from faulting pages into the hole while it's
punched, waiting instead on i_mutex (when safe to sleep; or repeatedly
faulting when not).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dynamic_dname() is both too much and too little for those - the
output may be well in excess of 64 bytes dynamic_dname() assumes
to be enough (thanks to ashmem feeding really long names to
shmem_file_setup()) and vsnprintf() is an overkill for those
guys.
Change-Id: Ibbcb570e6634e34016cd3c8d07f817f01be59210
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Git-commit: 118b23022512eb2f41ce42db70dc0568d00be4ba
Git-repo: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git
Signed-off-by: Subbaraman Narayanamurthy <subbaram@codeaurora.org>
commit 118b23022512eb2f41ce42db70dc0568d00be4ba upstream.
dynamic_dname() is both too much and too little for those - the
output may be well in excess of 64 bytes dynamic_dname() assumes
to be enough (thanks to ashmem feeding really long names to
shmem_file_setup()) and vsnprintf() is an overkill for those
guys.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NOT FOR STAGING
This patch re-adds the original shmem_set_file to mm/shmem.c
and converts ashmem.c back to using it.
CC: Brian Swetland <swetland@google.com>
CC: Colin Cross <ccross@android.com>
CC: Arve Hjønnevåg <arve@android.com>
CC: Dima Zavin <dima@android.com>
CC: Robert Love <rlove@google.com>
CC: Greg KH <greg@kroah.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Create a CONFIG_MMU=y stub for ramfs_nommu_expand_for_mapping() in the
usual fashion.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
This patch is a follow up on below patch:
[PATCH] exportfs: add FILEID_INVALID to indicate invalid fid_type
commit: 216b6cbdcb
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Vivek Trivedi <t.vivek@samsung.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Sage Weil <sage@inktank.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Note that provided ->d_dname() reproduces what we used to get for
those guys in e.g. /proc/self/maps; it might be a good idea to change
that to something less ugly, but for now let's keep the existing
user-visible behaviour
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
"This set of changes starts with a few small enhnacements to the user
namespace. reboot support, allowing more arbitrary mappings, and
support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
user namespace root.
I do my best to document that if you care about limiting your
unprivileged users that when you have the user namespace support
enabled you will need to enable memory control groups.
There is a minor bug fix to prevent overflowing the stack if someone
creates way too many user namespaces.
The bulk of the changes are a continuation of the kuid/kgid push down
work through the filesystems. These changes make using uids and gids
typesafe which ensures that these filesystems are safe to use when
multiple user namespaces are in use. The filesystems converted for
3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The
changes for these filesystems were a little more involved so I split
the changes into smaller hopefully obviously correct changes.
XFS is the only filesystem that remains. I was hoping I could get
that in this release so that user namespace support would be enabled
with an allyesconfig or an allmodconfig but it looks like the xfs
changes need another couple of days before it they are ready."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
cifs: Enable building with user namespaces enabled.
cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
cifs: Convert struct cifs_sb_info to use kuids and kgids
cifs: Modify struct smb_vol to use kuids and kgids
cifs: Convert struct cifsFileInfo to use a kuid
cifs: Convert struct cifs_fattr to use kuid and kgids
cifs: Convert struct tcon_link to use a kuid.
cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
cifs: Convert from a kuid before printing current_fsuid
cifs: Use kuids and kgids SID to uid/gid mapping
cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
cifs: Override unmappable incoming uids and gids
nfsd: Enable building with user namespaces enabled.
nfsd: Properly compare and initialize kuids and kgids
nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
nfsd: Modify nfsd4_cb_sec to use kuids and kgids
nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
nfsd: Convert nfsxdr to use kuids and kgids
nfsd: Convert nfs3xdr to use kuids and kgids
...
Fix several mempolicy leaks in the tmpfs mount logic. These leaks are
slow - on the order of one object leaked per mount attempt.
Leak 1 (umount doesn't free mpol allocated in mount):
while true; do
mount -t tmpfs -o mpol=interleave,size=100M nodev /mnt
umount /mnt
done
Leak 2 (errors parsing remount options will leak mpol):
mount -t tmpfs -o size=100M nodev /mnt
while true; do
mount -o remount,mpol=interleave,size=x /mnt 2> /dev/null
done
umount /mnt
Leak 3 (multiple mpol per mount leak mpol):
while true; do
mount -t tmpfs -o mpol=interleave,mpol=interleave,size=100M nodev /mnt
umount /mnt
done
This patch fixes all of the above. I could have broken the patch into
three pieces but is seemed easier to review as one.
[akpm@linux-foundation.org: fix handling of mpol_parse_str() errors, per Hugh]
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
option is not specified in the remount request. A new policy can be
specified if mpol=M is given.
Before this patch remounting an mpol bound tmpfs without specifying
mpol= mount option in the remount request would set the filesystem's
mempolicy object to a freed mempolicy object.
To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
# mkdir /tmp/x
# mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x
# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0
# mount -o remount,size=200M nodev /tmp/x
# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
# note ? garbage in mpol=... output above
# dd if=/dev/zero of=/tmp/x/f count=1
# panic here
Panic:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
[...]
Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
Call Trace:
mpol_shared_policy_init+0xa5/0x160
shmem_get_inode+0x209/0x270
shmem_mknod+0x3e/0xf0
shmem_create+0x18/0x20
vfs_create+0xb5/0x130
do_last+0x9a1/0xea0
path_openat+0xb3/0x4d0
do_filp_open+0x42/0xa0
do_sys_open+0xfe/0x1e0
compat_sys_open+0x1b/0x20
cstar_dispatch+0x7/0x1f
Non-debug kernels will not crash immediately because referencing the
dangling mpol will not cause a fault. Instead the filesystem will
reference a freed mempolicy object, which will cause unpredictable
behavior.
The problem boils down to a dropped mpol reference below if
shmem_parse_options() does not allocate a new mpol:
config = *sbinfo
shmem_parse_options(data, &config, true)
mpol_put(sbinfo->mpol)
sbinfo->mpol = config.mpol /* BUG: saves unreferenced mpol */
This patch avoids the crash by not releasing the mempolicy if
shmem_parse_options() doesn't create a new mpol.
How far back does this issue go? I see it in both 2.6.36 and 3.3. I did
not look back further.
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In shmem_find_get_pages_and_swap(), use the faster radix tree iterator
construct from commit 78c1d78488 ("radix-tree: introduce bit-optimized
iterator").
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allocating a file structure in function get_empty_filp() might fail because
of several reasons:
- not enough memory for file structures
- operation is not allowed
- user is over its limit
Currently the function returns NULL in all cases and we loose the exact
reason of the error. All callers of get_empty_filp() assume that the function
can fail with ENFILE only.
Return error through pointer. Change all callers to preserve this error code.
[AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
(things remaining here deal with alloc_file()), removed pipe(2) behaviour change]
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There is no backing store to tmpfs and file creation rules are the
same as for any other filesystem so it is semantically safe to allow
unprivileged users to mount it. ramfs is safe for the same reasons so
allow either flavor of tmpfs to be mounted by a user namespace root
user.
The memory control group successfully limits how much memory tmpfs can
consume on any system that cares about a user namespace root using
tmpfs to exhaust memory the memory control group can be deployed.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Remove the unused argument (formerly no_context) from mpol_parse_str()
and from mpol_to_str().
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
But the kernel decided to call it "origin" instead. Fix most of the
sites.
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert 3.5's commit f21f806220 ("tmpfs: revert SEEK_DATA and
SEEK_HOLE") to reinstate 4fb5ef089b ("tmpfs: support SEEK_DATA and
SEEK_HOLE"), with the intervening additional arg to
generic_file_llseek_size().
In 3.8, ext4 is expected to join btrfs, ocfs2 and xfs with proper
SEEK_DATA and SEEK_HOLE support; and a good case has now been made for
it on tmpfs, so let's join the party.
It's quite easy for tmpfs to scan the radix_tree to support llseek's new
SEEK_DATA and SEEK_HOLE options: so add them while the minutiae are
still on my mind (in particular, the !PageUptodate-ness of pages
fallocated but still unwritten).
[akpm@linux-foundation.org: fix warning with CONFIG_TMPFS=n]
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jaegeuk Hanse <jaegeuk.hanse@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <wenqing.lz@taobao.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a regression in 3.7-rc, which has since gone into stable.
Commit 00442ad04a ("mempolicy: fix a memory corruption by refcount
imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
on expecting alloc_page_vma() to drop the refcount it had acquired.
This deserves a rework: but for now fix the leak in shmem_alloc_page().
Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use
the same refcounting there as in shmem_alloc_page(), delete its onstack
mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() -
those were invented to let swapin_readahead() make an unknown number of
calls to alloc_pages_vma() with one mempolicy; but since 00442ad04a,
alloc_pages_vma() has kept refcount in balance, so now no problem.
Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Under a particular load on one machine, I have hit shmem_evict_inode()'s
BUG_ON(inode->i_blocks), enough times to narrow it down to a particular
race between swapout and eviction.
It comes from the "if (freed > 0)" asymmetry in shmem_recalc_inode(),
and the lack of coherent locking between mapping's nrpages and shmem's
swapped count. There's a window in shmem_writepage(), between lowering
nrpages in shmem_delete_from_page_cache() and then raising swapped
count, when the freed count appears to be +1 when it should be 0, and
then the asymmetry stops it from being corrected with -1 before hitting
the BUG.
One answer is coherent locking: using tree_lock throughout, without
info->lock; reasonable, but the raw_spin_lock in percpu_counter_add() on
used_blocks makes that messier than expected. Another answer may be a
further effort to eliminate the weird shmem_recalc_inode() altogether,
but previous attempts at that failed.
So far undecided, but for now change the BUG_ON to WARN_ON: in usual
circumstances it remains a useful consistency check.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fuzzing with trinity hit the "impossible" VM_BUG_ON(error) (which Fedora
has converted to WARNING) in shmem_getpage_gfp():
WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
Call Trace:
warn_slowpath_common+0x7f/0xc0
warn_slowpath_null+0x1a/0x20
shmem_getpage_gfp+0xa5c/0xa70
shmem_fault+0x4f/0xa0
__do_fault+0x71/0x5c0
handle_pte_fault+0x97/0xae0
handle_mm_fault+0x289/0x350
__do_page_fault+0x18e/0x530
do_page_fault+0x2b/0x50
page_fault+0x28/0x30
tracesys+0xe1/0xe6
Thanks to Johannes for pointing to truncation: free_swap_and_cache()
only does a trylock on the page, so the page lock we've held since
before confirming swap is not enough to protect against truncation.
What cleanup is needed in this case? Just delete_from_swap_cache(),
which takes care of the memcg uncharge.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
u64 inum = fid->raw[2];
which is unhelpfully reported as at the end of shmem_alloc_inode():
BUG: unable to handle kernel paging request at ffff880061cd3000
IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Call Trace:
[<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0
[<ffffffff812d77c3>] do_handle_open+0x163/0x2c0
[<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10
[<ffffffff83a5f3f8>] tracesys+0xe1/0xe6
Right, tmpfs is being stupid to access fid->raw[2] before validating that
fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
fall at the end of a page, and the next page not be present.
But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
could oops in the same way: add the missing fh_len checks to those.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Sage Weil <sage@inktank.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().
Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.
Now device drivers can implement this method and obtain nonlinear vma support.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.
$ size vmlinux.o
text data bss dec hex filename
4658782 880729 5195032 10734543 a3cbcf vmlinux.o
$ size vmlinux.o
text data bss dec hex filename
4658957 880729 5195032 10734718 a3cc7e vmlinux.o
v7:
- checkpatch warnings fixed
- Implement the changes requested by Hugh Dickins:
- make simple_xattrs_init and simple_xattrs_free inline
- get rid of locking and list reinitialization in simple_xattrs_free,
they're not needed
v6:
- no changes
v5:
- no changes
v4:
- move simple_xattrs_free() to fs/xattr.c
v3:
- in kmem_xattrs_free(), reinitialize the list
- use simple_xattr_* prefix
- introduce simple_xattr_add() to prevent direct list usage
Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
When tmpfs has the interleave memory policy, it always starts allocating
for each file from node 0 at offset 0. When there are many small files,
the lower nodes fill up disproportionately.
This patch spreads out node usage by starting files at nodes other than 0,
by using the inode number to bias the starting node for interleave.
Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
shmem_add_to_page_cache() has three callsites, but only one of them wants
the radix_tree_preload() (an exceptional entry guarantees that the radix
tree node is present in the other cases), and only that site can achieve
mem_cgroup_uncharge_cache_page() (PageSwapCache makes it a no-op in the
other cases). We did it this way originally to reflect
add_to_page_cache_locked(); but it's confusing now, so move the radix_tree
preloading and mem_cgroup uncharging to that one caller.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When adding the page_private checks before calling shmem_replace_page(), I
did realize that there is a further race, but thought it too unlikely to
need a hurried fix.
But independently I've been chasing why a mem cgroup's memory.stat
sometimes shows negative rss after all tasks have gone: I expected it to
be a stats gathering bug, but actually it's shmem swapping's fault.
It's an old surprise, that when you lock_page(lookup_swap_cache(swap)),
the page may have been removed from swapcache before getting the lock; or
it may have been freed and reused and be back in swapcache; and it can
even be using the same swap location as before (page_private same).
The swapoff case is already secure against this (swap cannot be reused
until the whole area has been swapped off, and a new swapped on); and
shmem_getpage_gfp() is protected by shmem_add_to_page_cache()'s check for
the expected radix_tree entry - but a little too late.
By that time, we might have already decided to shmem_replace_page(): I
don't know of a problem from that, but I'd feel more at ease not to do so
spuriously. And we have already done mem_cgroup_cache_charge(), on
perhaps the wrong mem cgroup: and this charge is not then undone on the
error path, because PageSwapCache ends up preventing that.
It's this last case which causes the occasional negative rss in
memory.stat: the page is charged here as cache, but (sometimes) found to
be anon when eventually it's uncharged - and in between, it's an
undeserved charge on the wrong memcg.
Fix this by adding an earlier check on the radix_tree entry: it's
inelegant to descend the tree twice, but swapping is not the fast path,
and a better solution would need a pair (try+commit) of memcg calls, and a
rework of shmem_replace_page() to keep out of the swapcache.
We can use the added shmem_confirm_swap() function to replace the
find_get_page+page_cache_release we were already doing on the error path.
And add a comment on that -EEXIST: it seems a peculiar errno to be using,
but originates from its use in radix_tree_insert().
[It can be surprising to see positive rss left in a memcg's memory.stat
after all tasks have gone, since it is supposed to count anonymous but not
shmem. Aside from sharing anon pages via fork with a task in some other
memcg, it often happens after swapping: because a swap page can't be freed
while under writeback, nor while locked. So it's not an error, and these
residual pages are easily freed once pressure demands.]
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert 4fb5ef089b ("tmpfs: support SEEK_DATA and SEEK_HOLE"). I believe
it's correct, and it's been nice to have from rc1 to rc6; but as the
original commit said:
I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
would be of any use to them on tmpfs. This code adds 92 lines and 752
bytes on x86_64 - is that bloat or worthwhile?
Nobody asked for it, so I conclude that it's bloat: let's revert tmpfs to
the dumb generic support for v3.5. We can always reinstate it later if
useful, and anyone needing it in a hurry can just get it out of git.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block bits from Jens Axboe:
"As vacation is coming up, thought I'd better get rid of my pending
changes in my for-linus branch for this iteration. It contains:
- Two patches for mtip32xx. Killing a non-compliant sysfs interface
and moving it to debugfs, where it belongs.
- A few patches from Asias. Two legit bug fixes, and one killing an
interface that is no longer in use.
- A patch from Jan, making the annoying partition ioctl warning a bit
less annoying, by restricting it to !CAP_SYS_RAWIO only.
- Three bug fixes for drbd from Lars Ellenberg.
- A fix for an old regression for umem, it hasn't really worked since
the plugging scheme was changed in 3.0.
- A few fixes from Tejun.
- A splice fix from Eric Dumazet, fixing an issue with pipe
resizing."
* 'for-linus' of git://git.kernel.dk/linux-block:
scsi: Silence unnecessary warnings about ioctl to partition
block: Drop dead function blk_abort_queue()
block: Mitigate lock unbalance caused by lock switching
block: Avoid missed wakeup in request waitqueue
umem: fix up unplugging
splice: fix racy pipe->buffers uses
drbd: fix null pointer dereference with on-congestion policy when diskless
drbd: fix list corruption by failing but already aborted reads
drbd: fix access of unallocated pages and kernel panic
xen/blkfront: Add WARN to deal with misbehaving backends.
blkcg: drop local variable @q from blkg_destroy()
mtip32xx: Create debugfs entries for troubleshooting
mtip32xx: Remove 'registers' and 'flags' from sysfs
blkcg: fix blkg_alloc() failure path
block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED
block: fix return value on cfq_init() failure
mtip32xx: Remove version.h header file inclusion
xen/blkback: Copy id field when doing BLKIF_DISCARD.
Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
by splice_shrink_spd() called from vmsplice_to_pipe()
commit 35f3d14dbb (pipe: add support for shrinking and growing pipes)
added capability to adjust pipe->buffers.
Problem is some paths don't hold pipe mutex and assume pipe->buffers
doesn't change for their duration.
Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
use it in place of pipe->buffers where appropriate.
splice_shrink_spd() loses its struct pipe_inode_info argument.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tom Herbert <therbert@google.com>
Cc: stable <stable@vger.kernel.org> # 2.6.35
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit bde05d1ccd ("shmem: replace page if mapping excludes its zone")
is not at all likely to break for anyone, but it was an earlier version
from before review feedback was incorporated. Fix that up now.
* shmem_replace_page must flush_dcache_page after copy_highpage [akpm]
* Expand comment on why shmem_unuse_inode needs page_swapcount [akpm]
* Remove excess of VM_BUG_ONs from shmem_replace_page [wangcong]
* Check page_private matches swap before calling shmem_replace_page [hughd]
* shmem_replace_page allow for unexpected race in radix_tree lookup [hughd]
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephane Marchesin <marcheu@chromium.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Rob Clark <rob.clark@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs changes from Al Viro.
"A lot of misc stuff. The obvious groups:
* Miklos' atomic_open series; kills the damn abuse of
->d_revalidate() by NFS, which was the major stumbling block for
all work in that area.
* ripping security_file_mmap() and dealing with deadlocks in the
area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
general.
* ->encode_fh() switched to saner API; insane fake dentry in
mm/cleancache.c gone.
* assorted annotations in fs (endianness, __user)
* parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
* ->update_time() work from Josef.
* other bits and pieces all over the place.
Normally it would've been in two or three pull requests, but
signal.git stuff had eaten a lot of time during this cycle ;-/"
Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
'truncate_range' inode method was removed by the VM changes, the VFS
update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
to sparse fix added twice, with other changes nearby).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
nfs: don't open in ->d_revalidate
vfs: retry last component if opening stale dentry
vfs: nameidata_to_filp(): don't throw away file on error
vfs: nameidata_to_filp(): inline __dentry_open()
vfs: do_dentry_open(): don't put filp
vfs: split __dentry_open()
vfs: do_last() common post lookup
vfs: do_last(): add audit_inode before open
vfs: do_last(): only return EISDIR for O_CREAT
vfs: do_last(): check LOOKUP_DIRECTORY
vfs: do_last(): make ENOENT exit RCU safe
vfs: make follow_link check RCU safe
vfs: do_last(): use inode variable
vfs: do_last(): inline walk_component()
vfs: do_last(): make exit RCU safe
vfs: split do_lookup()
Btrfs: move over to use ->update_time
fs: introduce inode operation ->update_time
reiserfs: get rid of resierfs_sync_super
reiserfs: mark the superblock as dirty a bit later
...