Commit Graph

446338 Commits

Author SHA1 Message Date
Jan Kara e1f8b72142 bdi: Fix oops in wb_workfn()
commit b8b784958eccbf8f51ebeee65282ca3fd59ea391 upstream.

Syzbot has reported that it can hit a NULL pointer dereference in
wb_workfn() due to wb->bdi->dev being NULL. This indicates that
wb_workfn() was called for an already unregistered bdi which should not
happen as wb_shutdown() called from bdi_unregister() should make sure
all pending writeback works are completed before bdi is unregistered.
Except that wb_workfn() itself can requeue the work with:

	mod_delayed_work(bdi_wq, &wb->dwork, 0);

and if this happens while wb_shutdown() is waiting in:

	flush_delayed_work(&wb->dwork);

the dwork can get executed after wb_shutdown() has finished and
bdi_unregister() has cleared wb->bdi->dev.

Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
the necessary precautions against racing with bdi unregistration.

CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
CC: Tejun Heo <tj@kernel.org>
Fixes: 839a8e8660
Reported-by: syzbot <syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[bwh: Backported to 3.16:
 - Use bdi_wakeup_thread() instead of wb_wakeup()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:12 +02:00
Linus Torvalds 444fda6e37 mmap: relax file size limit for regular files
commit 423913ad4ae5b3e8fb8983f70969fb522261ba26 upstream.

Commit be83bbf80682 ("mmap: introduce sane default mmap limits") was
introduced to catch problems in various ad-hoc character device drivers
doing mmap and getting the size limits wrong.  In the process, it used
"known good" limits for the normal cases of mapping regular files and
block device drivers.

It turns out that the "s_maxbytes" limit was less "known good" than I
thought.  In particular, /proc doesn't set it, but exposes one regular
file to mmap: /proc/vmcore.  As a result, that file got limited to the
default MAX_INT s_maxbytes value.

This went unnoticed for a while, because apparently the only thing that
needs it is the s390 kernel zfcpdump, but there might be other tools
that use this too.

Vasily suggested just changing s_maxbytes for all of /proc, which isn't
wrong, but makes me nervous at this stage.  So instead, just make the
new mmap limit always be MAX_LFS_FILESIZE for regular files, which won't
affect anything else.  It wasn't the regular file case I was worried
about.

I'd really prefer for maxsize to have been per-inode, but that is not
how things are today.

Fixes: be83bbf80682 ("mmap: introduce sane default mmap limits")
Reported-by: Vasily Gorbik <gor@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:12 +02:00
Dave Airlie 300bead6dd drm: set FMODE_UNSIGNED_OFFSET for drm files
commit 76ef6b28ea4f81c3d511866a9b31392caa833126 upstream.

Since we have the ttm and gem vma managers using a subset
of the file address space for objects, and these start at
0x100000000 they will overflow the new mmap checks.

I've checked all the mmap routines I could see for any
bad behaviour but overall most people use GEM/TTM VMA
managers even the legacy drivers have a hashtable.

Reported-and-Tested-by: Arthur Marsh (amarsh04 on #radeon)
Fixes: be83bbf8068 (mmap: introduce sane default mmap limits)
Signed-off-by: Dave Airlie <airlied@redhat.com>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:11 +02:00
Linus Torvalds c5e0b7057d mmap: introduce sane default mmap limits
commit be83bbf806822b1b89e0a0f23cd87cddc409e429 upstream.

The internal VM "mmap()" interfaces are based on the mmap target doing
everything using page indexes rather than byte offsets, because
traditionally (ie 32-bit) we had the situation that the byte offset
didn't fit in a register.  So while the mmap virtual address was limited
by the word size of the architecture, the backing store was not.

So we're basically passing "pgoff" around as a page index, in order to
be able to describe backing store locations that are much bigger than
the word size (think files larger than 4GB etc).

But while this all makes a ton of sense conceptually, we've been dogged
by various drivers that don't really understand this, and internally
work with byte offsets, and then try to work with the page index by
turning it into a byte offset with "pgoff << PAGE_SHIFT".

Which obviously can overflow.

Adding the size of the mapping to it to get the byte offset of the end
of the backing store just exacerbates the problem, and if you then use
this overflow-prone value to check various limits of your device driver
mmap capability, you're just setting yourself up for problems.

The correct thing for drivers to do is to do their limit math in page
indices, the way the interface is designed.  Because the generic mmap
code _does_ test that the index doesn't overflow, since that's what the
mmap code really cares about.

HOWEVER.

Finding and fixing various random drivers is a sisyphean task, so let's
just see if we can just make the core mmap() code do the limiting for
us.  Realistically, the only "big" backing stores we need to care about
are regular files and block devices, both of which are known to do this
properly, and which have nice well-defined limits for how much data they
can access.

So let's special-case just those two known cases, and then limit other
random mmap users to a backing store that still fits in "unsigned long".
Realistically, that's not much of a limit at all on 64-bit, and on
32-bit architectures the only worry might be the GPU drivers, which can
have big physical address spaces.

To make it possible for drivers like that to say that they are 64-bit
clean, this patch does repurpose the "FMODE_UNSIGNED_OFFSET" bit in the
file flags to allow drivers to mark their file descriptors as safe in
the full 64-bit mmap address space.

[ The timing for doing this is less than optimal, and this should really
  go in a merge window. But realistically, this needs wide testing more
  than it needs anything else, and being main-line is the only way to do
  that.

  So the earlier the better, even if it's outside the proper development
  cycle        - Linus ]

Change-Id: I6886a262e0a7156aa1d99af301f47bbce8301800
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:11 +02:00
Oleg Nesterov a078112827 mm: do_mmap_pgoff: cleanup the usage of file_inode()
Simple cleanup.  Move "struct inode *inode" variable into "if (file)"
block to simplify the code and avoid the unnecessary check.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@android.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-27 21:52:11 +02:00
Gustavo A. R. Silva 18b3f2bd17 kernel/sys.c: fix potential Spectre v1 issue
commit 23d6aef74da86a33fa6bb75f79565e0a16ee97c2 upstream.

`resource' can be controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

  kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
  kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)

Fix this by sanitizing *resource* before using it to index
current->signal->rlim

Notice that given that speculation windows are large, the policy is to
kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16:
 - Drop changes to compat implementation, which is a wrapper for the
   regular implementation here
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:10 +02:00
Davidlohr Bueso b567a01552 ipc/shm: fix shmat() nil address after round-down when remapping
commit 8f89c007b6dec16a1793cb88de88fcc02117bbbc upstream.

shmat()'s SHM_REMAP option forbids passing a nil address for; this is in
fact the very first thing we check for.  Andrea reported that for
SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
but we need to check again if the address was rounded down to nil.  As
of this patch, such cases will return -EINVAL.

Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:10 +02:00
Davidlohr Bueso d7b7bd3aaf Revert "ipc/shm: Fix shmat mmap nil-page protection"
commit a73ab244f0dad8fffb3291b905f73e2d3eaa7c00 upstream.

Patch series "ipc/shm: shmat() fixes around nil-page".

These patches fix two issues reported[1] a while back by Joe and Andrea
around how shmat(2) behaves with nil-page.

The first reverts a commit that it was incorrectly thought that mapping
nil-page (address=0) was a no no with MAP_FIXED.  This is not the case,
with the exception of SHM_REMAP; which is address in the second patch.

I chose two patches because it is easier to backport and it explicitly
reverts bogus behaviour.  Both patches ought to be in -stable and ltp
testcases need updated (the added testcase around the cve can be
modified to just test for SHM_RND|SHM_REMAP).

[1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805

This patch (of 2):

Commit 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
worked on the idea that we should not be mapping as root addr=0 and
MAP_FIXED.  However, it was reported that this scenario is in fact
valid, thus making the patch both bogus and breaks userspace as well.

For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
initialization[1].

[1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:09 +02:00
Paolo Abeni 6f265f09de netfilter: ebtables: handle string from userspace with care
commit 94c752f99954797da583a84c4907ff19e92550a4 upstream.

strlcpy() can't be safely used on a user-space provided string,
as it can try to read beyond the buffer's end, if the latter is
not NULL terminated.

Leveraging the above, syzbot has been able to trigger the following
splat:

BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300
[inline]
BUG: KASAN: stack-out-of-bounds in compat_mtw_from_user
net/bridge/netfilter/ebtables.c:1957 [inline]
BUG: KASAN: stack-out-of-bounds in ebt_size_mwt
net/bridge/netfilter/ebtables.c:2059 [inline]
BUG: KASAN: stack-out-of-bounds in size_entry_mwt
net/bridge/netfilter/ebtables.c:2155 [inline]
BUG: KASAN: stack-out-of-bounds in compat_copy_entries+0x96c/0x14a0
net/bridge/netfilter/ebtables.c:2194
Write of size 33 at addr ffff8801b0abf888 by task syz-executor0/4504

CPU: 0 PID: 4504 Comm: syz-executor0 Not tainted 4.17.0-rc2+ #40
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
  check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
  memcpy+0x37/0x50 mm/kasan/kasan.c:303
  strlcpy include/linux/string.h:300 [inline]
  compat_mtw_from_user net/bridge/netfilter/ebtables.c:1957 [inline]
  ebt_size_mwt net/bridge/netfilter/ebtables.c:2059 [inline]
  size_entry_mwt net/bridge/netfilter/ebtables.c:2155 [inline]
  compat_copy_entries+0x96c/0x14a0 net/bridge/netfilter/ebtables.c:2194
  compat_do_replace+0x483/0x900 net/bridge/netfilter/ebtables.c:2285
  compat_do_ebt_set_ctl+0x2ac/0x324 net/bridge/netfilter/ebtables.c:2367
  compat_nf_sockopt net/netfilter/nf_sockopt.c:144 [inline]
  compat_nf_setsockopt+0x9b/0x140 net/netfilter/nf_sockopt.c:156
  compat_ip_setsockopt+0xff/0x140 net/ipv4/ip_sockglue.c:1279
  inet_csk_compat_setsockopt+0x97/0x120 net/ipv4/inet_connection_sock.c:1041
  compat_tcp_setsockopt+0x49/0x80 net/ipv4/tcp.c:2901
  compat_sock_common_setsockopt+0xb4/0x150 net/core/sock.c:3050
  __compat_sys_setsockopt+0x1ab/0x7c0 net/compat.c:403
  __do_compat_sys_setsockopt net/compat.c:416 [inline]
  __se_compat_sys_setsockopt net/compat.c:413 [inline]
  __ia32_compat_sys_setsockopt+0xbd/0x150 net/compat.c:413
  do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline]
  do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394
  entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fb3cb9
RSP: 002b:00000000fff0c26c EFLAGS: 00000282 ORIG_RAX: 000000000000016e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000000080 RSI: 0000000020000300 RDI: 00000000000005f4
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

The buggy address belongs to the page:
page:ffffea0006c2afc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x2fffc0000000000()
raw: 02fffc0000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: 0000000000000000 ffffea0006c20101 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Fix the issue replacing the unsafe function with strscpy() and
taking care of possible errors.

Fixes: 81e675c227 ("netfilter: ebtables: add CONFIG_COMPAT support")
Reported-and-tested-by: syzbot+4e42a04e0bc33cb6c087@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:09 +02:00
Chris Metcalf 1818613828 string: provide strscpy()
commit 30035e45753b708e7d47a98398500ca005e02b86 upstream.

The strscpy() API is intended to be used instead of strlcpy(),
and instead of most uses of strncpy().

- Unlike strlcpy(), it doesn't read from memory beyond (src + size).

- Unlike strlcpy() or strncpy(), the API provides an easy way to check
  for destination buffer overflow: an -E2BIG error return value.

- The provided implementation is robust in the face of the source
  buffer being asynchronously changed during the copy, unlike the
  current implementation of strlcpy().

- Unlike strncpy(), the destination buffer will be NUL-terminated
  if the string in the source buffer is too long.

- Also unlike strncpy(), the destination buffer will not be updated
  beyond the NUL termination, avoiding strncpy's behavior of zeroing
  the entire tail end of the destination buffer.  (A memset() after
  the strscpy() can be used if this behavior is desired.)

- The implementation should be reasonably performant on all
  platforms since it uses the asm/word-at-a-time.h API rather than
  simple byte copy.  Kernel-to-kernel string copy is not considered
  to be performance critical in any case.

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:09 +02:00
tinlin 3db795696d qcacld-2.0: Fix buffer overwrite in csrRoamCheckForLinkStatusChange
Propagation from cld3.0 to cld2.0

Fix possible buffer overwrite in csrRoamCheckForLinkStatusChange
function.

Change-Id: Icf4a39e0a2a291f1c084353985aa7952e3c8e136
CRs-Fixed: 2276642
2019-07-27 21:52:08 +02:00
gaolez bef634bd07 qcacld-2.0: Fix OOB write in wma_unified_debug_print_event_handler
propagation from qcacld-3.0 to qcacld-2.0

The routine wma_unified_debug_print_event_handler logs the data from debug
print event handler. The param event data from firmware is copied to a
destination buffer .If the maximum size of the data exceeds or equals
BIG_ENDIAN_MAX_DEBUG_BUF for big endian hosts then possible OOB write will
occur in wma_unified_debug_print_event_handler. For other hosts, OOB read
could occur if datalen exceeds maximum firmware message size
WMI_SVC_MAX_SIZE.

Add check to validate datalen doesnot exceed the maximum firmware msg size
WMI_SVC_MAX_SIZE. Return failure if it exceeds.
Add check to ensure datalen doesnot exceed or equal the maximum buffer
length value for big endian hosts BIG_ENDIAN_MAX_DEBUG_BUF.
Add null termination at the end of the data recieved from the firmware.

Change-Id: Ibb662cb8e17ef8be8b7591308c422a78b71e331a
CRs-Fixed: 2273985
2019-07-27 21:52:08 +02:00
Dundi Raviteja 8ff2fd11fc qcacld-2.0: OOB access may occur due to total numChannels exceeds max value
Out of Buffer access may occur in wmi_get_buf_extscan_start_cmd()
function if user provided inputs are different for below parameters
which are assigned in hdd_extscan_start_fill_bucket_channel_spec()
function

1. QCA_WLAN_VENDOR_ATTR_EXTSCAN_BUCKET_SPEC_NUM_CHANNEL_SPECS
2. QCA_WLAN_VENDOR_ATTR_EXTSCAN_CHANNEL_SPEC

To address this issue return failure status if numChannels is not
equal to the total number of channel entries.

Change-Id: I60d74161dc3752bd7f609af3910d7c86a99488ec
CRs-Fixed: 2294049
2019-07-27 21:52:08 +02:00
Martijn Coenen f99ddf91e8 android: binder: add padding to binder_fd_array_object.
binder_fd_array_object starts with a 4-byte header,
followed by a few fields that are 8 bytes when
ANDROID_BINDER_IPC_32BIT=N.

This can cause alignment issues in a 64-bit kernel
with a 32-bit userspace, as on x86_32 an 8-byte primitive
may be aligned to a 4-byte address. Pad with a __u32
to fix this.

Change-Id: I4374ed2cc3ccd3c6a1474cb7209b53ebfd91077b
Signed-off-by: Martijn Coenen <maco@android.com>
2019-07-27 21:52:07 +02:00
Eric Biggers f21e4737f8 binder: check for binder_thread allocation failure in binder_poll()
commit f88982679f54f75daa5b8eff3da72508f1e7422f upstream.

If the kzalloc() in binder_get_thread() fails, binder_poll()
dereferences the resulting NULL pointer.

Fix it by returning POLLERR if the memory allocation failed.

This bug was found by syzkaller using fault injection.

Change-Id: I1535910349c8266c0d785b0f8d2085d290ea6e92
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 457b9a6f09 ("Staging: android: add binder driver")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16:
 - Drop the binder global lock before returning
 - Adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:07 +02:00
Ganesh Mahendran 62840f247b android: binder: use VM_ALLOC to get vm area
commit aac6830ec1cb681544212838911cdc57f2638216 upstream.

VM_IOREMAP is used to access hardware through a mechanism called
I/O mapped memory. Android binder is a IPC machanism which will
not access I/O memory.

And VM_IOREMAP has alignment requiement which may not needed in
binder.
    __get_vm_area_node()
    {
    ...
        if (flags & VM_IOREMAP)
            align = 1ul << clamp_t(int, fls_long(size),
               PAGE_SHIFT, IOREMAP_MAX_ORDER);
    ...
    }

This patch will save some kernel vm area, especially for 32bit os.

In 32bit OS, kernel vm area is only 240MB. We may got below
error when launching a app:

<3>[ 4482.440053] binder_alloc: binder_alloc_mmap_handler: 15728 8ce67000-8cf65000 get_vm_area failed -12
<3>[ 4483.218817] binder_alloc: binder_alloc_mmap_handler: 15745 8ce67000-8cf65000 get_vm_area failed -12

Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Martijn Coenen <maco@android.com>
Acked-by: Todd Kjos <tkjos@google.com>

----
V3: update comments
V2: update comments
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Change-Id: Ibd8aedeeabbfa643e8a7a20e365fe74992fbbd0d
2019-07-27 21:52:07 +02:00
Peter Zijlstra e1989d554d sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
commit 354d7793070611b4df5a79fbb0f12752d0ed0cc5 upstream.

> kernel/sched/autogroup.c:230 proc_sched_autogroup_set_nice() warn: potential spectre issue 'sched_prio_to_weight'

Userspace controls @nice, sanitize the array index.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:06 +02:00
Mike Galbraith 0025b68cb9 sched/autogroup: Fix 64-bit kernel nice level adjustment
commit 83929cce95251cc77e5659bf493bd424ae0e7a67 upstream.

Michael Kerrisk reported:

> Regarding the previous paragraph...  My tests indicate
> that writing *any* value to the autogroup [nice priority level]
> file causes the task group to get a lower priority.

Because autogroup didn't call the then meaningless scale_load()...

Autogroup nice level adjustment has been broken ever since load
resolution was increased for 64-bit kernels.  Use scale_load() to
scale group weight.

Michael Kerrisk tested this patch to fix the problem:

> Applied and tested against 4.9-rc6 on an Intel u7 (4 cores).
> Test setup:
>
> Terminal window 1: running 40 CPU burner jobs
> Terminal window 2: running 40 CPU burner jobs
> Terminal window 1: running  1 CPU burner job
>
> Demonstrated that:
> * Writing "0" to the autogroup file for TW1 now causes no change
>   to the rate at which the process on the terminal consume CPU.
> * Writing -20 to the autogroup file for TW1 caused those processes
>   to get the lion's share of CPU while TW2 TW3 get a tiny amount.
> * Writing -20 to the autogroup files for TW1 and TW3 allowed the
>   process on TW3 to get as much CPU as it was getting as when
>   the autogroup nice values for both terminals were 0.

Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-man <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479897217.4306.6.camel@gmx.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: s/sched_prio_to_weight/prio_to_weight/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:06 +02:00
Peter Zijlstra 9f2bb2d35f sched/core: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
commit 7281c8dec8a87685cb54d503d8cceef5a0fc2fdd upstream.

> kernel/sched/core.c:6921 cpu_weight_nice_write_s64() warn: potential spectre issue 'sched_prio_to_weight'

Userspace controls @nice, so sanitize the value before using it to
index an array.

Change-Id: I0e59bc7ecbaf2367e59391c8955e2c246aeeb946
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: Vulnerable array lookup is in set_load_weight()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:06 +02:00
Peter Zijlstra 8fec935545 clocksource: Initialize cs->wd_list
commit 5b9e886a4af97574ca3ce1147f35545da0e7afc7 upstream.

A number of places relies on list_empty(&cs->wd_list), however the
list_head does not get initialized. Do so upon registration, such that
thereafter it is possible to rely on list_empty() correctly reflecting
the list membership status.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Diego Viola <diego.viola@gmail.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: len.brown@intel.com
Cc: rjw@rjwysocki.net
Cc: rui.zhang@intel.com
Link: https://lkml.kernel.org/r/20180430100344.472662715@infradead.org
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:05 +02:00
Takashi Iwai 59ea496a22 ALSA: control: Hardening for potential Spectre v1
commit 088e861edffb84879cf0c0d1b02eda078c3a0ffe upstream.

As recently Smatch suggested, a few places in ALSA control core codes
may expand the array directly from the user-space value with
speculation:

  sound/core/control.c:1003 snd_ctl_elem_lock() warn: potential spectre issue 'kctl->vd'
  sound/core/control.c:1031 snd_ctl_elem_unlock() warn: potential spectre issue 'kctl->vd'
  sound/core/control.c:844 snd_ctl_elem_info() warn: potential spectre issue 'kctl->vd'
  sound/core/control.c:891 snd_ctl_elem_read() warn: potential spectre issue 'kctl->vd'
  sound/core/control.c:939 snd_ctl_elem_write() warn: potential spectre issue 'kctl->vd'

Although all these seem doing only the first load without further
reference, we may want to stay in a safer side, so hardening with
array_index_nospec() would still make sense.

In this patch, we put array_index_nospec() to the common
snd_ctl_get_ioff*() helpers instead of each caller.  These helpers are
also referred from some drivers, too, and basically all usages are to
calculate the array index from the user-space value, hence it's better
to cover there.

BugLink: https://marc.info/?l=linux-kernel&m=152411496503418&w=2
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:05 +02:00
Jann Horn 8b3b0cc293 tcp: don't read out-of-bounds opsize
commit 7e5a206ab686f098367b61aca989f5cdfa8114a3 upstream.

The old code reads the "opsize" variable from out-of-bounds memory (first
byte behind the segment) if a broken TCP segment ends directly after an
opcode that is neither EOL nor NOP.

The result of the read isn't used for anything, so the worst thing that
could theoretically happen is a pagefault; and since the physmap is usually
mostly contiguous, even that seems pretty unlikely.

The following C reproducer triggers the uninitialized read - however, you
can't actually see anything happen unless you put something like a
pr_warn() in tcp_parse_md5sig_option() to print the opsize.

====================================
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <stdlib.h>
#include <errno.h>
#include <stdarg.h>
#include <net/if.h>
#include <linux/if.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/in.h>
#include <linux/if_tun.h>
#include <err.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <assert.h>

void systemf(const char *command, ...) {
  char *full_command;
  va_list ap;
  va_start(ap, command);
  if (vasprintf(&full_command, command, ap) == -1)
    err(1, "vasprintf");
  va_end(ap);
  printf("systemf: <<<%s>>>\n", full_command);
  system(full_command);
}

char *devname;

int tun_alloc(char *name) {
  int fd = open("/dev/net/tun", O_RDWR);
  if (fd == -1)
    err(1, "open tun dev");
  static struct ifreq req = { .ifr_flags = IFF_TUN|IFF_NO_PI };
  strcpy(req.ifr_name, name);
  if (ioctl(fd, TUNSETIFF, &req))
    err(1, "TUNSETIFF");
  devname = req.ifr_name;
  printf("device name: %s\n", devname);
  return fd;
}

#define IPADDR(a,b,c,d) (((a)<<0)+((b)<<8)+((c)<<16)+((d)<<24))

void sum_accumulate(unsigned int *sum, void *data, int len) {
  assert((len&2)==0);
  for (int i=0; i<len/2; i++) {
    *sum += ntohs(((unsigned short *)data)[i]);
  }
}

unsigned short sum_final(unsigned int sum) {
  sum = (sum >> 16) + (sum & 0xffff);
  sum = (sum >> 16) + (sum & 0xffff);
  return htons(~sum);
}

void fix_ip_sum(struct iphdr *ip) {
  unsigned int sum = 0;
  sum_accumulate(&sum, ip, sizeof(*ip));
  ip->check = sum_final(sum);
}

void fix_tcp_sum(struct iphdr *ip, struct tcphdr *tcp) {
  unsigned int sum = 0;
  struct {
    unsigned int saddr;
    unsigned int daddr;
    unsigned char pad;
    unsigned char proto_num;
    unsigned short tcp_len;
  } fakehdr = {
    .saddr = ip->saddr,
    .daddr = ip->daddr,
    .proto_num = ip->protocol,
    .tcp_len = htons(ntohs(ip->tot_len) - ip->ihl*4)
  };
  sum_accumulate(&sum, &fakehdr, sizeof(fakehdr));
  sum_accumulate(&sum, tcp, tcp->doff*4);
  tcp->check = sum_final(sum);
}

int main(void) {
  int tun_fd = tun_alloc("inject_dev%d");
  systemf("ip link set %s up", devname);
  systemf("ip addr add 192.168.42.1/24 dev %s", devname);

  struct {
    struct iphdr ip;
    struct tcphdr tcp;
    unsigned char tcp_opts[20];
  } __attribute__((packed)) syn_packet = {
    .ip = {
      .ihl = sizeof(struct iphdr)/4,
      .version = 4,
      .tot_len = htons(sizeof(syn_packet)),
      .ttl = 30,
      .protocol = IPPROTO_TCP,
      /* FIXUP check */
      .saddr = IPADDR(192,168,42,2),
      .daddr = IPADDR(192,168,42,1)
    },
    .tcp = {
      .source = htons(1),
      .dest = htons(1337),
      .seq = 0x12345678,
      .doff = (sizeof(syn_packet.tcp)+sizeof(syn_packet.tcp_opts))/4,
      .syn = 1,
      .window = htons(64),
      .check = 0 /*FIXUP*/
    },
    .tcp_opts = {
      /* INVALID: trailing MD5SIG opcode after NOPs */
      1, 1, 1, 1, 1,
      1, 1, 1, 1, 1,
      1, 1, 1, 1, 1,
      1, 1, 1, 1, 19
    }
  };
  fix_ip_sum(&syn_packet.ip);
  fix_tcp_sum(&syn_packet.ip, &syn_packet.tcp);
  while (1) {
    int write_res = write(tun_fd, &syn_packet, sizeof(syn_packet));
    if (write_res != sizeof(syn_packet))
      err(1, "packet write failed");
  }
}
====================================

Fixes: cfb6eeb4c8 ("[TCP]: MD5 Signature Option (RFC2385) support.")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:04 +02:00
Eric Dumazet d312f5176b crypto: af_alg - fix possible uninit-value in alg_bind()
commit a466856e0b7ab269cdf9461886d007e88ff575b0 upstream.

syzbot reported :

BUG: KMSAN: uninit-value in alg_bind+0xe3/0xd90 crypto/af_alg.c:162

We need to check addr_len before dereferencing sa (or uaddr)

Fixes: bb30b8848c85 ("crypto: af_alg - whitelist mask and type")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Stephan Mueller <smueller@chronox.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:04 +02:00
Huacai Chen 24cb329bdf zboot: fix stack protector in compressed boot phase
commit 7bbaf27d9c83037b6e60a818e57bdbedf6bc15be upstream.

Calling __stack_chk_guard_setup() in decompress_kernel() is too late
that stack checking always fails for decompress_kernel() itself.  So
remove __stack_chk_guard_setup() and initialize __stack_chk_guard before
we call decompress_kernel().

Original code comes from ARM but also used for MIPS and SH, so fix them
together.  If without this fix, compressed booting of these archs will
fail because stack checking is enabled by default (>=4.16).

Link: http://lkml.kernel.org/r/1522226933-29317-1-git-send-email-chenhc@lemote.com
Fixes: 8779657d29c0 ("stackprotector: Introduce CONFIG_CC_STACKPROTECTOR_STRONG")
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Acked-by: James Hogan <jhogan@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Rich Felker <dalias@libc.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.16: Only ARM has this problem]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:04 +02:00
Eric Dumazet 2037ffd53c ipv6: sit: better validate user provided tunnel names
commit b95211e066fc3494b7c115060b2297b4ba21f025 upstream.

Use dev_valid_name() to make sure user does not provide illegal
device name.

syzbot caught the following bug :

BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline]
BUG: KASAN: stack-out-of-bounds in ipip6_tunnel_locate+0x63b/0xaa0 net/ipv6/sit.c:254
Write of size 33 at addr ffff8801b64076d8 by task syzkaller932654/4453

CPU: 0 PID: 4453 Comm: syzkaller932654 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x1b9/0x29f lib/dump_stack.c:53
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
 memcpy+0x37/0x50 mm/kasan/kasan.c:303
 strlcpy include/linux/string.h:300 [inline]
 ipip6_tunnel_locate+0x63b/0xaa0 net/ipv6/sit.c:254
 ipip6_tunnel_ioctl+0xe71/0x241b net/ipv6/sit.c:1221
 dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334
 dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525
 sock_ioctl+0x47e/0x680 net/socket.c:1015
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 SYSC_ioctl fs/ioctl.c:708 [inline]
 SyS_ioctl+0x24/0x30 fs/ioctl.c:706
 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:03 +02:00
Eric Dumazet 105b0cd42f ip_tunnel: better validate user provided tunnel names
commit 9cb726a212a82c88c98aa9f0037fd04777cd8fe5 upstream.

Use dev_valid_name() to make sure user does not provide illegal
device name.

syzbot caught the following bug :

BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline]
BUG: KASAN: stack-out-of-bounds in __ip_tunnel_create+0xca/0x6b0 net/ipv4/ip_tunnel.c:257
Write of size 20 at addr ffff8801ac79f810 by task syzkaller268107/4482

CPU: 0 PID: 4482 Comm: syzkaller268107 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x1b9/0x29f lib/dump_stack.c:53
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
 memcpy+0x37/0x50 mm/kasan/kasan.c:303
 strlcpy include/linux/string.h:300 [inline]
 __ip_tunnel_create+0xca/0x6b0 net/ipv4/ip_tunnel.c:257
 ip_tunnel_create net/ipv4/ip_tunnel.c:352 [inline]
 ip_tunnel_ioctl+0x818/0xd40 net/ipv4/ip_tunnel.c:861
 ipip_tunnel_ioctl+0x1c5/0x420 net/ipv4/ipip.c:350
 dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334
 dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525
 sock_ioctl+0x47e/0x680 net/socket.c:1015
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 SYSC_ioctl fs/ioctl.c:708 [inline]
 SyS_ioctl+0x24/0x30 fs/ioctl.c:706
 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Fixes: c544193214 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:03 +02:00
Theodore Ts'o 7f2a5b0aa0 ext4: force revalidation of directory pointer after seekdir(2)
commit e40ff213898502d299351cc2fe1e350cd186f0d3 upstream.

A malicious user could force the directory pointer to be in an invalid
spot by using seekdir(2).  Use the mechanism we already have to notice
if the directory has changed since the last time we called
ext4_readdir() to force a revalidation of the pointer.

Reported-by: syzbot+1236ce66f79263e8a862@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.16: open-code inode_peek_iversion()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:03 +02:00
Theodore Ts'o 5412a3ef79 ext4: add extra checks to ext4_xattr_block_get()
commit 54dd0e0a1b255f115f8647fc6fb93273251b01b9 upstream.

Add explicit checks in ext4_xattr_block_get() just in case the
e_value_offs and e_value_size fields in the the xattr block are
corrupted in memory after the buffer_verified bit is set on the xattr
block.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.16:
 - Drop change to ext4_xattr_check_entries() which is only needed for the
   xattr-in-inode case
 - Adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:02 +02:00
Eric Biggers 9abd5e1cad ext4: correctly detect when an xattr value has an invalid size
commit d7614cc16146e3f0b4c33e71875c19607602aed5 upstream.

It was possible for an xattr value to have a very large size, which
would then pass validation on 32-bit architectures due to a pointer
wraparound.  Fix this by validating the size in a way which avoids
pointer wraparound.

It was also possible that a value's size would fit in the available
space but its padded size would not.  This would cause an out-of-bounds
memory write in ext4_xattr_set_entry when replacing the xattr value.
For example, if an xattr value of unpadded size 253 bytes went until the
very end of the inode or block, then using setxattr(2) to replace this
xattr's value with 256 bytes would cause a write to the 3 bytes past the
end of the inode or buffer, and the new xattr value would be incorrectly
truncated.  Fix this by requiring that the padded size fit in the
available space rather than the unpadded size.

This patch shouldn't have any noticeable effect on
non-corrupted/non-malicious filesystems.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.16:
 - s/EFSCORRUPTED/EIO/
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:02 +02:00
Theodore Ts'o db363c02ac ext4: add bounds checking to ext4_xattr_find_entry()
commit 9496005d6ca4cf8f5ee8f828165a8956872dc59d upstream.

Add some paranoia checks to make sure we don't stray beyond the end of
the valid memory region containing ext4 xattr entries while we are
scanning for a match.

Also rename the function to xattr_find_entry() since it is static and
thus only used in fs/ext4/xattr.c

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.16:
 - Keep passing an explicit size to xattr_find_entry()
 - s/EFSCORRUPTED/EIO/]]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:02 +02:00
Herbert Xu c5b5cc96bf crypto: ahash - Fix early termination in hash walk
commit 900a081f6912a8985dc15380ec912752cb66025a upstream.

When we have an unaligned SG list entry where there is no leftover
aligned data, the hash walk code will incorrectly return zero as if
the entire SG list has been processed.

This patch fixes it by moving onto the next page instead.

Reported-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:01 +02:00
Eryu Guan 157e757a30 ext4: protect i_disksize update by i_data_sem in direct write path
commit 73fdad00b208b139cf43f3163fbc0f67e4c6047c upstream.

i_disksize update should be protected by i_data_sem, by either taking
the lock explicitly or by using ext4_update_i_disksize() helper. But the
i_disksize updates in ext4_direct_IO_write() are not protected at all,
which may be racing with i_disksize updates in writeback path in
delalloc buffer write path.

This is found by code inspection, and I didn't hit any i_disksize
corruption due to this bug. Thanks to Jan Kara for catching this bug and
suggesting the fix!

Reported-by: Jan Kara <jack@suse.cz>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.16: The relevant code is in ext4_ind_direct_IO()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:01 +02:00
Theodore Ts'o 2a623323eb ext4: don't update checksum of new initialized bitmaps
commit 044e6e3d74a3d7103a0c8a9305dfd94d64000660 upstream.

When reading the inode or block allocation bitmap, if the bitmap needs
to be initialized, do not update the checksum in the block group
descriptor.  That's because we're not set up to journal those changes.
Instead, just set the verified bit on the bitmap block, so that it's
not necessary to validate the checksum.

When a block or inode allocation actually happens, at that point the
checksum will be calculated, and update of the bg descriptor block
will be properly journalled.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.16:
 - Deleted code is slightly different
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:01 +02:00
Theodore Ts'o f70124ee1b jbd2: if the journal is aborted then don't allow update of the log tail
commit 85e0c4e89c1b864e763c4e3bb15d0b6d501ad5d9 upstream.

This updates the jbd2 superblock unnecessarily, and on an abort we
shouldn't truncate the log.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:00 +02:00
Theodore Ts'o 4162634595 ext4: fix check to prevent initializing reserved inodes
commit 5012284700775a4e6e3fbe7eac4c543c4874b559 upstream.

Commit 8844618d8aa7: "ext4: only look at the bg_flags field if it is
valid" will complain if block group zero does not have the
EXT4_BG_INODE_ZEROED flag set.  Unfortunately, this is not correct,
since a freshly created file system has this flag cleared.  It gets
almost immediately after the file system is mounted read-write --- but
the following somewhat unlikely sequence will end up triggering a
false positive report of a corrupted file system:

   mkfs.ext4 /dev/vdc
   mount -o ro /dev/vdc /vdc
   mount -o remount,rw /dev/vdc

Instead, when initializing the inode table for block group zero, test
to make sure that itable_unused count is not too large, since that is
the case that will result in some or all of the reserved inodes
getting cleared.

This fixes the failures reported by Eric Whiteney when running
generic/230 and generic/231 in the the nojournal test case.

Fixes: 8844618d8aa7 ("ext4: only look at the bg_flags field if it is valid")
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:00 +02:00
Theodore Ts'o 4775d67ccc ext4: only look at the bg_flags field if it is valid
commit 8844618d8aa7a9973e7b527d038a2a589665002c upstream.

The bg_flags field in the block group descripts is only valid if the
uninit_bg or metadata_csum feature is enabled.  We were not
consistently looking at this field; fix this.

Also block group #0 must never have uninitialized allocation bitmaps,
or need to be zeroed, since that's where the root inode, and other
special inodes are set up.  Check for these conditions and mark the
file system as corrupted if they are detected.

This addresses CVE-2018-10876.

https://bugzilla.kernel.org/show_bug.cgi?id=199403

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.16:
 - ext4_read_block_bitmap_nowait() and ext4_read_inode_bitmap() return
   a pointer (NULL on error) instead of an error code
 - Open-code sb_rdonly()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:00 +02:00
Dmitry Monakhov 4236639985 ext4: move error report out of atomic context in ext4_init_block_bitmap()
commit aef4885ae14f1df75b58395c5314d71f613d26d9 upstream.

Error report likely result in IO so it is bad idea to do it from
atomic context.

This patch should fix following issue:

BUG: sleeping function called from invalid context at include/linux/buffer_head.h:349
in_atomic(): 1, irqs_disabled(): 0, pid: 137, name: kworker/u128:1
5 locks held by kworker/u128:1/137:
 #0:  ("writeback"){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0
 #1:  ((&(&wb->dwork)->work)){......}, at: [<ffffffff81085618>] process_one_work+0x228/0x4d0
 #2:  (jbd2_handle){......}, at: [<ffffffff81242622>] start_this_handle+0x712/0x7b0
 #3:  (&ei->i_data_sem){......}, at: [<ffffffff811fa387>] ext4_map_blocks+0x297/0x430
 #4:  (&(&bgl->locks[i].lock)->rlock){......}, at: [<ffffffff811f3180>] ext4_read_block_bitmap_nowait+0x5d0/0x630
CPU: 3 PID: 137 Comm: kworker/u128:1 Not tainted 3.17.0-rc2-00184-g82752e4 #165
Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
Workqueue: writeback bdi_writeback_workfn (flush-1:0)
 0000000000000411 ffff880813777288 ffffffff815c7fdc ffff880813777288
 ffff880813a8bba0 ffff8808137772a8 ffffffff8108fb30 ffff880803e01e38
 ffff880803e01e38 ffff8808137772c8 ffffffff811a8d53 ffff88080ecc6000
Call Trace:
 [<ffffffff815c7fdc>] dump_stack+0x51/0x6d
 [<ffffffff8108fb30>] __might_sleep+0xf0/0x100
 [<ffffffff811a8d53>] __sync_dirty_buffer+0x43/0xe0
 [<ffffffff811a8e03>] sync_dirty_buffer+0x13/0x20
 [<ffffffff8120f581>] ext4_commit_super+0x1d1/0x230
 [<ffffffff8120fa03>] save_error_info+0x23/0x30
 [<ffffffff8120fd06>] __ext4_error+0xb6/0xd0
 [<ffffffff8120f260>] ? ext4_group_desc_csum+0x140/0x190
 [<ffffffff811f2d8c>] ext4_read_block_bitmap_nowait+0x1dc/0x630
 [<ffffffff8122e23a>] ext4_mb_init_cache+0x21a/0x8f0
 [<ffffffff8113ae95>] ? lru_cache_add+0x55/0x60
 [<ffffffff8112e16c>] ? add_to_page_cache_lru+0x6c/0x80
 [<ffffffff8122eaa0>] ext4_mb_init_group+0x190/0x280
 [<ffffffff8122ec51>] ext4_mb_good_group+0xc1/0x190
 [<ffffffff8123309a>] ext4_mb_regular_allocator+0x17a/0x410
 [<ffffffff8122c821>] ? ext4_mb_use_preallocated+0x31/0x380
 [<ffffffff81233535>] ? ext4_mb_new_blocks+0x205/0x8e0
 [<ffffffff8116ed5c>] ? kmem_cache_alloc+0xfc/0x180
 [<ffffffff812335b0>] ext4_mb_new_blocks+0x280/0x8e0
 [<ffffffff8116f2c4>] ? __kmalloc+0x144/0x1c0
 [<ffffffff81221797>] ? ext4_find_extent+0x97/0x320
 [<ffffffff812257f4>] ext4_ext_map_blocks+0xbc4/0x1050
 [<ffffffff811fa387>] ? ext4_map_blocks+0x297/0x430
 [<ffffffff811fa3ab>] ext4_map_blocks+0x2bb/0x430
 [<ffffffff81200e43>] ? ext4_init_io_end+0x23/0x50
 [<ffffffff811feb44>] ext4_writepages+0x564/0xaf0
 [<ffffffff815cde3b>] ? _raw_spin_unlock+0x2b/0x40
 [<ffffffff810ac7bd>] ? lock_release_non_nested+0x2fd/0x3c0
 [<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490
 [<ffffffff811a009e>] ? writeback_sb_inodes+0x10e/0x490
 [<ffffffff811377e3>] do_writepages+0x23/0x40
 [<ffffffff8119c8ce>] __writeback_single_inode+0x9e/0x280
 [<ffffffff811a026b>] writeback_sb_inodes+0x2db/0x490
 [<ffffffff811a0664>] wb_writeback+0x174/0x2d0
 [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
 [<ffffffff811a0863>] wb_do_writeback+0xa3/0x200
 [<ffffffff811a0a40>] bdi_writeback_workfn+0x80/0x230
 [<ffffffff81085618>] ? process_one_work+0x228/0x4d0
 [<ffffffff810856cd>] process_one_work+0x2dd/0x4d0
 [<ffffffff81085618>] ? process_one_work+0x228/0x4d0
 [<ffffffff81085c1d>] worker_thread+0x35d/0x460
 [<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0
 [<ffffffff810858c0>] ? process_one_work+0x4d0/0x4d0
 [<ffffffff8108a885>] kthread+0xf5/0x100
 [<ffffffff810990e5>] ? local_clock+0x25/0x30
 [<ffffffff8108a790>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff815ce2ac>] ret_from_fork+0x7c/0xb0
 [<ffffffff8108a790>] ? __init_kthread_work

Change-Id: I271c54527fab40a085e96af58e53865086c47959
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
2019-07-27 21:51:59 +02:00
Namjae Jeon 2916d4d914 ext4: fix potential null pointer dereference in ext4_free_inode
Fix potential null pointer dereferencing problem caused by e43bb4e612
("ext4: decrement free clusters/inodes counters when block group declared bad")

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
2019-07-27 21:51:59 +02:00
Namjae Jeon ff43f5624b ext4: decrement free clusters/inodes counters when block group declared bad
We should decrement free clusters counter when block bitmap is marked
as corrupt and free inodes counter when the allocation bitmap is
marked as corrupt to avoid misunderstanding due to incorrect available
size in statfs result.  User can get immediately ENOSPC error from
write begin without reaching for the writepages.

Change-Id: I4074192c82fa1cc27e975d29b3717e9c8872d79e
Cc: Darrick J. Wong<darrick.wong@oracle.com>
Reported-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
2019-07-27 21:51:59 +02:00
Darrick J. Wong 76e59e5225 ext4: mark group corrupt on group descriptor checksum
If the group descriptor fails validation, mark the whole blockgroup
corrupt so that the inode/block allocators skip this group.  The
previous approach takes the risk of writing to a damaged group
descriptor; hopefully it was never the case that the [ib]bitmap fields
pointed to another valid block and got dirtied, since the memset would
fill the page with 1s.

Change-Id: I6a79d8790c276bb4f17b4fa2ef7f827b4877e9ef
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2019-07-27 21:51:58 +02:00
Darrick J. Wong 0b46451d4d ext4: mark block group as corrupt on inode bitmap error
If we detect either a discrepancy between the inode bitmap and the
inode counts or the inode bitmap fails to pass validation checks, mark
the block group corrupt and refuse to allocate or deallocate inodes
from the group.

Change-Id: Ie920071f5ed9bc3f1b841fa189307a111cf33821
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2019-07-27 21:51:58 +02:00
Darrick J. Wong 265a3492fd ext4: mark block group as corrupt on block bitmap error
When we notice a block-bitmap corruption (because of device failure or
something else), we should mark this group as corrupt and prevent
further block allocations/deallocations from it. Currently, we end up
generating one error message for every block in the bitmap. This
potentially could make the system unstable as noticed in some
bugs. With this patch, the error will be printed only the first time
and mark the entire block group as corrupted. This prevents future
access allocations/deallocations from it.

Also tested by corrupting the block
bitmap and forcefully introducing the mb_free_blocks error:
(1) create a largefile (2Gb)
$ dd if=/dev/zero of=largefile oflag=direct bs=10485760 count=200
(2) umount filesystem. use dumpe2fs to see which block-bitmaps
are in use by largefile and note their block numbers
(3) use dd to zero-out the used block bitmaps
$ dd if=/dev/zero of=/dev/hdc4 bs=4096 seek=14 count=8 oflag=direct
(4) mount the FS and delete the largefile.
(5) recreate the largefile. verify that the new largefile does not
get any blocks from the groups marked as bad.
Without the patch, we will see mb_free_blocks error for each bit in
each zero'ed out bitmap at (4). With the patch, we only see the error
once per blockgroup:
[  309.706803] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 15: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[  309.720824] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 14: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[  309.732858] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[  309.748321] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 13: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[  309.760331] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[  309.769695] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 12: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[  309.781721] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[  309.798166] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 11: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
[  309.810184] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
[  309.819532] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 10: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.

Google-Bug-Id: 7258357

[darrick.wong@oracle.com]
Further modifications (by Darrick) to make more obvious that this corruption
bit applies to blocks only.  Set the corruption flag if the block group bitmap
verification fails.

Original-author: Aditya Kali <adityakali@google.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2019-07-27 21:51:57 +02:00
Darrick J. Wong 4065a212ca ext4: fix type declaration of ext4_validate_block_bitmap
The block_group parameter to ext4_validate_block_bitmap is both used
as a ext4_group_t inside the function and the same type is passed in
by all callers.  We might as well use the typedef consistently instead
of open-coding the 'unsigned int'.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2019-07-27 21:51:57 +02:00
Darrick J. Wong f687ec758b ext4: error out if verifying the block bitmap fails
The block bitmap verification code assumes that calling ext4_error()
either panics the system or makes the fs readonly.  However, this is
not always true: when 'errors=continue' is specified, an error is
printed but we don't return any indication of error to the caller,
which is (probably) the block allocator, which pretends that the crud
we read in off the disk is a usable bitmap.  Yuck.

A block bitmap that fails the check should at least return no bitmap
to the caller.  The block allocator should be told to go look in a
different group, but that's a separate issue.

The easiest way to reproduce this is to modify bg_block_bitmap (on a
^flex_bg fs) to point to a block outside the block group; or you can
create a metadata_csum filesystem and zero out the block bitmaps.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2019-07-27 21:51:57 +02:00
Theodore Ts'o 245167c564 ext4: add sanity check to ext4_get_group_info()
The group number passed to ext4_get_group_info() should be valid, but
let's add an assert to check this before we start creating a pointer
based on that group number and dereferencing it.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2019-07-27 21:51:56 +02:00
Vasily Khoruzhick f3b7b6d01f neighbour: confirm neigh entries when ARP packet is received
[ Upstream commit f0e0d04413fcce9bc76388839099aee93cd0d33b ]

Update 'confirmed' timestamp when ARP packet is received. It shouldn't
affect locktime logic and anyway entry can be confirmed by any higher-layer
protocol. Thus it makes sense to confirm it when ARP packet is received.

Fixes: 77d7123342dc ("neighbour: update neigh timestamps iff update is effective")
Signed-off-by: Vasily Khoruzhick <vasilykh@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 21:51:56 +02:00
Ihar Hrachyshka 4691e7d1ee neighbour: update neigh timestamps iff update is effective
[ Upstream commit 77d7123342dcf6442341b67816321d71da8b2b16 ]

It's a common practice to send gratuitous ARPs after moving an
IP address to another device to speed up healing of a service. To
fulfill service availability constraints, the timing of network peers
updating their caches to point to a new location of an IP address can be
particularly important.

Sometimes neigh_update calls won't touch neither lladdr nor state, for
example if an update arrives in locktime interval. The neigh->updated
value is tested by the protocol specific neigh code, which in turn
will influence whether NEIGH_UPDATE_F_OVERRIDE gets set in the
call to neigh_update() or not. As a result, we may effectively ignore
the update request, bailing out of touching the neigh entry, except that
we still bump its timestamps inside neigh_update.

This may be a problem for updates arriving in quick succession. For
example, consider the following scenario:

A service is moved to another device with its IP address. The new device
sends three gratuitous ARP requests into the network with ~1 seconds
interval between them. Just before the first request arrives to one of
network peer nodes, its neigh entry for the IP address transitions from
STALE to DELAY.  This transition, among other things, updates
neigh->updated. Once the kernel receives the first gratuitous ARP, it
ignores it because its arrival time is inside the locktime interval. The
kernel still bumps neigh->updated. Then the second gratuitous ARP
request arrives, and it's also ignored because it's still in the (new)
locktime interval. Same happens for the third request. The node
eventually heals itself (after delay_first_probe_time seconds since the
initial transition to DELAY state), but it just wasted some time and
require a new ARP request/reply round trip. This unfortunate behaviour
both puts more load on the network, as well as reduces service
availability.

This patch changes neigh_update so that it bumps neigh->updated (as well
as neigh->confirmed) only once we are sure that either lladdr or entry
state will change). In the scenario described above, it means that the
second gratuitous ARP request will actually update the entry lladdr.

Ideally, we would update the neigh entry on the very first gratuitous
ARP request. The locktime mechanism is designed to ignore ARP updates in
a short timeframe after a previous ARP update was honoured by the kernel
layer. This would require tracking timestamps for state transitions
separately from timestamps when actual updates are received. This would
probably involve changes in neighbour struct. Therefore, the patch
doesn't tackle the issue of the first gratuitous APR ignored, leaving
it for a follow-up.

Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 21:51:56 +02:00
Theodore Ts'o 68ebaa5e0f ext4: avoid divide by zero fault when deleting corrupted inline directories
commit 4d982e25d0bdc83d8c64e66fdeca0b89240b3b85 upstream.

A specially crafted file system can trick empty_inline_dir() into
reading past the last valid entry in a inline directory, and then run
into the end of xattr marker. This will trigger a divide by zero
fault.  Fix this by using the size of the inline directory instead of
dir->i_size.

Also clean up error reporting in __ext4_check_dir_entry so that the
message is clearer and more understandable --- and avoids the division
by zero trap if the size passed in is zero.  (I'm not sure why we
coded it that way in the first place; printing offset % size is
actually more confusing and less useful.)

https://bugzilla.kernel.org/show_bug.cgi?id=200933

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Wen Xu <wen.xu@gatech.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 21:51:55 +02:00
Eric Dumazet 23fffe9854 ipv6: fix possible use-after-free in ip6_xmit()
[ Upstream commit bbd6528d28c1b8e80832b3b018ec402b6f5c3215 ]

In the unlikely case ip6_xmit() has to call skb_realloc_headroom(),
we need to call skb_set_owner_w() before consuming original skb,
otherwise we risk a use-after-free.

Bring IPv6 in line with what we do in IPv4 to fix this.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 21:51:55 +02:00
Sakari Ailus 0daf8a201b media: v4l: event: Prevent freeing event subscriptions while accessed
commit ad608fbcf166fec809e402d548761768f602702c upstream.

The event subscriptions are added to the subscribed event list while
holding a spinlock, but that lock is subsequently released while still
accessing the subscription object. This makes it possible to unsubscribe
the event --- and freeing the subscription object's memory --- while
the subscription object is simultaneously accessed.

Prevent this by adding a mutex to serialise the event subscription and
unsubscription. This also gives a guarantee to the callback ops that the
add op has returned before the del op is called.

This change also results in making the elems field less special:
subscriptions are only added to the event list once they are fully
initialised.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: stable@vger.kernel.org # for 4.14 and up
Fixes: c3b5b0241f ("V4L/DVB: V4L: Events: Add backend")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 21:51:55 +02:00