commit dd111be69114cc867f8e826284559bfbc1c40e37 upstream.
When root activates a swap partition whose header has the wrong
endianness, nr_badpages elements of badpages are swabbed before
nr_badpages has been checked, leading to a buffer overrun of up to 8GB.
This normally is not a security issue because it can only be exploited
by root (more specifically, a process with CAP_SYS_ADMIN or the ability
to modify a swap file/partition), and such a process can already e.g.
modify swapped-out memory of any other userspace process on the system.
Link: http://lkml.kernel.org/r/1477949533-2509-1-git-send-email-jann@thejh.net
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit e23d4159b109167126e5bcd7f3775c95de7fee47 upstream.
Switching iov_iter fault-in to multipages variants has exposed an old
bug in underlying fault_in_multipages_...(); they break if the range
passed to them wraps around. Normally access_ok() done by callers will
prevent such (and it's a guaranteed EFAULT - ERR_PTR() values fall into
such a range and they should not point to any valid objects).
However, on architectures where userland and kernel live in different
MMU contexts (e.g. s390) access_ok() is a no-op and on those a range
with a wraparound can reach fault_in_multipages_...().
Since any wraparound means EFAULT there, the fix is trivial - turn
those
while (uaddr <= end)
...
into
if (unlikely(uaddr > end))
return -EFAULT;
do
...
while (uaddr <= end);
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 2545e5da080b4839dd859e3b09343a884f6ab0e3 upstream.
... in all cases, including the failing access_ok()
Note that some architectures using asm-generic/uaccess.h have
__copy_from_user() not zeroing the tail on failure halfway
through. This variant works either way.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[wt: s/might_fault/might_sleep]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 9ad18b75c2f6e4a78ce204e79f37781f8815c0fa upstream.
both for access_ok() failures and for faults halfway through
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 3a402a709500c5a3faca2111668c33d96555e35a upstream.
When TIF_SINGLESTEP is set for a task, the single-step state machine is
enabled and we must take care not to reset it to the active-not-pending
state if it is already in the active-pending state.
Unfortunately, that's exactly what user_enable_single_step does, by
unconditionally setting the SS bit in the SPSR for the current task.
This causes failures in the GDB testsuite, where GDB ends up missing
expected step traps if the instruction being stepped generates another
trap, e.g. PTRACE_EVENT_FORK from an SVC instruction.
This patch fixes the problem by preserving the current state of the
stepping state machine when TIF_SINGLESTEP is set on the current thread.
Cc: <stable@vger.kernel.org>
Reported-by: Yao Qi <yao.qi@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 872c63fbf9e153146b07f0cece4da0d70b283eeb upstream.
smp_mb__before_spinlock() is intended to upgrade a spin_lock() operation
to a full barrier, such that prior stores are ordered with respect to
loads and stores occuring inside the critical section.
Unfortunately, the core code defines the barrier as smp_wmb(), which
is insufficient to provide the required ordering guarantees when used in
conjunction with our load-acquire-based spinlock implementation.
This patch overrides the arm64 definition of smp_mb__before_spinlock()
to map to a full smp_mb().
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 3146bc64d12377a74dbda12b96ea32da3774ae07 upstream.
AT_VECTOR_SIZE_ARCH should be defined with the maximum number of
NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined
for arm64 at all even though ARCH_DLINFO will contain one NEW_AUX_ENT
for the VDSO address.
This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for
AT_BASE_PLATFORM which arm64 doesn't use, but lets define it now and add
the comment above ARCH_DLINFO as found in several other architectures to
remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to
date.
Fixes: f668cd1673 ("arm64: ELF definitions")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 7d9e8f71b989230bc613d121ca38507d34ada849 upstream.
Generally, taking an unexpected exception should be a fatal event, and
bad_mode is intended to cater for this. However, it should be possible
to contain unexpected synchronous exceptions from EL0 without bringing
the kernel down, by sending a SIGILL to the task.
We tried to apply this approach in commit 9955ac47f4 ("arm64:
don't kill the kernel on a bad esr from el0"), by sending a signal for
any bad_mode call resulting from an EL0 exception.
However, this also applies to other unexpected exceptions, such as
SError and FIQ. The entry paths for these exceptions branch to bad_mode
without configuring the link register, and have no kernel_exit. Thus, if
we take one of these exceptions from EL0, bad_mode will eventually
return to the original user link register value.
This patch fixes this by introducing a new bad_el0_sync handler to cater
for the recoverable case, and restoring bad_mode to its original state,
whereby it calls panic() and never returns. The recoverable case
branches to bad_el0_sync with a bl, and returns to userspace via the
usual ret_to_user mechanism.
Change-Id: Icefe3c0df25271229112ac9a924b4b213e4134fe
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 9955ac47f4 ("arm64: don't kill the kernel on a bad esr from el0")
Reported-by: Mark Salter <msalter@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 06dfe5cc0cc684e735cb0232fdb756d30780b05d upstream.
SA1111 PCMCIA was broken when PCMCIA switched to using dev_pm_ops for
the PCMCIA socket class. PCMCIA used to handle suspend/resume via the
socket hosting device, which happened at normal device suspend/resume
time.
However, the referenced commit changed this: much of the resume now
happens much earlier, in the noirq resume handler of dev_pm_ops.
However, on SA1111, the PCMCIA device is not accessible as the SA1111
has not been resumed at _noirq time. It's slightly worse than that,
because the SA1111 has already been put to sleep at _noirq time, so
suspend doesn't work properly.
Fix this by converting the core SA1111 code to use dev_pm_ops as well,
and performing its own suspend/resume at noirq time.
This fixes these errors in the kernel log:
pcmcia_socket pcmcia_socket0: time out after reset
pcmcia_socket pcmcia_socket1: time out after reset
and the resulting lack of PCMCIA cards after a S2RAM cycle.
Fixes: d7646f7632 ("pcmcia: use dev_pm_ops for class pcmcia_socket_class")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit da60626e7d02a4f385cae80e450afc8b07035368 upstream.
Clear the current reset status prior to rebooting the platform. This
adds the bit missing from 04fef228fb ("[ARM] pxa: introduce
reset_status and clear_reset_status for driver's usage").
Fixes: 04fef228fb ("[ARM] pxa: introduce reset_status and clear_reset_status for driver's usage")
Change-Id: I4f9ad2831f4d55f15fc555b3605b020236f4ae37
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 117e5e9c4cfcb7628f08de074fbfefec1bb678b7 upstream.
If the bootloader uses the long descriptor format and jumps to
kernel decompressor code, TTBCR may not be in a right state.
Before enabling the MMU, it is required to clear the TTBCR.PD0
field to use TTBR0 for translation table walks.
The commit dbece45894 ("ARM: 7501/1: decompressor:
reset ttbcr for VMSA ARMv7 cores") does the reset of TTBCR.N, but
doesn't consider all the bits for the size of TTBCR.N.
Clear TTBCR.PD0 field and reset all the three bits of TTBCR.N to
indicate the use of TTBR0 and the correct base address width.
Fixes: dbece45894 ("ARM: 7501/1: decompressor: reset ttbcr for VMSA ARMv7 cores")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit ba6dea4f7cedb4b1c17e36f4087675d817c2e24b upstream.
Whilst MPIDR values themselves are less than 32 bits, it is still
perfectly valid for a DT to have #address-cells > 1 in the CPUs node,
resulting in the "reg" property having leading zero cell(s). In that
situation, the big-endian nature of the data conspires with the current
behaviour of only reading the first cell to cause the kernel to think
all CPUs have ID 0, and become resoundingly unhappy as a consequence.
Take the full property length into account when parsing CPUs so as to
be correct under any circumstances.
Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 205e1e255c479f3fd77446415706463b282f94e4 upstream
Matt reported that we have a NULL pointer dereference
in ppp_pernet() from ppp_connect_channel(),
i.e. pch->chan_net is NULL.
This is due to that a parallel ppp_unregister_channel()
could happen while we are in ppp_connect_channel(), during
which pch->chan_net set to NULL. Since we need a reference
to net per channel, it makes sense to sync the refcnt
with the life time of the channel, therefore we should
release this reference when we destroy it.
Fixes: 1f461dcdd296 ("ppp: take reference on channels netns")
Reported-by: Matt Bennett <Matt.Bennett@alliedtelesis.co.nz>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux-ppp@vger.kernel.org
Cc: Guillaume Nault <g.nault@alphalink.fr>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 50d2e6dc1f83db0563c7d6603967bf9585ce934b upstream.
The cipher block size for GCM is 16 bytes, and thus the CTR transform
used in crypto_gcm_setkey() will also expect a 16-byte IV. However,
the code currently reserves only 8 bytes for the IV, causing
an out-of-bounds access in the CTR transform. This patch fixes
the issue by setting the size of the IV buffer to 16 bytes.
Fixes: 84c9115230 ("[CRYPTO] gcm: Add support for async ciphers")
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit acdb04d0b36769b3e05990c488dc74d8b7ac8060 upstream.
When we need to allocate a temporary blkcipher_walk_next and it
fails, the code is supposed to take the slow path of processing
the data block by block. However, due to an unrelated change
we instead end up dereferencing the NULL pointer.
This patch fixes it by moving the unrelated bsize setting out
of the way so that we enter the slow path as inteded.
Fixes: 7607bd8ff0 ("[CRYPTO] blkcipher: Added blkcipher_walk_virt_block")
Change-Id: I1f555ebb673b471679370b25c5f11d8246afd2ae
Cc: stable@vger.kernel.org
Reported-by: xiakaixu <xiakaixu@huawei.com>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 0bd2223594a4dcddc1e34b15774a3a4776f7749e upstream.
When calling .import() on a cryptd ahash_request, the structure members
that describe the child transform in the shash_desc need to be initialized
like they are when calling .init()
Cc: stable@vger.kernel.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 1822793a523e5d5730b19cc21160ff1717421bc8 upstream.
We need to lock the child socket in skcipher_check_key as otherwise
two simultaneous calls can cause the parent socket to be freed.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit ad46d7e33219218605ea619e32553daf4f346b9f upstream.
We need to lock the child socket in hash_check_key as otherwise
two simultaneous calls can cause the parent socket to be freed.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit a6a48c565f6f112c6983e2a02b1602189ed6e26e upstream.
This patch forbids the calling of bind(2) when there are child
sockets created by accept(2) in existence, even if they are created
on the nokey path.
This is needed as those child sockets have references to the tfm
object which bind(2) will destroy.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit d7b65aee1e7b4c87922b0232eaba56a8a143a4a0 upstream.
This patch removes the custom release parent function as the
generic af_alg_release_parent now works for nokey sockets too.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit f1d84af1835846a5a2b827382c5848faf2bb0e75 upstream.
This patch removes the custom release parent function as the
generic af_alg_release_parent now works for nokey sockets too.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 6a935170a980024dd29199e9dbb5c4da4767a1b9 upstream.
This patch allows af_alg_release_parent to be called even for
nokey sockets.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 6e8d8ecf438792ecf7a3207488fb4eebc4edb040 upstream.
This patch adds an exception to the key check so that cipher_null
users may continue to use algif_skcipher without setting a key.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit a1383cd86a062fc798899ab20f0ec2116cce39cb upstream.
This patch adds a way for skcipher users to determine whether a key
is required by a transform.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 6de62f15b581f920ade22d758f4c338311c2f0d4 upstream.
Hash implementations that require a key may crash if you use
them without setting a key. This patch adds the necessary checks
so that if you do attempt to use them without a key that we return
-ENOKEY instead of proceeding.
This patch also adds a compatibility path to support old applications
that do acept(2) before setkey.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 00420a65fa2beb3206090ead86942484df2275f3 upstream.
The has_key logic is wrong for shash algorithms as they always
have a setkey function. So we should instead be testing against
shash_no_setkey.
Fixes: a5596d633278 ("crypto: hash - Add crypto_ahash_has_setkey")
Cc: stable@vger.kernel.org
Reported-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit a5596d6332787fd383b3b5427b41f94254430827 upstream.
This patch adds a way for ahash users to determine whether a key
is required by a crypto_ahash transform.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit a0fa2d037129a9849918a92d91b79ed6c7bd2818 upstream.
This patch adds a compatibility path to support old applications
that do acept(2) before setkey.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 37766586c965d63758ad542325a96d5384f4a8c9 upstream.
This patch adds a compatibility path to support old applications
that do acept(2) before setkey.
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit c840ac6af3f8713a71b4d2363419145760bd6044 upstream.
Each af_alg parent socket obtained by socket(2) corresponds to a
tfm object once bind(2) has succeeded. An accept(2) call on that
parent socket creates a context which then uses the tfm object.
Therefore as long as any child sockets created by accept(2) exist
the parent socket must not be modified or freed.
This patch guarantees this by using locks and a reference count
on the parent socket. Any attempt to modify the parent socket will
fail with EBUSY.
Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit dd504589577d8e8e70f51f997ad487a4cb6c026f upstream.
Some cipher implementations will crash if you try to use them
without calling setkey first. This patch adds a check so that
the accept(2) call will fail with -ENOKEY if setkey hasn't been
done on the socket yet.
Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit ecf7d01c229d11a44609c0067889372c91fb4f36 upstream.
Oleg noticed that its possible to falsely observe p->on_cpu == 0 such
that we'll prematurely continue with the wakeup and effectively run p on
two CPUs at the same time.
Even though the overlap is very limited; the task is in the middle of
being scheduled out; it could still result in corruption of the
scheduler data structures.
CPU0 CPU1
set_current_state(...)
<preempt_schedule>
context_switch(X, Y)
prepare_lock_switch(Y)
Y->on_cpu = 1;
finish_lock_switch(X)
store_release(X->on_cpu, 0);
try_to_wake_up(X)
LOCK(p->pi_lock);
t = X->on_cpu; // 0
context_switch(Y, X)
prepare_lock_switch(X)
X->on_cpu = 1;
finish_lock_switch(Y)
store_release(Y->on_cpu, 0);
</preempt_schedule>
schedule();
deactivate_task(X);
X->on_rq = 0;
if (X->on_rq) // false
if (t) while (X->on_cpu)
cpu_relax();
context_switch(X, ..)
finish_lock_switch(X)
store_release(X->on_cpu, 0);
Avoid the load of X->on_cpu being hoisted over the X->on_rq load.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 135e8c9250dd5c8c9aae5984fde6f230d0cbfeaf upstream.
The origin of the issue I've seen is related to
a missing memory barrier between check for task->state and
the check for task->on_rq.
The task being woken up is already awake from a schedule()
and is doing the following:
do {
schedule()
set_current_state(TASK_(UN)INTERRUPTIBLE);
} while (!cond);
The waker, actually gets stuck doing the following in
try_to_wake_up():
while (p->on_cpu)
cpu_relax();
Analysis:
The instance I've seen involves the following race:
CPU1 CPU2
while () {
if (cond)
break;
do {
schedule();
set_current_state(TASK_UN..)
} while (!cond);
wakeup_routine()
spin_lock_irqsave(wait_lock)
raw_spin_lock_irqsave(wait_lock) wake_up_process()
} try_to_wake_up()
set_current_state(TASK_RUNNING); ..
list_del(&waiter.list);
CPU2 wakes up CPU1, but before it can get the wait_lock and set
current state to TASK_RUNNING the following occurs:
CPU3
wakeup_routine()
raw_spin_lock_irqsave(wait_lock)
if (!list_empty)
wake_up_process()
try_to_wake_up()
raw_spin_lock_irqsave(p->pi_lock)
..
if (p->on_rq && ttwu_wakeup())
..
while (p->on_cpu)
cpu_relax()
..
CPU3 tries to wake up the task on CPU1 again since it finds
it on the wait_queue, CPU1 is spinning on wait_lock, but immediately
after CPU2, CPU3 got it.
CPU3 checks the state of p on CPU1, it is TASK_UNINTERRUPTIBLE and
the task is spinning on the wait_lock. Interestingly since p->on_rq
is checked under pi_lock, I've noticed that try_to_wake_up() finds
p->on_rq to be 0. This was the most confusing bit of the analysis,
but p->on_rq is changed under runqueue lock, rq_lock, the p->on_rq
check is not reliable without this fix IMHO. The race is visible
(based on the analysis) only when ttwu_queue() does a remote wakeup
via ttwu_queue_remote. In which case the p->on_rq change is not
done uder the pi_lock.
The result is that after a while the entire system locks up on
the raw_spin_irqlock_save(wait_lock) and the holder spins infintely
Reproduction of the issue:
The issue can be reproduced after a long run on my system with 80
threads and having to tweak available memory to very low and running
memory stress-ng mmapfork test. It usually takes a long time to
reproduce. I am trying to work on a test case that can reproduce
the issue faster, but thats work in progress. I am still testing the
changes on my still in a loop and the tests seem OK thus far.
Big thanks to Benjamin and Nick for helping debug this as well.
Ben helped catch the missing barrier, Nick caught every missing
bit in my theory.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[ Updated comment to clarify matching barriers. Many
architectures do not have a full barrier in switch_to()
so that cannot be relied upon. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <nicholas.piggin@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e02cce7b-d9ca-1ad0-7a61-ea97c7582b37@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit ad33bb04b2a6cee6c1f99fabb15cddbf93ff0433 upstream.
pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessy (but atomically) detect when a pmd is a regular
(stable) pmd or when the pmd is unstable and can infinitely transition
from pmd_none() and pmd_trans_huge() from under us, while only holding
the mmap_sem for reading (for writing not).
While holding the mmap_sem only for reading, MADV_DONTNEED can run from
under us and so before we can assume the pmd to be a regular stable pmd
we need to compare it against pmd_none() and pmd_trans_huge() in an
atomic way, with pmd_trans_unstable(). The old pmd_trans_huge() left a
tiny window for a race.
Useful applications are unlikely to notice the difference as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.
[js] 3.12 backport: no pmd_devmap in 3.12 yet.
[akpm@linux-foundation.org: tidy up comment grammar/layout]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit e514cc0a492a3f39ef71b31590a7ef67537ee04b upstream.
The props->ap[] array is defined like this:
struct alg_props ap[NX_MAX_FC][NX_MAX_MODE][3];
So we can see that if msc->fc and msc->mode are == to NX_MAX_FC or
NX_MAX_MODE then we're off by one.
Fixes: ae0222b728 ('powerpc/crypto: nx driver code supporting nx encryption')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit dee08ab83d0378d922b67e7cf10bbec3e4ea343b upstream.
Function regulator_enable() may return an error that has to be checked.
This patch changes function rfkill_regulator_set_block() so that it checks
for the return code. Also, rfkill_data->reg_enabled is set to 'true' only
if there is no error.
This fixes the following compilation warning:
net/rfkill/rfkill-regulator.c:43:20: warning: ignoring return value of 'regulator_enable', declared with attribute warn_unused_result [-Wunused-result]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 09a5c34e8d6b05663ec4c3d22b1fbd9fec89aaf9 upstream.
GCC reports a -Wlogical-not-parentheses warning here; therefore
add parentheses to shut it up and to express our intent more.
Signed-off-by: James C Boyd <jcboyd.dev@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit bca014caaa6130e57f69b5bf527967aa8ee70fdd upstream.
Signing a module should only make it trusted by the specific kernel it
was built for, not anything else. Loading a signed module meant for a
kernel with a different ABI could have interesting effects.
Therefore, treat all signatures as invalid when a module is
force-loaded.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit b2e1c26f0b62531636509fbcb6dab65617ed8331 upstream.
glibc recently did a sync up (94e73c95d9b5 "elf.h: Sync with the gabi
webpage") that added a #define for EM_METAG but did not add relocations
This triggers build errors:
scripts/recordmcount.c: In function 'do_file':
scripts/recordmcount.c:466:28: error: 'R_METAG_ADDR32' undeclared (first use in this function)
case EM_METAG: reltype = R_METAG_ADDR32;
^~~~~~~~~~~~~~
scripts/recordmcount.c:466:28: note: each undeclared identifier is reported only once for each function it appears in
scripts/recordmcount.c:468:20: error: 'R_METAG_NONE' undeclared (first use in this function)
rel_type_nop = R_METAG_NONE;
^~~~~~~~~~~~
Work around this change with some more #ifdefery for the relocations.
Fedora Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1354034
Link: http://lkml.kernel.org/r/1468005530-14757-1-git-send-email-labbott@redhat.com
Cc: stable@vger.kernel.org # v3.9+
Cc: James Hogan <james.hogan@imgtec.com>
Fixes: 00512bdd45 ("metag: ftrace support")
Reported-by: Ross Burton <ross.burton@intel.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 0e0e36774081534783aa8eeb9f6fbddf98d3c061 upstream.
It seems risky to always rely on the caller to ensure the socket's
address family is correct before passing it to the NetLabel kAPI,
especially since we see at least one LSM which didn't. Add address
family checks to the *_delattr() functions to help prevent future
problems.
Cc: <stable@vger.kernel.org>
Reported-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 23bc6ab0a0912146fd674a0becc758c3162baabc upstream.
When we retrieve imtu value from userspace we should use 16 bit pointer
cast instead of 32 as it's defined that way in headers. Fixes setsockopt
calls on big-endian platforms.
Signed-off-by: Amadeusz SÅawiÅski <amadeusz.slawinski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit a621bac3044ed6f7ec5fa0326491b2d4838bfa93 upstream.
When SCSI was written, all commands coming from the filesystem
(REQ_TYPE_FS commands) had data. This meant that our signal for needing
to complete the command was the number of bytes completed being equal to
the number of bytes in the request. Unfortunately, with the advent of
flush barriers, we can now get zero length REQ_TYPE_FS commands, which
confuse this logic because they satisfy the condition every time. This
means they never get retried even for retryable conditions, like UNIT
ATTENTION because we complete them early assuming they're done. Fix
this by special casing the early completion condition to recognise zero
length commands with errors and let them drop through to the retry code.
Reported-by: Sebastian Parschauer <s.parschauer@gmx.de>
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Tested-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[ jwang: backport from upstream 4.7 to fix scsi resize issue ]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 635682a14427d241bab7bbdeebb48a7d7b91638e upstream.
A case can occur when sctp_accept() is called by the user during
a heartbeat timeout event after the 4-way handshake. Since
sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
the listening socket but released with the new association socket.
The result is a deadlock on any future attempts to take the listening
socket lock.
Note that this race can occur with other SCTP timeouts that take
the bh_lock_sock() in the event sctp_accept() is called.
BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
...
RIP: 0010:[<ffffffff8152d48e>] [<ffffffff8152d48e>] _spin_lock+0x1e/0x30
RSP: 0018:ffff880028323b20 EFLAGS: 00000206
RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48
RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000
R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0
R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225
FS: 0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0)
Stack:
ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
<d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
<d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
Call Trace:
<IRQ>
[<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp]
[<ffffffff8148c559>] ? nf_iterate+0x69/0xb0
[<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120
[<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0
[<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0
[<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440
[<ffffffff81497255>] ? ip_rcv+0x275/0x350
[<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750
...
With lockdep debugging:
=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
CslRx/12087 is trying to release lock (slock-AF_INET) at:
[<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
but there are no more locks to release!
other info that might help us debug this:
2 locks held by CslRx/12087:
#0: (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
#1: (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]
Ensure the socket taken is also the same one that is released by
saving a copy of the socket before entering the timeout event
critical section.
Signed-off-by: Karl Heiss <kheiss@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[wt: adjusted, 3.10 uses sctp_bh_unlock_sock() instead of bh_lock_sock()]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 54e430bbd490e18ab116afa4cd90dcc45787b3df upstream.
If we fall back to using LSI on the Croc or Crocodile chip we need to
clear the interrupt so we don't hang the system.
Cc: <stable@vger.kernel.org>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit f68381a70bb2b26c31b13fdaf67c778f92fd32b4 upstream.
The code that fills packed command header assumes that CPU runs in
little-endian mode. Hence the header is malformed in big-endian mode
and causes MMC data transfer errors:
[ 563.200828] mmcblk0: error -110 transferring data, sector 2048, nr 8, cmd response 0x900, card status 0xc40
[ 563.219647] mmcblk0: packed cmd failed, nr 2, sectors 16, failure index: -1
Convert header data to LE.
Signed-off-by: Taras Kondratiuk <takondra@cisco.com>
Fixes: ce39f9d17c ("mmc: support packed write command for eMMC4.5 devices")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>