commit b30bdfa86431afbafe15284a3ad5ac19b49b88e3 upstream.
As it is if you ask for a sync gcm you may actually end up with
an async one because it does not filter out async implementations
of ghash.
This patch fixes this by adding the necessary filter when looking
for ghash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 19ced623db2fe91604d69f7d86b03144c5107739 upstream.
The hash buffer is really HASH_BLOCK_SIZE bytes, someone
must have thought that memmove takes n*u32 words by mistake.
Tests work as good/bad as before after this patch.
Cc: Joakim Bech <joakim.bech@linaro.org>
Cc: stable@vger.kernel.org
Reported-by: David Binderman <linuxdev.baldrick@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 3d56c25e3bb0726a5c5e16fc2d9e38f8ed763085 upstream.
Ascend-to-parent logics in d_walk() depends on all encountered child
dentries not getting freed without an RCU delay. Unfortunately, in
quite a few cases it is not true, with hard-to-hit oopsable race as
the result.
Fortunately, the fix is simiple; right now the rule is "if it ever
been hashed, freeing must be delayed" and changing it to "if it
ever had a parent, freeing must be delayed" closes that hole and
covers all cases the old rule used to cover. Moreover, pipes and
sockets remain _not_ covered, so we do not introduce RCU delay in
the cases which are the reason for having that delay conditional
in the first place.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[wt: add the required change to __d_materialise_dentry() for kernels
older than v3.17]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit e2dfb4b880146bfd4b6aa8e138c0205407cebbaf upstream.
PTRACE_SETVFPREGS fails to properly mark the VFP register set to be
reloaded, because it undoes one of the effects of vfp_flush_hwstate().
Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to
an invalid CPU number, but vfp_set() overwrites this with the original
CPU number, thereby rendering the hardware state as apparently "valid",
even though the software state is more recent.
Fix this by reverting the previous change.
Cc: <stable@vger.kernel.org>
Fixes: 8130b9d7b9 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers")
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Simon Marchi <simon.marchi@ericsson.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 554a5ccc4e4a20c5f3ec859de0842db4b4b9c77e upstream.
If we hit this error when mounted with errors=continue or
errors=remount-ro:
EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata
then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
continue. However, ext4_mb_release_context() is the wrong thing to call
here since we are still actually using the allocation context.
Instead, just error out. We could retry the allocation, but there is a
possibility of getting stuck in an infinite loop instead, so this seems
safer.
[ Fixed up so we don't return EAGAIN to userspace. --tytso ]
Fixes: 8556e8f3b6 ("ext4: Don't allow new groups to be added during block allocation")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
[wt: 3.10 doesn't have EFSCORRUPTED, but XFS uses EUCLEAN as does 3.14
on this patch so use this instead]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit c65d5c6c81a1f27dec5f627f67840726fcd146de upstream.
If we encounter a filesystem error during orphan cleanup, we should stop.
Otherwise, we may end up in an infinite loop where the same inode is
processed again and again.
EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
Aborting journal on device loop0-8.
EXT4-fs (loop0): Remounting filesystem read-only
EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
[...]
See-also: c9eb13a9105 ("ext4: fix hang when processing corrupted orphaned inode list")
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 6a7fd522a7c94cdef0a3b08acf8e6702056e635c upstream.
If ext4_fill_super() fails early, it's possible for ext4_evict_inode()
to call ext4_should_journal_data() before superblock options and flags
are fully set up. In that case, the iput() on the journal inode can
end up causing a BUG().
Work around this problem by reordering the tests so we only call
ext4_should_journal_data() after we know it's not the journal inode.
Fixes: 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data")
Fixes: 2b405bfa84 ("ext4: fix data=journal fast mount/umount hang")
Cc: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit f70749ca42943faa4d4dcce46dfdcaadb1d0c4b6 upstream.
An extent with lblock = 4294967295 and len = 1 will pass the
ext4_valid_extent() test:
ext4_lblk_t last = lblock + len - 1;
if (len == 0 || lblock > last)
return 0;
since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().
We can simplify it by removing the - 1 altogether and changing the test
to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
lblock and it fails, and if len > 0 then lblock + len > lblock in order
to pass (i.e. it doesn't overflow).
Fixes: 5946d0893 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
Fixes: 2f974865f ("ext4: check for zero length extent explicitly")
Cc: Eryu Guan <guaneryu@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 7bc9491645118c9461bd21099c31755ff6783593 upstream.
Although the extent tree depth of 5 should enough be for the worst
case of 2*32 extents of length 1, the extent tree code does not
currently to merge nodes which are less than half-full with a sibling
node, or to shrink the tree depth if possible. So it's possible, at
least in theory, for the tree depth to be greater than 5. However,
even in the worst case, a tree depth of 32 is highly unlikely, and if
the file system is maliciously corrupted, an insanely large eh_depth
can cause memory allocation failures that will trigger kernel warnings
(here, eh_depth = 65280):
JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256
------------[ cut here ]------------
WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580
CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508
Stack:
604a8947 625badd8 0002fd09 00000000
60078643 00000000 62623910 601bf9bc
62623970 6002fc84 626239b0 900000125
Call Trace:
[<6001c2dc>] show_stack+0xdc/0x1a0
[<601bf9bc>] dump_stack+0x2a/0x2e
[<6002fc84>] __warn+0x114/0x140
[<6002fdff>] warn_slowpath_null+0x1f/0x30
[<60165829>] start_this_handle+0x569/0x580
[<60165d4e>] jbd2__journal_start+0x11e/0x220
[<60146690>] __ext4_journal_start_sb+0x60/0xa0
[<60120a81>] ext4_truncate+0x131/0x3a0
[<60123677>] ext4_setattr+0x757/0x840
[<600d5d0f>] notify_change+0x16f/0x2a0
[<600b2b16>] do_truncate+0x76/0xc0
[<600c3e56>] path_openat+0x806/0x1300
[<600c55c9>] do_filp_open+0x89/0xf0
[<600b4074>] do_sys_open+0x134/0x1e0
[<600b4140>] SyS_open+0x20/0x30
[<6001ea68>] handle_syscall+0x88/0x90
[<600295fd>] userspace+0x3fd/0x500
[<6001ac55>] fork_handler+0x85/0x90
---[ end trace 08b0b88b6387a244 ]---
[ Commit message modified and the extent tree depath check changed
from 5 to 32 -- tytso ]
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 935244cd54b86ca46e69bc6604d2adfb1aec2d42 upstream.
Currently, in ext4_mb_init(), there's a loop like the following:
do {
...
offset += 1 << (sb->s_blocksize_bits - i);
i++;
} while (i <= sb->s_blocksize_bits + 1);
Note that the updated offset is used in the loop's next iteration only.
However, at the last iteration, that is at i == sb->s_blocksize_bits + 1,
the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3))
and UBSAN reports
UBSAN: Undefined behaviour in fs/ext4/mballoc.c:2621:15
shift exponent 4294967295 is too large for 32-bit type 'int'
[...]
Call Trace:
[<ffffffff818c4d25>] dump_stack+0xbc/0x117
[<ffffffff818c4c69>] ? _atomic_dec_and_lock+0x169/0x169
[<ffffffff819411ab>] ubsan_epilogue+0xd/0x4e
[<ffffffff81941cac>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
[<ffffffff81941ab1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
[<ffffffff814b6dc1>] ? kmem_cache_alloc+0x101/0x390
[<ffffffff816fc13b>] ? ext4_mb_init+0x13b/0xfd0
[<ffffffff814293c7>] ? create_cache+0x57/0x1f0
[<ffffffff8142948a>] ? create_cache+0x11a/0x1f0
[<ffffffff821c2168>] ? mutex_lock+0x38/0x60
[<ffffffff821c23ab>] ? mutex_unlock+0x1b/0x50
[<ffffffff814c26ab>] ? put_online_mems+0x5b/0xc0
[<ffffffff81429677>] ? kmem_cache_create+0x117/0x2c0
[<ffffffff816fcc49>] ext4_mb_init+0xc49/0xfd0
[...]
Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1.
Unless compilers start to do some fancy transformations (which at least
GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
such calculated value of offset is never used again.
Silence UBSAN by introducing another variable, offset_incr, holding the
next increment to apply to offset and adjust that one by right shifting it
by one position per loop iteration.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161
Cc: stable@vger.kernel.org
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit b5cb316cdf3a3f5f6125412b0f6065185240cfdc upstream.
Currently, in mb_find_order_for_block(), there's a loop like the following:
while (order <= e4b->bd_blkbits + 1) {
...
bb += 1 << (e4b->bd_blkbits - order);
}
Note that the updated bb is used in the loop's next iteration only.
However, at the last iteration, that is at order == e4b->bd_blkbits + 1,
the shift count becomes negative (c.f. C99 6.5.7(3)) and UBSAN reports
UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1281:11
shift exponent -1 is negative
[...]
Call Trace:
[<ffffffff818c4d35>] dump_stack+0xbc/0x117
[<ffffffff818c4c79>] ? _atomic_dec_and_lock+0x169/0x169
[<ffffffff819411bb>] ubsan_epilogue+0xd/0x4e
[<ffffffff81941cbc>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
[<ffffffff81941ac1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
[<ffffffff816e93a0>] ? ext4_mb_generate_from_pa+0x590/0x590
[<ffffffff816502c8>] ? ext4_read_block_bitmap_nowait+0x598/0xe80
[<ffffffff816e7b7e>] mb_find_order_for_block+0x1ce/0x240
[...]
Unless compilers start to do some fancy transformations (which at least
GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
such calculated value of bb is never used again.
Silence UBSAN by introducing another variable, bb_incr, holding the next
increment to apply to bb and adjust that one by right shifting it by one
position per loop iteration.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161
Cc: stable@vger.kernel.org
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit c9eb13a9105e2e418f72e46a2b6da3f49e696902 upstream.
If the orphaned inode list contains inode #5, ext4_iget() returns a
bad inode (since the bootloader inode should never be referenced
directly). Because of the bad inode, we end up processing the inode
repeatedly and this hangs the machine.
This can be reproduced via:
mke2fs -t ext4 /tmp/foo.img 100
debugfs -w -R "ssv last_orphan 5" /tmp/foo.img
mount -o loop /tmp/foo.img /mnt
(But don't do this if you are using an unpatched kernel if you care
about the system staying functional. :-)
This bug was found by the port of American Fuzzy Lop into the kernel
to find file system problems[1]. (Since it *only* happens if inode #5
shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not
surprising that AFL needed two hours before it found it.)
[1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf
Cc: stable@vger.kernel.org
Reported by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit affa80bd97f7ca282d1faa91667b3ee9e4c590e6 upstream.
When running a 32-bit userspace on a 64-bit kernel, the UI_SET_PHYS
ioctl needs to be treated with special care, as it has the pointer
size encoded in the command.
Signed-off-by: Ricky Liang <jcliang@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit f626300a3e776ccc9671b0dd94698fb3aa315966 upstream.
tcp_select_initial_window() intends to advertise a window
scaling for the maximum possible window size. To do so,
it considers the maximum of net.ipv4.tcp_rmem[2] and
net.core.rmem_max as the only possible upper-bounds.
However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
to set the socket's receive buffer size to values
larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
Thus, SO_RCVBUFFORCE is effectively ignored by
tcp_select_initial_window().
To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
net.core.rmem_max and socket's initial buffer space.
Fixes: b0573dea1f ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit ce3cf4ec0305919fc69a972f6c2b2efd35d36abc upstream.
The v6 tcp stats scan do not provide TLP and ER timer information
correctly like the v4 version . This patch fixes that.
Fixes: 6ba8a3b19e ("tcp: Tail loss probe (TLP)")
Fixes: eed530b6c6 ("tcp: early retransmit")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 7f556567036cb7f89aabe2f0954b08566b4efb53 upstream.
The well-spotted fallocate undo fix is good in most cases, but not when
fallocate failed on the very first page. index 0 then passes lend -1
to shmem_undo_range(), and that has two bad effects: (a) that it will
undo every fallocation throughout the file, unrestricted by the current
range; but more importantly (b) it can cause the undo to hang, because
lend -1 is treated as truncation, which makes it keep on retrying until
every page has gone, but those already fully instantiated will never go
away. Big thank you to xfstests generic/269 which demonstrates this.
Fixes: b9b4bb26af01 ("tmpfs: don't undo fallocate past its last page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit b9b4bb26af017dbe930cd4df7f9b2fc3a0497bfe upstream.
When fallocate is interrupted it will undo a range that extends one byte
past its range of allocated pages. This can corrupt an in-use page by
zeroing out its first byte. Instead, undo using the inclusive byte
range.
Fixes: 1635f6a741 ("tmpfs: undo fallocation on failure")
Link: http://lkml.kernel.org/r/1462713387-16724-1-git-send-email-anthony.romano@coreos.com
Signed-off-by: Anthony Romano <anthony.romano@coreos.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Brandon Philips <brandon@ifup.co>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 8d91f8b15361dfb438ab6eb3b319e2ded43458ff upstream.
@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.
However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before starting outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.
If this happens with a slow console devices, for example a serial
console, the outputting task may occupy the cpu for a very long time.
Long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile more messages, sometimes enough to trigger the next cycle of
warnings incapacitating the system.
Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Jan Kara <jack@suse.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ciwillia@brocade.com: adjust context for 3.10.y]
Signed-off-by: Chas Williams <ciwillia@brocade.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 3c0415fa08548e3bc63ef741762664497ab187ed upstream.
This patch adds support for 0x1206 PID of Telit LE910.
Since the interfaces positions are the same than the ones for
0x1043 PID of Telit LE922, telit_le922_blacklist_usbcfg3 is used.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 7e8b3dfef16375dbfeb1f36a83eb9f27117c51fd upstream.
The HOSTPC extension registers found in some EHCI implementations form
a variable-length array, with one element for each port. Therefore
the hostpc field in struct ehci_regs should be declared as a
zero-length array, not a single-element array.
This fixes a problem reported by UBSAN.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Ben Hutchings reported that two patches were incorrectly backported
to 3.10 :
- ddbe1fca0bcb ("USB: Add device quirk for ASUS T100 Base Station keyboard")
- ad87e03213b5 ("USB: add quirk for devices with broken LPM")
These two patches introduce quirks which must be in usb_quirk_list and
not in usb_interface_quirk_list. These last one must only contain the
Logitech UVC camera.
Change-Id: I634599562ae47d00c4e2eeea302cb6bd574a47a4
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 69828dce7af2cb6d08ef5a03de687d422fb7ec1f upstream.
Sending SI_TKILL from rt_[tg]sigqueueinfo was deprecated, so now we issue
a warning on the first attempt of doing it. We use WARN_ON_ONCE, which is
not informative and, what is worse, taints the kernel, making the trinity
syscall fuzzer complain false-positively from time to time.
It does not look like we need this warning at all, because the behaviour
changed quite a long time ago (2.6.39), and if an application relies on
the old API, it gets EPERM anyway and can issue a warning by itself.
So let us zap the warning in kernel.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 63ecb81aadf1c823c85c70a2bfd1ec9df3341a72 upstream.
commit d7591f0c41ce3e67600a982bab6989ef0f07b3ce upstream
The three variants use same copy&pasted code, condense this into a
helper and use that.
Make sure info.name is 0-terminated.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit d26e2c9ffa385dd1b646f43c1397ba12af9ed431 upstream.
This partially reverts commit 1086bbe97a07 ("netfilter: ensure number of
counters is >0 in do_replace()") in net/bridge/netfilter/ebtables.c.
Setting rules with ebtables does not work any more with 1086bbe97a07 place.
There is an error message and no rules set in the end.
e.g.
~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP
Unable to update the kernel. Two possible causes:
1. Multiple ebtables programs were executing simultaneously. The ebtables
userspace tool doesn't by default support multiple ebtables programs
running
Reverting the ebtables part of 1086bbe97a07 makes this work again.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 09d9686047dbbe1cf4faa558d3ecc4aae2046054 upstream.
This looks like refactoring, but its also a bug fix.
Problem is that the compat path (32bit iptables, 64bit kernel) lacks a few
sanity tests that are done in the normal path.
For example, we do not check for underflows and the base chain policies.
While its possible to also add such checks to the compat path, its more
copy&pastry, for instance we cannot reuse check_underflow() helper as
e->target_offset differs in the compat case.
Other problem is that it makes auditing for validation errors harder; two
places need to be checked and kept in sync.
At a high level 32 bit compat works like this:
1- initial pass over blob:
validate match/entry offsets, bounds checking
lookup all matches and targets
do bookkeeping wrt. size delta of 32/64bit structures
assign match/target.u.kernel pointer (points at kernel
implementation, needed to access ->compatsize etc.)
2- allocate memory according to the total bookkeeping size to
contain the translated ruleset
3- second pass over original blob:
for each entry, copy the 32bit representation to the newly allocated
memory. This also does any special match translations (e.g.
adjust 32bit to 64bit longs, etc).
4- check if ruleset is free of loops (chase all jumps)
5-first pass over translated blob:
call the checkentry function of all matches and targets.
The alternative implemented by this patch is to drop steps 3&4 from the
compat process, the translation is changed into an intermediate step
rather than a full 1:1 translate_table replacement.
In the 2nd pass (step #3), change the 64bit ruleset back to a kernel
representation, i.e. put() the kernel pointer and restore ->u.user.name .
This gets us a 64bit ruleset that is in the format generated by a 64bit
iptables userspace -- we can then use translate_table() to get the
'native' sanity checks.
This has two drawbacks:
1. we re-validate all the match and target entry structure sizes even
though compat translation is supposed to never generate bogus offsets.
2. we put and then re-lookup each match and target.
THe upside is that we get all sanity tests and ruleset validations
provided by the normal path and can remove some duplicated compat code.
iptables-restore time of autogenerated ruleset with 300k chains of form
-A CHAIN0001 -m limit --limit 1/s -j CHAIN0002
-A CHAIN0002 -m limit --limit 1/s -j CHAIN0003
shows no noticeable differences in restore times:
old: 0m30.796s
new: 0m31.521s
64bit: 0m25.674s
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 1086bbe97a074844188c6c988fa0b1a98c3ccbb9 upstream.
After improving setsockopt() coverage in trinity, I started triggering
vmalloc failures pretty reliably from this code path:
warn_alloc_failed+0xe9/0x140
__vmalloc_node_range+0x1be/0x270
vzalloc+0x4b/0x50
__do_replace+0x52/0x260 [ip_tables]
do_ipt_set_ctl+0x15d/0x1d0 [ip_tables]
nf_setsockopt+0x65/0x90
ip_setsockopt+0x61/0xa0
raw_setsockopt+0x16/0x60
sock_common_setsockopt+0x14/0x20
SyS_setsockopt+0x71/0xd0
It turns out we don't validate that the num_counters field in the
struct we pass in from userspace is initialized.
The same problem also exists in ebtables, arptables, ipv6, and the
compat variants.
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 7b7eba0f3515fca3296b8881d583f7c1042f5226 upstream.
Quoting John Stultz:
In updating a 32bit arm device from 4.6 to Linus' current HEAD, I
noticed I was having some trouble with networking, and realized that
/proc/net/ip_tables_names was suddenly empty.
Digging through the registration process, it seems we're catching on the:
if (strcmp(t->u.user.name, XT_STANDARD_TARGET) == 0 &&
target_offset + sizeof(struct xt_standard_target) != next_offset)
return -EINVAL;
Where next_offset seems to be 4 bytes larger then the
offset + standard_target struct size.
next_offset needs to be aligned via XT_ALIGN (so we can access all members
of ip(6)t_entry struct).
This problem didn't show up on i686 as it only needs 4-byte alignment for
u64, but iptables userspace on other 32bit arches does insert extra padding.
Reported-by: John Stultz <john.stultz@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Fixes: 7ed2abddd20cf ("netfilter: x_tables: check standard target size too")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 13631bfc604161a9d69cd68991dff8603edd66f9 upstream.
Validate that all matches (if any) add up to the beginning of
the target and that each match covers at least the base structure size.
The compat path should be able to safely re-use the function
as the structures only differ in alignment; added a
BUILD_BUG_ON just in case we have an arch that adds padding as well.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit ce683e5f9d045e5d67d1312a42b359cb2ab2a13c upstream.
We're currently asserting that targetoff + targetsize <= nextoff.
Extend it to also check that targetoff is >= sizeof(xt_entry).
Since this is generic code, add an argument pointing to the start of the
match/target, we can then derive the base structure size from the delta.
We also need the e->elems pointer in a followup change to validate matches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 7ed2abddd20cf8f6bd27f65bd218f26fa5bf7f44 upstream.
We have targets and standard targets -- the latter carries a verdict.
The ip/ip6tables validation functions will access t->verdict for the
standard targets to fetch the jump offset or verdict for chainloop
detection, but this happens before the targets get checked/validated.
Thus we also need to check for verdict presence here, else t->verdict
can point right after a blob.
Spotted with UBSAN while testing malformed blobs.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit fc1221b3a163d1386d1052184202d5dc50d302d1 upstream.
32bit rulesets have different layout and alignment requirements, so once
more integrity checks get added to xt_check_entry_offsets it will reject
well-formed 32bit rulesets.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit a08e4e190b866579896c09af59b3bdca821da2cd upstream.
The target size includes the size of the xt_entry_target struct.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit aa412ba225dd3bc36d404c28cdc3d674850d80d0 upstream.
Once we add more sanity testing to xt_check_entry_offsets it
becomes relvant if we're expecting a 32bit 'config_compat' blob
or a normal one.
Since we already have a lot of similar-named functions (check_entry,
compat_check_entry, find_and_check_entry, etc.) and the current
incarnation is short just fold its contents into the callers.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 7d35812c3214afa5b37a675113555259cfd67b98 upstream.
Currently arp/ip and ip6tables each implement a short helper to check that
the target offset is large enough to hold one xt_entry_target struct and
that t->u.target_size fits within the current rule.
Unfortunately these checks are not sufficient.
To avoid adding new tests to all of ip/ip6/arptables move the current
checks into a helper, then extend this helper in followup patches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Because firmware caching generates uevent messages that are sent over a
netlink socket, it can prevent suspend on many platforms. It's also not
always useful, so make it a configurable option.
bug 32180327
Change-Id: I1250512b27edb56caa78d536e5ccf1fb669476ad
Signed-off-by: Ajay Dudani <adudani@google.com>
The man page for open(2) indicates that when O_CREAT is specified, the
'mode' argument applies only to future accesses to the file:
Note that this mode applies only to future accesses of the newly
created file; the open() call that creates a read-only file
may well return a read/write file descriptor.
The man page for open(2) implies that 'mode' is treated identically by
O_CREAT and O_TMPFILE.
O_TMPFILE, however, behaves differently:
int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
assert(fd == -1);
assert(errno == EACCES);
int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
assert(fd > 0);
For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:
if (*opened & FILE_CREATED) {
/* Don't check for write permission, don't truncate */
open_flag &= ~O_TRUNC;
will_truncate = false;
acc_mode = MAY_OPEN;
path_to_nameidata(path, nd);
goto finish_open_created;
}
But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
may_open().
This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
inode is created, may_open() is called with acc_mode = MAY_OPEN, in
do_tmpfile().
A different, but related glibc bug revealed the discrepancy:
https://sourceware.org/bugzilla/show_bug.cgi?id=17523
The glibc lazily loads the 'mode' argument of open() and openat() using
va_arg() only if O_CREAT is present in 'flags' (to support both the 2
argument and the 3 argument forms of open; same idea for openat()).
However, the glibc ignores the 'mode' argument if O_TMPFILE is in
'flags'.
On x86_64, for open(), it magically works anyway, as 'mode' is in
RDX when entering open(), and is still in RDX on SYSCALL, which is where
the kernel looks for the 3rd argument of a syscall.
But openat() is not quite so lucky: 'mode' is in RCX when entering the
glibc wrapper for openat(), while the kernel looks for the 4th argument
of a syscall in R10. Indeed, the syscall calling convention differs from
the regular calling convention in this respect on x86_64. So the kernel
sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
fails with EACCES.
Change-Id: Ib052bbc6fcc68d3060f91732a78ddbff6f71e0a6
Signed-off-by: Eric Rannaud <e@nanocritical.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As comment in include/uapi/asm-generic/fcntl.h described, when
introducing new O_* bits, we need to check its uniqueness in
fcntl_init(). But __O_TMPFILE bit is missing. So fix it.
Change-Id: I5c6ddf8ef05e96e4799bac823caa7dd60ce558dd
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
O_TMPFILE, like O_CREAT, should respect the requested mode and should
create regular files.
This fixes two bugs: O_TMPFILE required privilege (because the mode
ended up as 000) and it produced bogus inodes with no type.
Change-Id: Ie4da9ede57e481c7edb113c5bc6329fefef41f4e
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[suggested by Rasmus Villemoes] make O_DIRECTORY | O_RDWR part of O_TMPFILE;
that will fail on old kernels in a lot more cases than what I came up with.
And make sure O_CREAT doesn't get there...
Change-Id: Iaa3c8b487d44515b539150bdb5d0b749b87d3ea2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>