In function ProcSetReqInternal, valueLen is obtained from the
message buffer pParam. This valueLen is used as argument to the
function GetStrValue where the contents of the buffer pParam is
copied to pMac->cfg.gSBuffer for valueLen number of bytes. However
the array pMac->cfg.gSBuffer is a static array of size CFG_MAX_STR_LEN.
If the value of valueLen exceeds CFG_MAX_STR_LEN, a buffer overwrite
will occur in GetStrValue.
Add Sanity check to make sure valueLen does not exceed CFG_MAX_STR_LEN.
Change-Id: Id16d4c4b8d2414c00a0fae8f8292f011d0763b84
CRs-Fixed: 2143847
The commit "qcacld-2.0: Fix incorrect length of encrypted auth frame" is
already allocating and setting memory for encrAuthFrame. Don't allocate and
set the memory twice.
Change-Id: Id5c30d4213b9e41040bca303d42f990b0a9932c9
WPA RSN IE is copied from source without a check on the given IE length.
A malicious IE length can cause buffer overflow.
Add maximum bound check on WPA RSN IE length.
Change-Id: Id159d307e8f9c1de720d4553a7c29f23cbd28571
CRs-Fixed: 2033213
STA is not able to connect to AP configured with WEP shared
due to incorrect frame length of encrypted auth frame.
Fix this by using the correct frame length.
Bug: 67754642
Change-Id: Ida8d78b512ecf79314200a7c96f5b5c293e5474e
Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>
Memory for encrypted auth frame is allocated based on macro
SIR_MAC_AUTH_CHALLENGE_LENGTH. SIR_MAC_AUTH_CHALLENGE_LENGTH
was updated to 253 from 128. Auth failure is observed on
receiving challenge text of length 128.
Fix is to use length based on the challenge text received.
Change-Id: I9a8b1a05d36421cfab2bf699fe38c50e150cf464
CRs-Fixed: 2100554
Bug: 67030205
Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>
An incorrect IE length can overflow the remaining length variable
and make IE parsing logic perform a buffer over-read.
Check on IE length to avoid buffer over-read.
Bug: 63868629
Change-Id: I20ef6a0136c7a5b602ad15a2fb725f20807b81d0
CRs-Fixed: 2033195
Signed-off-by: Ecco Park <eccopark@google.com>
qcacld-3.0 to qcacld-2.0 propagation
Add check for buffer length in function sme_set_ft_ies.
Bug: 64431968
Change-Id: I7adc56e23316c0ceb193a5bdf8c4c0b5f4fbd20a
CRs-Fixed: 2070583
Signed-off-by: Ecco Park <eccopark@google.com>
Fix CVE-2017-11035
qcacld-3.0 to qcacld-2.0 propagation.
Fix incorrect processing of encrypted auth frame by allocating
appropriate local buffer and using correct type for frame length.
Change-Id: I87d6f4c3c43dd332d5b1877ddf4b3b46a717468b
CRs-Fixed: 2082544
Fix CVE-2017-11015
Change-Id: I7cb934fa97e0250fdc62eec74000f0dd5b323633
Currently limProcessAuthFrame stack frame size exceeds 1024 and causes
build failures for 32 bit platforms.
Move multiple variables from local to dynamic allocation to reduce the
frame size of limProcessAuthFrame.
Change-Id: I83cf5ab24693e0ce012894d808ac79bf37fa9a08
CRs-Fixed: 2083572
Fix CVE-2017-11015
Change-Id: Ifb1971d07ba99705f14d693a6d9a484f71a48c67
qcacld-3.0 to prima propagation
In function rrmProcessBeaconReportReq, add bound check before
writing to channel list which is of fixed size.
Change-Id: I3c80974bba84a96f7b85e4ce62bbb01c23b4babf
CRs-Fixed: 2072774
Fix CVE-2017-11014
Change-Id: Ie5ec655f449093b8b5042a398d94b8342df60e3e
qcacld-3.0 to qcacld-2.0 propagation
Update SIR_MAC_AUTH_CHALLENGE_LENGTH to 253 as per IEEE spec.
Currently value of SIR_MAC_AUTH_CHALLENGE_LENGTH is set to 128.
This may result in potential buffer overflow since frame parser
allows challenge text of length upto 253 but driver can not handle
challenge text longer than 128 bytes.
Change-Id: I7baf860fdde51a14a6573b4f0f26817f5071193e
CRs-Fixed: 2072937
Fix CVE-2017-11015
Change-Id: Ia8aafbb92ac089449d9ea448e45bbb4678d4bd36
qcacld-3.0 to qcacld-2.0 propagation
Update limComputeCrc32() to pass uint16_t as a length type.
Currently uint8_t is being passed as length and there will be type
mismatch when authentication frame to be encrypted will be larger
than 255 bytes.
Change-Id: Ic009197c13a2d70c9015a184acff2e82bf80eaba
CRs-Fixed: 2072937
fix CVE-2017-11015
Change-Id: I0d2044fee3d597493d6c846de4122b6472a45b5e
Check if a IE has been encountered more than max possible for that IE while
parsing a frame.
Change-Id: I1054c7df18780469849be55fc4343f09ac502a49
CRs-Fixed: 2069927
Fix CVE-2017-11013
Change-Id: I41b97a29cf984e0fc605a22f6f6abfc07880976c
This switches over to propagation_next to respect
namepsace semantics.
Test: Remounting to change the options of a fs with mount based
options should propagate to all shared copies of that mount,
and the slaves/indirect slaves of those.
Bug: 122428178
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: Ic35cd2782a646435689f5bedfa1f218fe4ab8254
commit 5ec0811d30378ae104f250bfc9b3640242d81e3f upstream.
When the first propgated copy was a slave the following oops would result:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> IP: [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
> PGD bacd4067 PUD bac66067 PMD 0
> Oops: 0000 [#1] SMP
> Modules linked in:
> CPU: 1 PID: 824 Comm: mount Not tainted 4.6.0-rc5userns+ #1523
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> task: ffff8800bb0a8000 ti: ffff8800bac3c000 task.ti: ffff8800bac3c000
> RIP: 0010:[<ffffffff811fba4e>] [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
> RSP: 0018:ffff8800bac3fd38 EFLAGS: 00010283
> RAX: 0000000000000000 RBX: ffff8800bb77ec00 RCX: 0000000000000010
> RDX: 0000000000000000 RSI: ffff8800bb58c000 RDI: ffff8800bb58c480
> RBP: ffff8800bac3fd48 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000001ca1 R11: 0000000000001c9d R12: 0000000000000000
> R13: ffff8800ba713800 R14: ffff8800bac3fda0 R15: ffff8800bb77ec00
> FS: 00007f3c0cd9b7e0(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000010 CR3: 00000000bb79d000 CR4: 00000000000006e0
> Stack:
> ffff8800bb77ec00 0000000000000000 ffff8800bac3fd88 ffffffff811fbf85
> ffff8800bac3fd98 ffff8800bb77f080 ffff8800ba713800 ffff8800bb262b40
> 0000000000000000 0000000000000000 ffff8800bac3fdd8 ffffffff811f1da0
> Call Trace:
> [<ffffffff811fbf85>] propagate_mnt+0x105/0x140
> [<ffffffff811f1da0>] attach_recursive_mnt+0x120/0x1e0
> [<ffffffff811f1ec3>] graft_tree+0x63/0x70
> [<ffffffff811f1f6b>] do_add_mount+0x9b/0x100
> [<ffffffff811f2c1a>] do_mount+0x2aa/0xdf0
> [<ffffffff8117efbe>] ? strndup_user+0x4e/0x70
> [<ffffffff811f3a45>] SyS_mount+0x75/0xc0
> [<ffffffff8100242b>] do_syscall_64+0x4b/0xa0
> [<ffffffff81988f3c>] entry_SYSCALL64_slow_path+0x25/0x25
> Code: 00 00 75 ec 48 89 0d 02 22 22 01 8b 89 10 01 00 00 48 89 05 fd 21 22 01 39 8e 10 01 00 00 0f 84 e0 00 00 00 48 8b 80 d8 00 00 00 <48> 8b 50 10 48 89 05 df 21 22 01 48 89 15 d0 21 22 01 8b 53 30
> RIP [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
> RSP <ffff8800bac3fd38>
> CR2: 0000000000000010
> ---[ end trace 2725ecd95164f217 ]---
This oops happens with the namespace_sem held and can be triggered by
non-root users. An all around not pleasant experience.
To avoid this scenario when finding the appropriate source mount to
copy stop the walk up the mnt_master chain when the first source mount
is encountered.
Further rewrite the walk up the last_source mnt_master chain so that
it is clear what is going on.
The reason why the first source mount is special is that it it's
mnt_parent is not a mount in the dest_mnt propagation tree, and as
such termination conditions based up on the dest_mnt mount propgation
tree do not make sense.
To avoid other kinds of confusion last_dest is not changed when
computing last_source. last_dest is only used once in propagate_one
and that is above the point of the code being modified, so changing
the global variable is meaningless and confusing.
fixes: f2ebb3a921 ("smarter propagate_mnt()")
Reported-by: Tycho Andersen <tycho.andersen@canonical.com>
Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ie55a2c52db9773b461acc6ebe427221acb7093f0
commit 7ae8fd0351f912b075149a1e03a017be8b903b9a upstream.
propagate_one(m) calculates "type" argument for copy_tree() like this:
> if (m->mnt_group_id == last_dest->mnt_group_id) {
> type = CL_MAKE_SHARED;
> } else {
> type = CL_SLAVE;
> if (IS_MNT_SHARED(m))
> type |= CL_MAKE_SHARED;
> }
The "type" argument then governs clone_mnt() behavior with respect to flags
and mnt_master of new mount. When we iterate through a slave group, it is
possible that both current "m" and "last_dest" are not shared (although,
both are slaves, i.e. have non-NULL mnt_master-s). Then the comparison
above erroneously makes new mount shared and sets its mnt_master to
last_source->mnt_master. The patch fixes the problem by handling zero
mnt_group_id-s as though they are unequal.
The similar problem exists in the implementation of "else" clause above
when we have to ascend upward in the master/slave tree by calling:
> last_source = last_source->mnt_master;
> last_dest = last_source->mnt_parent;
proper number of times. The last step is governed by
"n->mnt_group_id != last_dest->mnt_group_id" condition that may lie if
both are zero. The patch fixes this case in the same way as the former one.
[AV: don't open-code an obvious helper...]
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I78454c89b1f672e49c8ffcf63d1339a3e371aa87
Add mount option unshared_obb to not link the obb
folders of multiple users together.
Bug: 27915347
Test: mount with option. Check if altering one obb
alters the other
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I3956e06bd0a222b0bbb2768c9a8a8372ada85e1e
unshared_obb was missing from show_options
bug: 133257717
Change-Id: I1bc49d1b4098052382a518540e5965e037aa39f1
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
commit 98ca480a8f22fdbd768e3dad07024c8d4856576c upstream.
An ino is unsigned, so display it as such in /proc/locks.
Cc: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I250a495fe3fc809e880535347f462fe552644edf
File-private locks have been re-christened as "open file description"
locks. Finish the symbol name cleanup in the internal implementation.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: Iee48047540a7d8fefb5078cc005ae9ea8994f521
File-private locks have been merged into Linux for v3.15, and *now*
people are commenting that the name and macro definitions for the new
file-private locks suck.
...and I can't even disagree. The names and command macros do suck.
We're going to have to live with these for a long time, so it's
important that we be happy with the names before we're stuck with them.
The consensus on the lists so far is that they should be rechristened as
"open file description locks".
The name isn't a big deal for the kernel, but the command macros are not
visually distinct enough from the traditional POSIX lock macros. The
glibc and documentation folks are recommending that we change them to
look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a
programmer will typo one of the commands wrong, and also makes it easier
to spot this difference when reading code.
This patch makes the following changes that I think are necessary before
v3.15 ships:
1) rename the command macros to their new names. These end up in the uapi
headers and so are part of the external-facing API. It turns out that
glibc doesn't actually use the fcntl.h uapi header, but it's hard to
be sure that something else won't. Changing it now is safest.
2) make the the /proc/locks output display these as type "OFDLCK"
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Stefan Metzmacher <metze@samba.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Frank Filz <ffilzlnx@mindspring.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: Ia975197281d4c80a4ad420d7621896d2f369cef6
Due to some unfortunate history, POSIX locks have very strange and
unhelpful semantics. The thing that usually catches people by surprise
is that they are dropped whenever the process closes any file descriptor
associated with the inode.
This is extremely problematic for people developing file servers that
need to implement byte-range locks. Developers often need a "lock
management" facility to ensure that file descriptors are not closed
until all of the locks associated with the inode are finished.
Additionally, "classic" POSIX locks are owned by the process. Locks
taken between threads within the same process won't conflict with one
another, which renders them useless for synchronization between threads.
This patchset adds a new type of lock that attempts to address these
issues. These locks conflict with classic POSIX read/write locks, but
have semantics that are more like BSD locks with respect to inheritance
and behavior on close.
This is implemented primarily by changing how fl_owner field is set for
these locks. Instead of having them owned by the files_struct of the
process, they are instead owned by the filp on which they were acquired.
Thus, they are inherited across fork() and are only released when the
last reference to a filp is put.
These new semantics prevent them from being merged with classic POSIX
locks, even if they are acquired by the same process. These locks will
also conflict with classic POSIX locks even if they are acquired by
the same process or on the same file descriptor.
The new locks are managed using a new set of cmd values to the fcntl()
syscall. The initial implementation of this converts these values to
"classic" cmd values at a fairly high level, and the details are not
exposed to the underlying filesystem. We may eventually want to push
this handing out to the lower filesystem code but for now I don't
see any need for it.
Also, note that with this implementation the new cmd values are only
available via fcntl64() on 32-bit arches. There's little need to
add support for legacy apps on a new interface like this.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: I35691bdfed9cadcbbcb6ff6804d9eea1db661ddc
Once we introduce file private locks, we'll need to know what cmd value
was used, as that affects the ownership and whether a conflict would
arise.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: Iaeb8233ae25bde5ef0049118ff94e4a9e0f02214
FL_FILE_PVT locks are no longer tied to a particular pid, and are
instead inheritable by child processes. Report a l_pid of '-1' for
these sorts of locks since the pid is somewhat meaningless for them.
This precedent comes from FreeBSD. There, POSIX and flock() locks can
conflict with one another. If fcntl(F_GETLK, ...) returns a lock set
with flock() then the l_pid member cannot be a process ID because the
lock is not held by a process as such.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: I7d702fcaaaf8592356926d51b60e53ee217ca747
In a later patch, we'll be adding a new type of lock that's owned by
the struct file instead of the files_struct. Those sorts of locks
will be flagged with a new FL_FILE_PVT flag.
Report these types of locks as "FLPVT" in /proc/locks to distinguish
them from "classic" POSIX locks.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: Id0b6d9c7a947b512e5683ad3b6188d73582c2de9
This function currently removes leases in addition to flock locks and in
a later patch we'll have it deal with file-private locks too. Rename it
to locks_remove_file to indicate that it removes locks that are
associated with a particular struct file, and not just flock locks.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Change-Id: I1289cfbc02eb778532e984a29adffb02a9370cc1
We have many places where we want to check if a socket is
not a timewait or request socket. Use a helper to avoid
hard coding this.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[backported from net-next 1d0ab253872cdd3d8e7913f59c266c7fd01771d0]
[lorenzo@google.com: removed TCPF_NEW_SYN_RECV, and added a comment to add it back.]
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Bug: 24163529
Change-Id: Ibf09017e1ab00af5e6925273117c335d7f515d73
When I cooked commit c3658e8d0f ("tcp: fix possible NULL dereference in
tcp_vX_send_reset()") I missed other spots we could deref a NULL
skb_dst(skb)
Again, if a socket is provided, we do not need skb_dst() to get a
pointer to network namespace : sock_net(sk) is good enough.
[Backport of net-next 0f85feae6b710ced3abad5b2b47d31dfcb956b62]
Bug: 16355602
Change-Id: Ibe1def7979625ee7902bff2f33ec8945b9945948
Reported-by: Dann Frazier <dann.frazier@canonical.com>
Bisected-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: ca777eff51 ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <davem@davemloft.net>
The mtu should be a __be32, not the mark.
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ie321dcc3652921f8f28491d39c8262268aeb22bc
One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts
to handle the error pass the 32-bit info cookie in network byte order
whereas ipv4 passes it around in host byte order.
Like the ipv4 side, we have two helper functions. One for when we
have a socket context and one for when we do not.
ip6ip6 tunnels are not handled here, because they handle PMTU events
by essentially relaying another ICMP packet-too-big message back to
the original sender.
This patch allows us to get rid of rt6_do_pmtu_disc(). It handles all
kinds of situations that simply cannot happen when we do the PMTU
update directly using a fully resolved route.
In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very
likely be removed or changed into a BUG_ON() check. We should never
have a prefixed ipv6 route when we get there.
Another piece of strange history here is that TCP and DCCP, unlike in
ipv4, never invoke the update_pmtu() method from their ICMP error
handlers. This is incredibly astonishing since this is the context
where we have the most accurate context in which to make a PMTU
update, namely we have a fully connected socket and associated cached
socket route.
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ibb5cae4316256108c5130459b07288c9fc7380c3
[ Upstream commit 001eabfd54c0cbf9d7d16264ddc8cc0bee67e3ed ]
This updates the bit sliced AES module to the latest version in the
upstream OpenSSL repository (e620e5ae37bc). This is needed to fix a
bug in the XTS decryption path, where data chunked in a certain way
could trigger the ciphertext stealing code, which is not supposed to
be active in the kernel build (The kernel implementation of XTS only
supports round multiples of the AES block size of 16 bytes, whereas
the conformant OpenSSL implementation of XTS supports inputs of
arbitrary size by applying ciphertext stealing). This is fixed in
the upstream version by adding the missing #ifndef XTS_CHAIN_TWEAK
around the offending instructions.
The upstream code also contains the change applied by Russell to
build the code unconditionally, i.e., even if __LINUX_ARM_ARCH__ < 7,
but implemented slightly differently.
Cc: stable@vger.kernel.org
Fixes: e4e7f10bfc ("ARM: add support for bit sliced AES using NEON instructions")
Reported-by: Adrian Kotelba <adrian.kotelba@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Change-Id: I94f417ae8b9830c76230cdb6a4870efa216715cd
A shared anonymous mapping created without MAP_NORESERVE holds memory
reservation for whole range of shmem segment. Usually there is no way
to change its size, but /proc/<pid>/map_files/... (available if
CONFIG_CHECKPOINT_RESTORE=y) allows that.
This patch adjusts the memory reservation in shmem_setattr().
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ibaf48bed2a1eada5ffa96e550b7f1549d569e1b6
This patch fixes an "off-by-one" bug found in
581791f (FunctionFS: enable multiple functions).
During gfs_bind/gfs_unbind the functionfs_bind/functionfs_unbind should be
called for every functionfs instance. With the "i" pre-decremented they
were not called for the zeroth instance.
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: <stable@vger.kernel.org>
[ balbi@ti.com : added offending commit's subject ]
Signed-off-by: Felipe Balbi <balbi@ti.com>
Change-Id: Idf19c2d3842546fb0fb47f77f59a248b1caa3fcb
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.
A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.
Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.
Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives. Allowing simple, safe,
well understood work-arounds to known problematic software.
This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work. While writing this patch I saw a handful of such
cases. The most significant being autofs that lives in the module
autofs4.
This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.
After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module. The common pattern in the kernel is to call request_module()
without regards to the users permissions. In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted. In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.
Change-Id: I623b13dbdb44bb9ba7481f29575e1ca4ad8102f4
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
This is needed for MTP to know if writes are aligned to packet size.
Change-Id: If504511e649d46eb8d52f1fafeda071dddeec263
Signed-off-by: Jerry Zhang <zhangjerry@google.com>
When writing the descriptors to the ep0 file of functionfs, the HID descriptors where not recognized which caused the initialization from user space to fail.
Signed-off-by: Koen Beel <koen.beel@barco.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Change-Id: Id0f930e12a84315995a3ea4d08757ba1f3b567be
There's a bunch of failure exits in ffs_fs_mount() with
seriously broken recovery logics. Most of that appears to stem
from misunderstanding of the ->kill_sb() semantics; unlike
->put_super() it is called for *all* superblocks of given type,
no matter how (in)complete the setup had been. ->put_super()
is called only if ->s_root is not NULL; any failure prior to
setting ->s_root will have the call of ->put_super() skipped.
->kill_sb(), OTOH, awaits every superblock that has come from
sget().
Current behaviour of ffs_fs_mount():
We have struct ffs_sb_fill_data data on stack there. We do
ffs_dev = functionfs_acquire_dev_callback(dev_name);
and store that in data.private_data. Then we call mount_nodev(),
passing it ffs_sb_fill() as a callback. That will either fail
outright, or manage to call ffs_sb_fill(). There we allocate an
instance of struct ffs_data, slap the value of ffs_dev (picked
from data.private_data) into ffs->private_data and overwrite
data.private_data by storing ffs into an overlapping member
(data.ffs_data). Then we store ffs into sb->s_fs_info and attempt
to set the rest of the things up (root inode, root dentry, then
create /ep0 there). Any of those might fail. Should that
happen, we get ffs_fs_kill_sb() called before mount_nodev()
returns. If mount_nodev() fails for any reason whatsoever,
we proceed to
functionfs_release_dev_callback(data.ffs_data);
That's broken in a lot of ways. Suppose the thing has failed in
allocation of e.g. root inode or dentry. We have
functionfs_release_dev_callback(ffs);
ffs_data_put(ffs);
done by ffs_fs_kill_sb() (ffs accessed via sb->s_fs_info), followed by
functionfs_release_dev_callback(ffs);
from ffs_fs_mount() (via data.ffs_data). Note that the second
functionfs_release_dev_callback() has every chance to be done to freed memory.
Suppose we fail *before* root inode allocation. What happens then?
ffs_fs_kill_sb() doesn't do anything to ffs (it's either not called at all,
or it doesn't have a pointer to ffs stored in sb->s_fs_info). And
functionfs_release_dev_callback(data.ffs_data);
is called by ffs_fs_mount(), but here we are in nasal daemon country - we
are reading from a member of union we'd never stored into. In practice,
we'll get what we used to store into the overlapping field, i.e. ffs_dev.
And then we get screwed, since we treat it (struct gfs_ffs_obj * in
disguise, returned by functionfs_acquire_dev_callback()) as struct
ffs_data *, pick what would've been ffs_data ->private_data from it
(*well* past the actual end of the struct gfs_ffs_obj - struct ffs_data
is much bigger) and poke in whatever it points to.
FWIW, there's a minor leak on top of all that in case if ffs_sb_fill()
fails on kstrdup() - ffs is obviously forgotten.
The thing is, there is no point in playing all those games with union.
Just allocate and initialize ffs_data *before* calling mount_nodev() and
pass a pointer to it via data.ffs_data. And once it's stored in
sb->s_fs_info, clear data.ffs_data, so that ffs_fs_mount() knows that
it doesn't need to kill the sucker manually - from that point on
we'll have it done by ->kill_sb().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: stable <stable@vger.kernel.org> # 3.3+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ic3886c79018e4f06cf84d27c98ce5f80d7d9bbe9
Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note staging drivers
are also not ABI and generally can be removed at anytime.
One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active. This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer.
See CursorWindow documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow
This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active.
Verified with test program at: https://lore.kernel.org/patchwork/patch/1008117/
Backport link: https://lore.kernel.org/patchwork/patch/1014892/
Bug: 113362644
Change-Id: If7424db3b64372932d455f0219cd9df613fec1d4
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Running my likely/unlikely profiler, I discovered that the test in
shmem_write_begin() that tests for info->seals as unlikely, is always
incorrect. This is because shmem_get_inode() sets info->seals to have
F_SEAL_SEAL set by default, and it is unlikely to be cleared when
shmem_write_begin() is called. Thus, the if statement is very likely.
But as the if statement block only cares about F_SEAL_WRITE and
F_SEAL_GROW, change the test to only test those two bits.
Link: http://lkml.kernel.org/r/20170203105656.7aec6237@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I83b8fc6ebae581486df16842713ba83a37e3b858
The new header file memfd.h from commit 9183df25fe ("shm: add
memfd_create() syscall") should be exported.
Change-Id: I07ea7ae25765ece11180c0dcaed749918bf03a11
Signed-off-by: David Drysdale <drysdale@google.com>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we set SEAL_WRITE on a file, we must make sure there cannot be any
ongoing write-operations on the file. For write() calls, we simply lock
the inode mutex, for mmap() we simply verify there're no writable
mappings. However, there might be pages pinned by AIO, Direct-IO and
similar operations via GUP. We must make sure those do not write to the
memfd file after we set SEAL_WRITE.
As there is no way to notify GUP users to drop pages or to wait for them
to be done, we implement the wait ourself: When setting SEAL_WRITE, we
check all pages for their ref-count. If it's bigger than 1, we know
there's some user of the page. We then mark the page and wait for up to
150ms for those ref-counts to be dropped. If the ref-counts are not
dropped in time, we refuse the seal operation.
Change-Id: I964c29a82f2a9a1077647b29b2073c9ef4e0a05a
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If two processes share a common memory region, they usually want some
guarantees to allow safe access. This often includes:
- one side cannot overwrite data while the other reads it
- one side cannot shrink the buffer while the other accesses it
- one side cannot grow the buffer beyond previously set boundaries
If there is a trust-relationship between both parties, there is no need
for policy enforcement. However, if there's no trust relationship (eg.,
for general-purpose IPC) sharing memory-regions is highly fragile and
often not possible without local copies. Look at the following two
use-cases:
1) A graphics client wants to share its rendering-buffer with a
graphics-server. The memory-region is allocated by the client for
read/write access and a second FD is passed to the server. While
scanning out from the memory region, the server has no guarantee that
the client doesn't shrink the buffer at any time, requiring rather
cumbersome SIGBUS handling.
2) A process wants to perform an RPC on another process. To avoid huge
bandwidth consumption, zero-copy is preferred. After a message is
assembled in-memory and a FD is passed to the remote side, both sides
want to be sure that neither modifies this shared copy, anymore. The
source may have put sensible data into the message without a separate
copy and the target may want to parse the message inline, to avoid a
local copy.
While SIGBUS handling, POSIX mandatory locking and MAP_DENYWRITE provide
ways to achieve most of this, the first one is unproportionally ugly to
use in libraries and the latter two are broken/racy or even disabled due
to denial of service attacks.
This patch introduces the concept of SEALING. If you seal a file, a
specific set of operations is blocked on that file forever. Unlike locks,
seals can only be set, never removed. Hence, once you verified a specific
set of seals is set, you're guaranteed that no-one can perform the blocked
operations on this file, anymore.
An initial set of SEALS is introduced by this patch:
- SHRINK: If SEAL_SHRINK is set, the file in question cannot be reduced
in size. This affects ftruncate() and open(O_TRUNC).
- GROW: If SEAL_GROW is set, the file in question cannot be increased
in size. This affects ftruncate(), fallocate() and write().
- WRITE: If SEAL_WRITE is set, no write operations (besides resizing)
are possible. This affects fallocate(PUNCH_HOLE), mmap() and
write().
- SEAL: If SEAL_SEAL is set, no further seals can be added to a file.
This basically prevents the F_ADD_SEAL operation on a file and
can be set to prevent others from adding further seals that you
don't want.
The described use-cases can easily use these seals to provide safe use
without any trust-relationship:
1) The graphics server can verify that a passed file-descriptor has
SEAL_SHRINK set. This allows safe scanout, while the client is
allowed to increase buffer size for window-resizing on-the-fly.
Concurrent writes are explicitly allowed.
2) For general-purpose IPC, both processes can verify that SEAL_SHRINK,
SEAL_GROW and SEAL_WRITE are set. This guarantees that neither
process can modify the data while the other side parses it.
Furthermore, it guarantees that even with writable FDs passed to the
peer, it cannot increase the size to hit memory-limits of the source
process (in case the file-storage is accounted to the source).
The new API is an extension to fcntl(), adding two new commands:
F_GET_SEALS: Return a bitset describing the seals on the file. This
can be called on any FD if the underlying file supports
sealing.
F_ADD_SEALS: Change the seals of a given file. This requires WRITE
access to the file and F_SEAL_SEAL may not already be set.
Furthermore, the underlying file must support sealing and
there may not be any existing shared mapping of that file.
Otherwise, EBADF/EPERM is returned.
The given seals are _added_ to the existing set of seals
on the file. You cannot remove seals again.
The fcntl() handler is currently specific to shmem and disabled on all
files. A file needs to explicitly support sealing for this interface to
work. A separate syscall is added in a follow-up, which creates files that
support sealing. There is no intention to support this on other
file-systems. Semantics are unclear for non-volatile files and we lack any
use-case right now. Therefore, the implementation is specific to shmem.
Change-Id: I2d6247d3287c61dbe6bafabf56554e80b414f938
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch (of 6):
The i_mmap_writable field counts existing writable mappings of an
address_space. To allow drivers to prevent new writable mappings, make
this counter signed and prevent new writable mappings if it is negative.
This is modelled after i_writecount and DENYWRITE.
This will be required by the shmem-sealing infrastructure to prevent any
new writable mappings after the WRITE seal has been set. In case there
exists a writable mapping, this operation will fail with EBUSY.
Note that we rely on the fact that iff you already own a writable mapping,
you can increase the counter without using the helpers. This is the same
that we do for i_writecount.
Change-Id: Id16c5b650e451956a4f6df004483cb63197c613c
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
correct_wcount and inode in mmap_region() just complicate the code. This
boolean was needed previously, when deny_write_access() was called before
vma_merge(), now we can simply check VM_DENYWRITE and do
allow_write_access() if it is set.
allow_write_access() checks file != NULL, so this is safe even if it was
possible to use VM_DENYWRITE && !file. Just we need to ensure we use the
same file which was deny_write_access()'ed, so the patch also moves "file
= vma->vm_file" down after allow_write_access().
Change-Id: I80d3674ce40fb97a80b128ea63edc86ca1770d20
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Colin Cross <ccross@android.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>