android_kernel_samsung_msm8226/fs
David Herrmann 74c1387979 shm: add sealing API
If two processes share a common memory region, they usually want some
guarantees to allow safe access. This often includes:
  - one side cannot overwrite data while the other reads it
  - one side cannot shrink the buffer while the other accesses it
  - one side cannot grow the buffer beyond previously set boundaries

If there is a trust relationship between both parties, there is no need
for policy enforcement.  However, if there is no trust relationship (e.g.,
for general-purpose IPC), sharing memory regions is highly fragile and
often not possible without local copies.  Consider the following two
use-cases:

  1) A graphics client wants to share its rendering-buffer with a
     graphics-server. The memory-region is allocated by the client for
     read/write access and a second FD is passed to the server. While
     scanning out from the memory region, the server has no guarantee that
     the client doesn't shrink the buffer at any time, requiring rather
     cumbersome SIGBUS handling.
  2) A process wants to perform an RPC on another process. To avoid huge
     bandwidth consumption, zero-copy is preferred. After a message is
     assembled in memory and an FD is passed to the remote side, both sides
     want to be sure that neither modifies this shared copy any longer. The
     source may have put sensitive data into the message without a separate
     copy and the target may want to parse the message inline, to avoid a
     local copy.

While SIGBUS handling, POSIX mandatory locking and MAP_DENYWRITE provide
ways to achieve most of this, the first one is disproportionately ugly to
use in libraries and the latter two are broken/racy or even disabled due
to denial-of-service attacks.

This patch introduces the concept of SEALING.  If you seal a file, a
specific set of operations is blocked on that file forever.  Unlike locks,
seals can only be set, never removed.  Hence, once you have verified that a
specific set of seals is set, you are guaranteed that no one can perform
the blocked operations on this file any longer.

An initial set of SEALS is introduced by this patch:
  - SHRINK: If SEAL_SHRINK is set, the file in question cannot be reduced
            in size. This affects ftruncate() and open(O_TRUNC).
  - GROW: If SEAL_GROW is set, the file in question cannot be increased
          in size. This affects ftruncate(), fallocate() and write().
  - WRITE: If SEAL_WRITE is set, no write operations (besides resizing)
           are possible. This affects fallocate(PUNCH_HOLE), mmap() and
           write().
  - SEAL: If SEAL_SEAL is set, no further seals can be added to a file.
          This basically prevents the F_ADD_SEALS operation on a file and
          can be set to prevent others from adding further seals that you
          don't want.
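
The effect of the SHRINK, GROW and WRITE seals can be sketched from
userspace roughly as follows. This is only an illustration under stated
assumptions: the sealable file is obtained through memfd_create() with
MFD_ALLOW_SEALING (the name upstream gave to the follow-up syscall), the
fallback constants are copied from the upstream UAPI headers for older libc
headers, and make_sealable() and demo_seal_effects() are hypothetical
helper names that are not part of this patch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback values from the upstream UAPI headers, for older libc headers. */
#ifndef F_ADD_SEALS
#define F_ADD_SEALS   1033
#define F_GET_SEALS   1034
#define F_SEAL_SEAL   0x0001
#define F_SEAL_SHRINK 0x0002
#define F_SEAL_GROW   0x0004
#define F_SEAL_WRITE  0x0008
#endif
#ifndef MFD_ALLOW_SEALING
#define MFD_ALLOW_SEALING 0x0002U
#endif

/* Hypothetical helper: create a sealable shmem file of `size` bytes. */
static int make_sealable(const char *name, off_t size)
{
    int fd = (int)syscall(SYS_memfd_create, name, MFD_ALLOW_SEALING);

    if (fd >= 0 && ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 0 if each seal blocks the operations described above. */
static int demo_seal_effects(void)
{
    int rc = -1;
    int fd = make_sealable("seal-demo", 4096);

    if (fd < 0)
        return -1;

    /* SHRINK: shrinking fails with EPERM, growing still works. */
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
        goto out;
    if (ftruncate(fd, 1024) == 0 || errno != EPERM)
        goto out;
    if (ftruncate(fd, 8192) != 0)
        goto out;

    /* GROW: the size is now fixed, but in-bounds writes still work. */
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_GROW) < 0)
        goto out;
    if (ftruncate(fd, 16384) == 0 || errno != EPERM)
        goto out;
    if (pwrite(fd, "x", 1, 0) != 1)
        goto out;

    /* WRITE: the content is now immutable as well. */
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) < 0)
        goto out;
    if (pwrite(fd, "y", 1, 0) >= 0 || errno != EPERM)
        goto out;

    rc = 0;
out:
    close(fd);
    return rc;
}
```

Note that the seals are added one at a time here only to show each effect
in isolation; a real producer would typically add all the seals it needs in
a single F_ADD_SEALS call.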

The described use-cases can easily use these seals to provide safe use
without any trust-relationship:

  1) The graphics server can verify that a passed file-descriptor has
     SEAL_SHRINK set. This allows safe scanout, while the client is
     allowed to increase buffer size for window-resizing on-the-fly.
     Concurrent writes are explicitly allowed.
  2) For general-purpose IPC, both processes can verify that SEAL_SHRINK,
     SEAL_GROW and SEAL_WRITE are set. This guarantees that neither
     process can modify the data while the other side parses it.
     Furthermore, it guarantees that even with writable FDs passed to the
     peer, it cannot increase the size to hit memory-limits of the source
     process (in case the file-storage is accounted to the source).
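
The receiving side of both use-cases boils down to one check: read the seal
set and refuse the FD unless every required seal is present. A minimal
sketch, assuming the file comes from memfd_create() with MFD_ALLOW_SEALING
(the upstream name of the follow-up syscall); has_seals() and
demo_receiver_checks() are hypothetical names used for illustration only.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback values from the upstream UAPI headers, for older libc headers. */
#ifndef F_ADD_SEALS
#define F_ADD_SEALS   1033
#define F_GET_SEALS   1034
#define F_SEAL_SEAL   0x0001
#define F_SEAL_SHRINK 0x0002
#define F_SEAL_GROW   0x0004
#define F_SEAL_WRITE  0x0008
#endif
#ifndef MFD_ALLOW_SEALING
#define MFD_ALLOW_SEALING 0x0002U
#endif

/* Return 1 if every seal in `required` is set on fd, otherwise 0. */
static int has_seals(int fd, int required)
{
    int seals = fcntl(fd, F_GET_SEALS);

    if (seals < 0)
        return 0; /* invalid FD or no sealing support: do not trust it */
    return (seals & required) == required;
}

/* Walk through both use-cases; returns 0 if the checks behave as expected. */
static int demo_receiver_checks(void)
{
    const int ipc = F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE;
    int rc = -1;
    int fd = (int)syscall(SYS_memfd_create, "msg", MFD_ALLOW_SEALING);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, 4096) < 0)
        goto out;

    /* An unsealed buffer is rejected by both kinds of receiver. */
    if (has_seals(fd, F_SEAL_SHRINK) || has_seals(fd, ipc))
        goto out;

    /* Use-case 1: SHRINK is enough for the graphics server. */
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
        goto out;
    if (!has_seals(fd, F_SEAL_SHRINK) || has_seals(fd, ipc))
        goto out;

    /* Use-case 2: the IPC peer needs SHRINK, GROW and WRITE together. */
    if (fcntl(fd, F_ADD_SEALS, ipc) < 0)
        goto out;
    if (!has_seals(fd, ipc))
        goto out;

    rc = 0;
out:
    close(fd);
    return rc;
}
```

Treating an F_GET_SEALS failure the same as "seal missing" keeps the
receiver on the safe side when it is handed an FD that does not support
sealing at all.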

The new API is an extension to fcntl(), adding two new commands:
  F_GET_SEALS: Return a bitset describing the seals on the file. This
               can be called on any FD if the underlying file supports
               sealing.
  F_ADD_SEALS: Change the seals of a given file. This requires WRITE
               access to the file and F_SEAL_SEAL may not already be set.
               Furthermore, the underlying file must support sealing and
               there may not be any existing shared mapping of that file.
               Otherwise, EBADF/EPERM is returned.
               The given seals are _added_ to the existing set of seals
               on the file. You cannot remove seals again.
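
The two fcntl() commands might be exercised as in the sketch below. It
assumes a sealable file from memfd_create() with MFD_ALLOW_SEALING (the
upstream name of the follow-up syscall) and uses hypothetical helper names
(patch_seals(), demo_add_and_get()). Later kernels define additional seal
bits, so the sketch masks the result down to the four seals from this
patch before comparing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback values from the upstream UAPI headers, for older libc headers. */
#ifndef F_ADD_SEALS
#define F_ADD_SEALS   1033
#define F_GET_SEALS   1034
#define F_SEAL_SEAL   0x0001
#define F_SEAL_SHRINK 0x0002
#define F_SEAL_GROW   0x0004
#define F_SEAL_WRITE  0x0008
#endif
#ifndef MFD_ALLOW_SEALING
#define MFD_ALLOW_SEALING 0x0002U
#endif

/* The four seals introduced by this patch. */
#define PATCH_SEALS (F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE)

/* Current seal set restricted to PATCH_SEALS, or -1 on error. */
static int patch_seals(int fd)
{
    int seals = fcntl(fd, F_GET_SEALS);

    return seals < 0 ? -1 : (seals & PATCH_SEALS);
}

/* Returns 0 if seals accumulate and F_SEAL_SEAL freezes the set. */
static int demo_add_and_get(void)
{
    int rc = -1;
    int fd = (int)syscall(SYS_memfd_create, "api-demo", MFD_ALLOW_SEALING);

    if (fd < 0)
        return -1;

    /* A fresh sealable file carries none of the four seals. */
    if (patch_seals(fd) != 0)
        goto out;

    /* Seals are added to the existing set, never replaced. */
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
        goto out;
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_GROW) < 0)
        goto out;
    if (patch_seals(fd) != (F_SEAL_SHRINK | F_SEAL_GROW))
        goto out;

    /* After F_SEAL_SEAL, any further F_ADD_SEALS fails with EPERM. */
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_SEAL) < 0)
        goto out;
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) == 0 || errno != EPERM)
        goto out;
    if (patch_seals(fd) != (F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_SEAL))
        goto out;

    rc = 0;
out:
    close(fd);
    return rc;
}
```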

The fcntl() handler is currently specific to shmem and disabled on all
files. A file needs to explicitly support sealing for this interface to
work. A separate syscall is added in a follow-up, which creates files that
support sealing. There is no intention to support this on other
file-systems. Semantics are unclear for non-volatile files and we lack any
use-case right now. Therefore, the implementation is specific to shmem.
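
Because sealing is opt-in per file, userspace may want to probe for support
before relying on it. A minimal sketch, assuming that F_GET_SEALS fails on
files that do not support sealing (a pipe is used here) and that a sealable
file comes from memfd_create() with MFD_ALLOW_SEALING, the upstream name of
the follow-up syscall; supports_sealing() and demo_support_probe() are
hypothetical names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback values from the upstream UAPI headers, for older libc headers. */
#ifndef F_GET_SEALS
#define F_GET_SEALS 1034
#endif
#ifndef MFD_ALLOW_SEALING
#define MFD_ALLOW_SEALING 0x0002U
#endif

/* Return 1 if the seal set of fd can be queried, otherwise 0. */
static int supports_sealing(int fd)
{
    return fcntl(fd, F_GET_SEALS) >= 0;
}

/* Returns 0 if a pipe is reported as unsupported and a memfd as supported. */
static int demo_support_probe(void)
{
    int p[2];
    int mfd;
    int rc = -1;

    if (pipe(p) < 0)
        return -1;

    mfd = (int)syscall(SYS_memfd_create, "probe", MFD_ALLOW_SEALING);
    if (mfd < 0)
        goto out_pipe;

    /* Pipes are not shmem files, so the seal query is expected to fail. */
    if (!supports_sealing(p[0]) && supports_sealing(mfd))
        rc = 0;

    close(mfd);
out_pipe:
    close(p[0]);
    close(p[1]);
    return rc;
}
```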

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-06 08:48:36 +01:00
9p BACKPORT: posix_acl: Clear SGID bit when setting file permissions 2019-08-06 12:23:27 +02:00
adfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
affs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
afs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
autofs4 fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
befs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
bfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
btrfs BACKPORT: posix_acl: Clear SGID bit when setting file permissions 2019-08-06 12:23:27 +02:00
cachefiles Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
ceph fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
cifs Merge remote-tracking branch 'google-common/deprecated/android-3.4' into lineage-16.0 2019-08-06 11:41:21 +02:00
coda fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
configfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
cramfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
crypto fscrypt: remove broken support for detecting keyring key revocation 2019-08-06 12:26:38 +02:00
debugfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
devpts fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
dlm
ecryptfs ecryptfs: don't allow mmap when the lower fs doesn't support it 2019-08-06 11:48:23 +02:00
efs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
exfat fs: exfat: Allow disabling exfat 2020-01-06 08:40:23 +01:00
exofs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
exportfs Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
ext2 BACKPORT: posix_acl: Clear SGID bit when setting file permissions 2019-08-06 12:23:27 +02:00
ext3 BACKPORT: posix_acl: Clear SGID bit when setting file permissions 2019-08-06 12:23:27 +02:00
ext4 ext4: zero out the unused memory region in the extent tree block 2020-01-06 08:40:44 +01:00
f2fs cache the value of file_inode() in struct file 2020-01-06 08:48:36 +01:00
fat misc: fix some GCC warnings 2020-01-06 08:40:46 +01:00
freevxfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
fscache Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
fuse ANDROID: fuse: Add null terminator to path in canonical path to avoid issue 2020-01-06 08:40:25 +01:00
gfs2 BACKPORT: posix_acl: Clear SGID bit when setting file permissions 2019-08-06 12:23:27 +02:00
hfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
hfsplus fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
hostfs Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
hpfs Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
hppfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
hugetlbfs mm: larger stack guard gap, between vmas 2019-08-06 12:26:29 +02:00
isofs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
jbd Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
jbd2 jbd2: don't mark block as modified if the handle is out of credits 2020-01-06 08:40:36 +01:00
jffs2 BACKPORT: posix_acl: Clear SGID bit when setting file permissions 2019-08-06 12:23:27 +02:00
jfs BACKPORT: posix_acl: Clear SGID bit when setting file permissions 2019-08-06 12:23:27 +02:00
lockd Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
logfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
minix fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
ncpfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
nfs Merge remote-tracking branch 'google-common/deprecated/android-3.4' into lineage-16.0 2019-08-06 11:41:21 +02:00
nfs_common
nfsd fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
nilfs2 fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
nls
notify vfs: Add permission2 for filesystems with per mount permissions 2019-08-06 10:44:34 +02:00
ntfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
ocfs2 BACKPORT: posix_acl: Clear SGID bit when setting file permissions 2019-08-06 12:23:27 +02:00
omfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
openpromfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
proc proc: Export androidboot.mode=charger if needed 2020-01-06 08:40:48 +01:00
pstore Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
qnx4 fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
qnx6 fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
quota Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
ramfs Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
reiserfs BACKPORT: posix_acl: Clear SGID bit when setting file permissions 2019-08-06 12:23:27 +02:00
romfs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
scfs misc: Import SM-G900H kernel source code 2019-08-02 15:14:10 +02:00
sdcardfs fs: sdcardfs: Add missing option to show_options 2020-01-06 08:40:44 +01:00
sdfat cache the value of file_inode() in struct file 2020-01-06 08:48:36 +01:00
squashfs
sysfs Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
sysv fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
ubifs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
udf Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
ufs fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
xfs BACKPORT: posix_acl: Clear SGID bit when setting file permissions 2019-08-06 12:23:27 +02:00
yaffs2
aio.c BACKPORT: aio: mark AIO pseudo-fs noexec 2019-08-06 11:48:19 +02:00
anon_inodes.c
attr.c vfs: Add setattr2 for filesystems with per mount permissions 2019-08-06 10:44:35 +02:00
bad_inode.c
binfmt_aout.c
binfmt_elf.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
binfmt_elf_fdpic.c
binfmt_em86.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
binfmt_flat.c
binfmt_misc.c fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
binfmt_script.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
binfmt_som.c
bio-integrity.c
bio.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
block_dev.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
buffer.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
char_dev.c
compat.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
compat_binfmt_elf.c
compat_ioctl.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
dcache.c constify d_lookup() arguments 2019-08-06 10:44:47 +02:00
dcookies.c
direct-io.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
drop_caches.c misc: Import SM-G900H kernel source code 2019-08-02 15:14:10 +02:00
eventfd.c
eventpoll.c PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND 2019-08-08 15:09:54 +02:00
exec.c exec: Limit arg stack to at most 75% of _STK_LIM 2020-01-06 08:40:36 +01:00
fcntl.c shm: add sealing API 2020-01-06 08:48:36 +01:00
fhandle.c vfs: read file_handle only once in handle_to_path 2015-07-21 01:37:10 -07:00
fifo.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
file.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
file_table.c cache the value of file_inode() in struct file 2020-01-06 08:48:36 +01:00
filesystems.c fs: Limit sys_mount to only request filesystem modules. 2019-08-06 10:44:59 +02:00
fs-writeback.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
fs_struct.c sdcardfs: override umask on mkdir and create 2019-08-06 10:44:26 +02:00
generic_acl.c BACKPORT: posix_acl: Clear SGID bit when setting file permissions 2019-08-06 12:23:27 +02:00
inode.c Fix up non-directory creation in SGID directories 2020-01-06 08:40:41 +01:00
internal.h vfs: Allow filesystems to access their private mount data 2019-08-06 10:44:34 +02:00
ioctl.c
ioprio.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
Kconfig fs: Add sdfat 2020-01-06 08:40:22 +01:00
Kconfig.binfmt
libfs.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
locks.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
Makefile fs: Add sdfat 2020-01-06 08:40:22 +01:00
mbcache.c
mount.h Merge remote-tracking branch 'google-common/deprecated/android-3.4' into lineage-16.0 2019-08-06 11:41:21 +02:00
mpage.c
namei.c ANDROID: vfs: Missed updating truncate to truncate2 2019-08-06 10:44:40 +02:00
namespace.c ANDROID: mnt: Fix freeing of mount data 2019-08-08 12:19:53 +02:00
no-block.c
open.c cache the value of file_inode() in struct file 2020-01-06 08:48:36 +01:00
pipe.c Revert "pipe: iovec: Fix memory corruption when retrying atomic copy as non-atomic" 2020-01-06 08:40:48 +01:00
pnode.c ANDROID: mnt: Fix next_descendent 2019-08-06 10:45:24 +02:00
pnode.h Merge remote-tracking branch 'google-common/deprecated/android-3.4' into lineage-16.0 2019-08-06 11:41:21 +02:00
posix_acl.c BACKPORT: posix_acl: Clear SGID bit when setting file permissions 2019-08-06 12:23:27 +02:00
proc_namespace.c vfs: Allow filesystems to access their private mount data 2019-08-06 10:44:34 +02:00
read_write.c
read_write.h
readdir.c kernel: Only expose su when daemon is running 2019-08-05 09:12:33 +02:00
select.c Merge remote-tracking branch 'google-common/deprecated/android-3.4' into lineage-16.0 2019-08-06 11:41:21 +02:00
seq_file.c misc: Import SM-G900H kernel source code 2019-08-02 15:14:10 +02:00
signalfd.c
splice.c splice: introduce FMODE_SPLICE_READ and FMODE_SPLICE_WRITE 2019-08-06 12:24:24 +02:00
stack.c
stat.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
statfs.c Merge tag 'v3.4.113' into lineage-16.0 2019-08-05 14:20:47 +02:00
super.c vfs: Allow filesystems to access their private mount data 2019-08-06 10:44:34 +02:00
sync.c
timerfd.c timerfd: add alarm timers 2019-08-06 12:31:34 +02:00
utimes.c vfs: Add setattr2 for filesystems with per mount permissions 2019-08-06 10:44:35 +02:00
xattr.c ANDROID: xattr: Pass EOPNOTSUPP to permission2 2020-01-06 08:40:22 +01:00
xattr_acl.c