android_kernel_google_msm/fs
David Herrmann 68d3e84f05 shm: add sealing API
If two processes share a common memory region, they usually want some
guarantees to allow safe access. This often includes:
  - one side cannot overwrite data while the other reads it
  - one side cannot shrink the buffer while the other accesses it
  - one side cannot grow the buffer beyond previously set boundaries

If there is a trust-relationship between both parties, there is no need
for policy enforcement.  However, if there's no trust relationship (e.g.,
for general-purpose IPC), sharing memory regions is highly fragile and
often not possible without local copies.  Look at the following two
use-cases:

  1) A graphics client wants to share its rendering-buffer with a
     graphics-server. The memory-region is allocated by the client for
     read/write access and a second FD is passed to the server. While
     scanning out from the memory region, the server has no guarantee that
     the client doesn't shrink the buffer at any time, requiring rather
     cumbersome SIGBUS handling.
  2) A process wants to perform an RPC on another process. To avoid huge
     bandwidth consumption, zero-copy is preferred. After a message is
     assembled in-memory and a FD is passed to the remote side, both sides
     want to be sure that neither modifies this shared copy anymore. The
     source may have put sensitive data into the message without a separate
     copy and the target may want to parse the message inline, to avoid a
     local copy.

While SIGBUS handling, POSIX mandatory locking and MAP_DENYWRITE provide
ways to achieve most of this, the first one is disproportionately ugly to
use in libraries and the latter two are broken/racy or even disabled due
to denial of service attacks.

This patch introduces the concept of SEALING.  If you seal a file, a
specific set of operations is blocked on that file forever.  Unlike locks,
seals can only be set, never removed.  Hence, once you have verified that a
specific set of seals is set, you're guaranteed that no one can perform the
blocked operations on this file anymore.

An initial set of SEALS is introduced by this patch:
  - SHRINK: If SEAL_SHRINK is set, the file in question cannot be reduced
            in size. This affects ftruncate() and open(O_TRUNC).
  - GROW: If SEAL_GROW is set, the file in question cannot be increased
          in size. This affects ftruncate(), fallocate() and write().
  - WRITE: If SEAL_WRITE is set, no write operations (besides resizing)
           are possible. This affects fallocate(PUNCH_HOLE), mmap() and
           write().
  - SEAL: If SEAL_SEAL is set, no further seals can be added to a file.
          This basically prevents the F_ADD_SEALS operation on a file and
          can be set to prevent others from adding further seals that you
          don't want.
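
As a rough user-space illustration of the SHRINK seal, the sketch below creates a sealable file, seals it against shrinking and checks that ftruncate() to a smaller size is refused while growing still works. It assumes the file comes from memfd_create() with MFD_ALLOW_SEALING (the follow-up syscall as it later landed upstream), that the C library exposes the seals as F_SEAL_* constants (glibc 2.27 or newer), and that a refused shrink reports EPERM. The helper names are hypothetical and not part of this patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a sealable shmem file of `size` bytes and
 * forbid shrinking it. Returns the fd, or -1 on error. */
int make_unshrinkable_buffer(size_t size)
{
    assert(size > 0);
    /* Assumption: memfd_create() is available to create a sealable file. */
    int fd = memfd_create("demo-buffer", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) < 0 ||
        fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if shrinking is refused with EPERM while growing still
 * succeeds, 0 otherwise. */
int shrink_is_blocked(int fd, size_t size)
{
    if (ftruncate(fd, (off_t)(size / 2)) == 0)
        return 0;                 /* shrink unexpectedly succeeded */
    if (errno != EPERM)
        return 0;                 /* failed for an unrelated reason */
    return ftruncate(fd, (off_t)(size * 2)) == 0;
}
```

Only SHRINK is set here, so the buffer can still be enlarged and written to, which matches the graphics use-case.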

The described use-cases can easily use these seals to provide safe use
without any trust-relationship:

  1) The graphics server can verify that a passed file-descriptor has
     SEAL_SHRINK set. This allows safe scanout, while the client is
     allowed to increase buffer size for window-resizing on-the-fly.
     Concurrent writes are explicitly allowed.
  2) For general-purpose IPC, both processes can verify that SEAL_SHRINK,
     SEAL_GROW and SEAL_WRITE are set. This guarantees that neither
     process can modify the data while the other side parses it.
     Furthermore, it guarantees that even with writable FDs passed to the
     peer, it cannot increase the size to hit memory-limits of the source
     process (in case the file-storage is accounted to the source).
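
The receiver-side check in both use-cases could look roughly like the following sketch: query the seals and refuse the FD unless every required seal is present. It assumes the F_SEAL_* names exposed to user space by glibc and uses memfd_create() only to have a sealable file to demonstrate with. Both function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if every seal in `required` is set on fd, 0 otherwise
 * (including when the file does not support sealing at all). */
int has_seals(int fd, unsigned int required)
{
    int seals = fcntl(fd, F_GET_SEALS);
    if (seals < 0)
        return 0;
    return ((unsigned int)seals & required) == required;
}

/* Hypothetical sender-side helper: copy a message into a sealable file
 * and make it immutable before passing the fd to a peer. */
int make_sealed_message(const void *data, size_t len)
{
    assert(data != NULL && len > 0);
    int fd = memfd_create("demo-message", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len ||
        fcntl(fd, F_ADD_SEALS,
              F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A peer would call has_seals(fd, F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) before parsing the message in place, and fall back to a local copy if the check fails.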

The new API is an extension to fcntl(), adding two new commands:
  F_GET_SEALS: Return a bitset describing the seals on the file. This
               can be called on any FD if the underlying file supports
               sealing.
  F_ADD_SEALS: Change the seals of a given file. This requires WRITE
               access to the file and F_SEAL_SEAL may not already be set.
               Furthermore, the underlying file must support sealing and
               there may not be any existing shared mapping of that file.
               Otherwise, EBADF/EPERM is returned.
               The given seals are _added_ to the existing set of seals
               on the file. You cannot remove seals again.
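
To make the additive semantics concrete, here is a small sketch that adds seals in two steps, then sets the SEAL seal and tries to add one more. It assumes a sealable FD such as one returned by memfd_create() with MFD_ALLOW_SEALING, and it assumes the rejected call reports EPERM. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Adds F_SEAL_GROW, then F_SEAL_SEAL, then attempts to add F_SEAL_WRITE.
 * Returns the errno of that last attempt, 0 if it unexpectedly
 * succeeded, or -1 if one of the first two calls failed. */
int errno_after_seal_seal(int fd)
{
    assert(fd >= 0);
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_GROW) < 0)
        return -1;
    /* Seals accumulate: GROW stays set when SEAL is added. */
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_SEAL) < 0)
        return -1;
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) == 0)
        return 0;
    return errno;
}
```

After the call, F_GET_SEALS should report both GROW and SEAL but not WRITE, since the last request was rejected as a whole.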

The fcntl() handler is currently specific to shmem and disabled on all
files. A file needs to explicitly support sealing for this interface to
work. A separate syscall is added in a follow-up, which creates files that
support sealing. There is no intention to support this on other
file-systems. Semantics are unclear for non-volatile files and we lack any
use-case right now. Therefore, the implementation is specific to shmem.

Change-Id: I2d6247d3287c61dbe6bafabf56554e80b414f938
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-07 22:15:12 +03:00
..
9p fs: make posix_acl_create more useful 2020-12-07 21:02:49 +03:00
adfs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
affs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
afs don't pass nameidata to ->create() 2018-12-07 22:28:00 +04:00
autofs4 stop passing nameidata to ->lookup() 2018-12-07 22:26:28 +04:00
befs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
bfs don't pass nameidata to ->create() 2018-12-07 22:28:00 +04:00
btrfs fs: make posix_acl_create more useful 2020-12-07 21:02:49 +03:00
cachefiles don't pass nameidata * to vfs_create() 2018-12-07 22:28:48 +04:00
ceph mm: kill vma flag VM_CAN_NONLINEAR 2020-11-29 16:11:40 +03:00
cifs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
coda fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
configfs stop passing nameidata to ->lookup() 2018-12-07 22:26:28 +04:00
cramfs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
crypto ext4/fscrypto: avoid RCU lookup in d_revalidate 2016-10-29 23:12:37 +08:00
debugfs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
devpts fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
dlm
ecryptfs don't pass nameidata * to vfs_create() 2018-12-07 22:28:48 +04:00
efs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
exofs don't pass nameidata to ->create() 2018-12-07 22:28:00 +04:00
exportfs move d_rcu from overlapping d_child to overlapping d_alias 2017-09-22 19:11:55 +03:00
ext2 fs: make posix_acl_create more useful 2020-12-07 21:02:49 +03:00
ext3 fs: make posix_acl_create more useful 2020-12-07 21:02:49 +03:00
ext4 fs: make posix_acl_create more useful 2020-12-07 21:02:49 +03:00
f2fs fs: make posix_acl_create more useful 2020-12-07 21:02:49 +03:00
fat fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
freevxfs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
fscache lib/radix-tree.c: make radix_tree_node_alloc() work correctly within interrupt 2020-12-07 21:02:05 +03:00
fuse fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
gfs2 fs: make posix_acl_create more useful 2020-12-07 21:02:49 +03:00
hfs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
hfsplus fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
hostfs don't pass nameidata to ->create() 2018-12-07 22:28:00 +04:00
hpfs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
hppfs stop passing nameidata to ->lookup() 2018-12-07 22:26:28 +04:00
hugetlbfs don't pass nameidata to ->create() 2018-12-07 22:28:00 +04:00
isofs stop passing nameidata to ->lookup() 2018-12-07 22:26:28 +04:00
jbd
jbd2 jbd2: Fix unreclaimed pages after truncate in data=journal mode 2016-10-26 23:15:34 +08:00
jffs2 fs: make posix_acl_create more useful 2020-12-07 21:02:49 +03:00
jfs fs: make posix_acl_create more useful 2020-12-07 21:02:49 +03:00
lockd lockd: Try to reconnect if statd has moved 2015-02-02 17:04:42 +08:00
logfs don't pass nameidata to ->create() 2018-12-07 22:28:00 +04:00
minix fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
ncpfs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
nfs fs: make posix_acl_create more useful 2020-12-07 21:02:49 +03:00
nfs_common
nfsd userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr 2020-12-07 21:02:21 +03:00
nilfs2 fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
nls
notify fanotify: check file flags passed in fanotify_init 2018-12-07 22:28:48 +04:00
ntfs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
ocfs2 fs: make posix_acl_create more useful 2020-12-07 21:02:49 +03:00
omfs don't pass nameidata to ->create() 2018-12-07 22:28:00 +04:00
openpromfs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
proc fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
pstore fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
qnx4 fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
qnx6 fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
quota vfs: define struct filename and have getname() return it 2018-12-07 22:28:48 +04:00
ramfs don't pass nameidata to ->create() 2018-12-07 22:28:00 +04:00
reiserfs fs: make posix_acl_create more useful 2020-12-07 21:02:49 +03:00
romfs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
sdcardfs mm: kill vma flag VM_CAN_NONLINEAR 2020-11-29 16:11:40 +03:00
squashfs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
sysfs stop passing nameidata to ->lookup() 2018-12-07 22:26:28 +04:00
sysv fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
ubifs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
udf fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
ufs fs: push sync_filesystem() down to the file system's remount_fs() 2020-11-29 16:11:45 +03:00
xfs fs: make posix_acl_create more useful 2020-12-07 21:02:49 +03:00
yaffs2 fs: yaffs2: Add null pointer check before dereferencing inode 2013-02-27 18:19:17 -08:00
aio.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
anon_inodes.c
attr.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
bad_inode.c mm/fs: remove truncate_range 2020-12-07 20:57:30 +03:00
binfmt_aout.c
binfmt_elf.c binfmt_elf: Don't clobber passed executable's file header 2016-10-26 23:15:28 +08:00
binfmt_elf_fdpic.c
binfmt_em86.c exec: use -ELOOP for max recursion depth 2013-03-28 12:12:28 -07:00
binfmt_flat.c
binfmt_misc.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
binfmt_script.c exec: use -ELOOP for max recursion depth 2013-03-28 12:12:28 -07:00
binfmt_som.c
bio-integrity.c
bio.c SCSI: sg: Fix user memory corruption when SG_IO is interrupted by a signal 2013-09-07 21:58:16 -07:00
block_dev.c writeback: Fix periodic writeback after fs mount 2013-07-28 16:26:08 -07:00
buffer.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
char_dev.c
compat.c vfs: define struct filename and have getname() return it 2018-12-07 22:28:48 +04:00
compat_binfmt_elf.c
compat_ioctl.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
dcache.c [O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now... 2018-12-07 22:28:48 +04:00
dcookies.c
direct-io.c fs: Fix possible use-after-free with AIO 2013-03-04 06:06:41 +08:00
drop_caches.c
eventfd.c
eventpoll.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
exec.c vfs: make path_openat take a struct filename pointer 2018-12-07 22:28:48 +04:00
fcntl.c shm: add sealing API 2020-12-07 22:15:12 +03:00
fhandle.c vfs: read file_handle only once in handle_to_path 2016-10-29 23:12:11 +08:00
fifo.c
file.c fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem 2014-02-22 10:32:45 -08:00
file_table.c get rid of s_files and files_lock 2016-03-21 09:17:55 +08:00
filesystems.c vfs: define struct filename and have getname() return it 2018-12-07 22:28:48 +04:00
fs-writeback.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
fs_struct.c sdcardfs: override umask on mkdir and create 2017-09-22 19:12:02 +03:00
inode.c mm: allow drivers to prevent new writable mappings 2020-12-07 21:08:09 +03:00
internal.h vfs: make path_openat take a struct filename pointer 2018-12-07 22:28:48 +04:00
ioctl.c
ioprio.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
Kconfig fs: remove generic_acl 2020-12-07 21:02:53 +03:00
Kconfig.binfmt
libfs.c stop passing nameidata to ->lookup() 2018-12-07 22:26:28 +04:00
locks.c locks: allow __break_lease to sleep even when break_time is 0 2014-05-13 14:11:31 +02:00
Makefile fs: remove generic_acl 2020-12-07 21:02:53 +03:00
mbcache.c
mount.h proc: Usable inode numbers for the namespace file descriptors. 2015-07-13 11:18:01 -07:00
mpage.c
namei.c fs: add get_acl helper 2020-12-07 21:02:40 +03:00
namespace.c vfs: define struct filename and have getname() return it 2018-12-07 22:28:48 +04:00
no-block.c
open.c fs: Fix file mode for O_TMPFILE 2018-12-07 22:28:48 +04:00
pipe.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
pnode.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
pnode.h ANDROID: mnt: remount should propagate to slaves of slaves 2017-09-22 19:12:11 +03:00
posix_acl.c fs: NULL dereference in posix_acl_to_xattr() 2020-12-07 21:05:11 +03:00
proc_namespace.c vfs: Allow filesystems to access their private mount data 2017-09-22 19:12:06 +03:00
read_write.c
read_write.h
readdir.c kernel: Only expose su when daemon is running 2017-05-19 18:41:25 -06:00
select.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
seq_file.c fs/seq_file: Use vmalloc by default for allocations > PAGE_SIZE 2014-11-18 15:13:24 -08:00
signalfd.c
splice.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
stack.c
stat.c vfs: make O_PATH file descriptors usable for 'fstat()' 2020-11-22 01:21:34 +03:00
statfs.c vfs: allow O_PATH file descriptors for fstatfs() 2013-10-22 09:02:25 +01:00
super.c vmscan: remove obsolete shrink_control comment 2020-11-29 16:11:26 +03:00
sync.c fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE writeback 2017-12-31 13:02:49 +03:00
timerfd.c timerfd: support CLOCK_BOOTTIME clock 2017-08-27 19:07:23 +03:00
utimes.c vfs: Add setattr2 for filesystems with per mount permissions 2017-09-22 19:12:07 +03:00
xattr.c fs, xattr: fix bug when removing a name not in xattr list 2020-12-07 21:02:30 +03:00