android_kernel_google_msm

mirror of https://github.com/followmsi/android_kernel_google_msm.git synced 2024-11-06 23:17:41 +00:00

Author	SHA1	Message	Date
Theodore Ts'o	c51e428eea	fs: push sync_filesystem() down to the file system's remount_fs() Previously, the no-op "mount -o mount /dev/xxx" operation when the file system is already mounted read-write causes an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly documented or guaraunteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only. However, it's possible that there might be some file systems that are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anders Larsen <al@alarsen.net> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: codalist@coda.cs.cmu.edu Cc: linux-ext4@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: fuse-devel@lists.sourceforge.net Cc: cluster-devel@redhat.com Cc: linux-mtd@lists.infradead.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux-nfs@vger.kernel.org Cc: linux-nilfs@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: reiserfs-devel@vger.kernel.org Change-Id: I03b43c745f82fce2cd3e0856c42eda70d94a45f8	2020-11-29 16:11:45 +03:00
Konstantin Khlebnikov	7c58c4e397	mm: kill vma flag VM_CAN_NONLINEAR Move actual pte filling for non-linear file mappings into the new special vma operation: ->remap_pages(). Filesystems must implement this method to get non-linear mapping support, if it uses filemap_fault() then generic_file_remap_pages() can be used. Now device drivers can implement this method and obtain nonlinear vma support. Change-Id: Ifbbbdfcdf871a8173856a13087400885357f95ee Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-11-29 16:11:40 +03:00
Ritesh Harjani	5063bb6474	ANDROID: fuse: Add null terminator to path in canonical path to avoid issue page allocated in fuse_dentry_canonical_path to be handled in fuse_dev_do_write is allocated using __get_free_pages(GFP_KERNEL). This may not return a page with data filled with 0. Now this page may not have a null terminator at all. If this happens and userspace fuse daemon screws up by passing a string to kernel which is not NULL terminated (or did not fill anything), then inside fuse driver in kernel when we try to do strlen(fuse_dev_write->kern_path->getname_kernel) on that page data -> it may give us issue with kernel paging request. Unable to handle kernel paging request at virtual address ------------[ cut here ]------------ <..> PC is at strlen+0x10/0x90 LR is at getname_kernel+0x2c/0xf4 <..> strlen+0x10/0x90 kern_path+0x28/0x4c fuse_dev_do_write+0x5b8/0x694 fuse_dev_write+0x74/0x94 do_iter_readv_writev+0x80/0xb8 do_readv_writev+0xec/0x1cc vfs_writev+0x54/0x64 SyS_writev+0x64/0xe4 el0_svc_naked+0x24/0x28 To avoid this we should ensure in case of FUSE_CANONICAL_PATH, the page is null terminated. Change-Id: I33ca7cc76b4472eaa982c67bb20685df451121f5 Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org> Bug: 75984715 [Daniel - small edit, using args size ] Signed-off-by: Daniel Rosenberg <drosen@google.com>	2020-11-29 16:11:35 +03:00
Minchan Kim	96cd891619	vmscan: remove obsolete shrink_control comment `09f363c7` ("vmscan: fix shrinker callback bug in fs/super.c") fixed a shrinker callback which was returning -1 when nr_to_scan is zero, which caused excessive slab scanning. But `635697c6` ("vmscan: fix initial shrinker size handling") fixed the problem, again so we can freely return -1 although nr_to_scan is zero. So let's revert `09f363c7` because the comment added in `09f363c7` made an unnecessary rule. Change-Id: Ic57b698b97406b980e06bd213afa283868a779a2 Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-11-29 16:11:26 +03:00
Linus Torvalds	3848b52cdc	vfs: make O_PATH file descriptors usable for 'fstat()' We already use them for openat() and friends, but fstat() also wants to be able to use O_PATH file descriptors. This should make it more directly comparable to the O_SEARCH of Solaris. Note that you could already do the same thing with "fstatat()" and an empty path, but just doing "fstat()" directly is simpler and faster, so there is no reason not to just allow it directly. See also commit `332a2e1244`, which did the same thing for fchdir, for the same reasons. Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org # O_PATH introduced in 3.0+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: I98db77b7f3fcadb1f0ecdb5c50ff4a7ad9db8bb3	2020-11-24 22:49:23 +01:00
Linus Torvalds	9fc39ebcf6	vfs: make O_PATH file descriptors usable for 'fstat()' We already use them for openat() and friends, but fstat() also wants to be able to use O_PATH file descriptors. This should make it more directly comparable to the O_SEARCH of Solaris. Note that you could already do the same thing with "fstatat()" and an empty path, but just doing "fstat()" directly is simpler and faster, so there is no reason not to just allow it directly. See also commit `332a2e1244`, which did the same thing for fchdir, for the same reasons. Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org # O_PATH introduced in 3.0+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-11-22 01:21:34 +03:00
Sultanxda	0d1df57f23	proc: Remove additional SafetyNet flags from /proc/cmdline SafetyNet checks androidboot.veritymode in Nougat, so remove it. Additionally, remove androidboot.enable_dm_verity and androidboot.secboot in case SafetyNet will check them in the future. Change-Id: Idd3422012aa62f39edadc1eee9b5b84879a87b33 Signed-off-by: Sultanxda <sultanxda@gmail.com>	2020-11-19 12:06:44 +01:00
Sultanxda	c844bdf21a	proc: Remove verifiedbootstate flag from /proc/cmdline Userspace parses this and sets the ro.boot.verifiedbootstate prop according to the value that this flag has. When ro.boot.verifiedbootstate is not 'green', SafetyNet is tripped and fails the CTS test. Hide verifiedbootstate from /proc/cmdline in order to fix the failed SafetyNet CTS check. Change-Id: I530a1d1abb8f24c5a84dc423367715c353e005d3 Signed-off-by: Sultanxda <sultanxda@gmail.com>	2020-11-19 12:06:35 +01:00
Theodore Ts'o	b84ec752b0	fs: push sync_filesystem() down to the file system's remount_fs() Previously, the no-op "mount -o mount /dev/xxx" operation when the file system is already mounted read-write causes an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly documented or guaraunteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only. However, it's possible that there might be some file systems that are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anders Larsen <al@alarsen.net> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: codalist@coda.cs.cmu.edu Cc: linux-ext4@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: fuse-devel@lists.sourceforge.net Cc: cluster-devel@redhat.com Cc: linux-mtd@lists.infradead.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux-nfs@vger.kernel.org Cc: linux-nilfs@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: reiserfs-devel@vger.kernel.org Change-Id: I03b43c745f82fce2cd3e0856c42eda70d94a45f8	2020-11-19 11:23:46 +01:00
Konstantin Khlebnikov	e2bc3e2f21	mm: kill vma flag VM_CAN_NONLINEAR Move actual pte filling for non-linear file mappings into the new special vma operation: ->remap_pages(). Filesystems must implement this method to get non-linear mapping support, if it uses filemap_fault() then generic_file_remap_pages() can be used. Now device drivers can implement this method and obtain nonlinear vma support. Change-Id: Ifbbbdfcdf871a8173856a13087400885357f95ee Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-11-19 11:23:37 +01:00
Ritesh Harjani	70aae931ee	ANDROID: fuse: Add null terminator to path in canonical path to avoid issue page allocated in fuse_dentry_canonical_path to be handled in fuse_dev_do_write is allocated using __get_free_pages(GFP_KERNEL). This may not return a page with data filled with 0. Now this page may not have a null terminator at all. If this happens and userspace fuse daemon screws up by passing a string to kernel which is not NULL terminated (or did not fill anything), then inside fuse driver in kernel when we try to do strlen(fuse_dev_write->kern_path->getname_kernel) on that page data -> it may give us issue with kernel paging request. Unable to handle kernel paging request at virtual address ------------[ cut here ]------------ <..> PC is at strlen+0x10/0x90 LR is at getname_kernel+0x2c/0xf4 <..> strlen+0x10/0x90 kern_path+0x28/0x4c fuse_dev_do_write+0x5b8/0x694 fuse_dev_write+0x74/0x94 do_iter_readv_writev+0x80/0xb8 do_readv_writev+0xec/0x1cc vfs_writev+0x54/0x64 SyS_writev+0x64/0xe4 el0_svc_naked+0x24/0x28 To avoid this we should ensure in case of FUSE_CANONICAL_PATH, the page is null terminated. Change-Id: I33ca7cc76b4472eaa982c67bb20685df451121f5 Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org> Bug: 75984715 [Daniel - small edit, using args size ] Signed-off-by: Daniel Rosenberg <drosen@google.com>	2020-11-19 11:23:23 +01:00
Minchan Kim	af96d7c7c7	vmscan: remove obsolete shrink_control comment `09f363c7` ("vmscan: fix shrinker callback bug in fs/super.c") fixed a shrinker callback which was returning -1 when nr_to_scan is zero, which caused excessive slab scanning. But `635697c6` ("vmscan: fix initial shrinker size handling") fixed the problem, again so we can freely return -1 although nr_to_scan is zero. So let's revert `09f363c7` because the comment added in `09f363c7` made an unnecessary rule. Change-Id: Ic57b698b97406b980e06bd213afa283868a779a2 Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-11-18 10:38:32 +01:00
followmsi	997eab3f62	exfat: don't pass nameidata to ->create() / stop passing nameidata to ->lookup()	2020-10-05 12:24:49 +02:00
flar2	db77a3c64d	exFAT support Signed-off-by: flar2 <asegaert@gmail.com>	2020-10-05 11:42:38 +02:00
Al Viro	9332955257	path_openat(): fix double fput() [ Upstream commit f15133df088ecadd141ea1907f2c96df67c729f0 ] path_openat() jumps to the wrong place after do_tmpfile() - it has already done path_cleanup() (as part of path_lookupat() called by do_tmpfile()), so doing that again can lead to double fput(). Change-Id: I83bb7f0a15db8d2202a010b75ade98f80e7270f2 Cc: stable@vger.kernel.org # v3.11+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>	2018-12-07 22:28:48 +04:00
Eric Rannaud	83c1f7b47f	fs: allow open(dir, O_TMPFILE\|..., 0) with mode 0 The man page for open(2) indicates that when O_CREAT is specified, the 'mode' argument applies only to future accesses to the file: Note that this mode applies only to future accesses of the newly created file; the open() call that creates a read-only file may well return a read/write file descriptor. The man page for open(2) implies that 'mode' is treated identically by O_CREAT and O_TMPFILE. O_TMPFILE, however, behaves differently: int fd = open("/tmp", O_TMPFILE \| O_RDWR, 0); assert(fd == -1); assert(errno == EACCES); int fd = open("/tmp", O_TMPFILE \| O_RDWR, 0600); assert(fd > 0); For O_CREAT, do_last() sets acc_mode to MAY_OPEN only: if (opened & FILE_CREATED) { / Don't check for write permission, don't truncate */ open_flag &= ~O_TRUNC; will_truncate = false; acc_mode = MAY_OPEN; path_to_nameidata(path, nd); goto finish_open_created; } But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to may_open(). This patch lines up the behavior of O_TMPFILE with O_CREAT. After the inode is created, may_open() is called with acc_mode = MAY_OPEN, in do_tmpfile(). A different, but related glibc bug revealed the discrepancy: https://sourceware.org/bugzilla/show_bug.cgi?id=17523 The glibc lazily loads the 'mode' argument of open() and openat() using va_arg() only if O_CREAT is present in 'flags' (to support both the 2 argument and the 3 argument forms of open; same idea for openat()). However, the glibc ignores the 'mode' argument if O_TMPFILE is in 'flags'. On x86_64, for open(), it magically works anyway, as 'mode' is in RDX when entering open(), and is still in RDX on SYSCALL, which is where the kernel looks for the 3rd argument of a syscall. But openat() is not quite so lucky: 'mode' is in RCX when entering the glibc wrapper for openat(), while the kernel looks for the 4th argument of a syscall in R10. Indeed, the syscall calling convention differs from the regular calling convention in this respect on x86_64. So the kernel sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and fails with EACCES. Change-Id: I4da221448695c2aca15818d8d4f44784ecdbdac6 Signed-off-by: Eric Rannaud <e@nanocritical.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-12-07 22:28:48 +04:00
Heinrich Schuchardt	8dfe817aac	fanotify: check file flags passed in fanotify_init Without this patch fanotify_init does not validate the value passed in event_f_flags. When a fanotify event is read from the fanotify file descriptor a new file descriptor is created where file.f_flags = event_f_flags. Internal and external open flags are stored together in field f_flags of struct file. Hence, an application might create file descriptors with internal flags like FMODE_EXEC, FMODE_NOCMTIME set. Jan Kara and Eric Paris both aggreed that this is a bug and the value of event_f_flags should be checked: https://lkml.org/lkml/2014/4/29/522 https://lkml.org/lkml/2014/4/29/539 This updated patch version considers the comments by Michael Kerrisk in https://lkml.org/lkml/2014/5/4/10 With the patch the value of event_f_flags is checked. When specifying an invalid value error EINVAL is returned. Internal flags are disallowed. File creation flags are disallowed: O_CREAT, O_DIRECTORY, O_EXCL, O_NOCTTY, O_NOFOLLOW, O_TRUNC, and O_TTY_INIT. Flags which do not make sense with fanotify are disallowed: __O_TMPFILE, O_PATH, FASYNC, and O_DIRECT. This leaves us with the following allowed values: O_RDONLY, O_WRONLY, O_RDWR are basic functionality. The are stored in the bits given by O_ACCMODE. O_APPEND is working as expected. The value might be useful in a logging application which appends the current status each time the log is opened. O_LARGEFILE is needed for files exceeding 4GB on 32bit systems. O_NONBLOCK may be useful when monitoring slow devices like tapes. O_NDELAY is equal to O_NONBLOCK except for platform parisc. To avoid code breaking on parisc either both flags should be allowed or none. The patch allows both. __O_SYNC and O_DSYNC may be used to avoid data loss on power disruption. O_NOATIME may be useful to reduce disk activity. O_CLOEXEC may be useful, if separate processes shall be used to scan files. Once this patch is accepted, the fanotify_init.2 manpage has to be updated. Change-Id: I0e3a23ccbb38fc612df14068164dde3cb7f94f86 Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-12-07 22:28:48 +04:00
Miklos Szeredi	796c65f764	ext[34]: fix double put in tmpfile d_tmpfile() already swallowed the inode ref. Change-Id: Ib393e3dc34d13065efb5fc0cd96f8667e294b908 Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Zheng Liu	0b492c4f16	vfs: add missing check for __O_TMPFILE in fcntl_init() As comment in include/uapi/asm-generic/fcntl.h described, when introducing new O_* bits, we need to check its uniqueness in fcntl_init(). But __O_TMPFILE bit is missing. So fix it. Change-Id: I914b76ab4282717b88afbbcde3c630726daef747 Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Andy Lutomirski	b1d71dd0e1	fs: Fix file mode for O_TMPFILE O_TMPFILE, like O_CREAT, should respect the requested mode and should create regular files. This fixes two bugs: O_TMPFILE required privilege (because the mode ended up as 000) and it produced bogus inodes with no type. Change-Id: I322c3f4a60bcae4f376898aee75ea838daa1c8d3 Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Zheng Liu	41f188e1c5	ext4: fix a BUG when opening a file with O_TMPFILE flag When we try to open a file with O_TMPFILE flag, we will trigger a bug. The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and this check always fails because we set ->i_nlink = 1 in inode_init_always(). We can use the following program to trigger it: int main(int argc, char *argv[]) { int fd; fd = open(argv[1], O_TMPFILE, 0666); if (fd < 0) { perror("open "); return -1; } close(fd); return 0; } The oops message looks like this: kernel BUG at fs/ext4/namei.c:2572! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: dlci bridge stp hidp cmtp kernelcapi l2tp_ppp l2tp_netlink l2tp_core sctp libcrc32c rfcomm tun fuse nfnetli nk can_raw ipt_ULOG can_bcm x25 scsi_transport_iscsi ipx p8023 p8022 appletalk phonet psnap vmw_vsock_vmci_transport af_key vmw_vmci rose vsock atm can netrom ax25 af_rxrpc ir da pppoe pppox ppp_generic slhc bluetooth nfc rfkill rds caif_socket caif crc_ccitt af_802154 llc2 llc snd_hda_codec_realtek snd_hda_intel snd_hda_codec serio_raw snd_pcm pcsp kr edac_core snd_page_alloc snd_timer snd soundcore r8169 mii sr_mod cdrom pata_atiixp radeon backlight drm_kms_helper ttm CPU: 1 PID: 1812571 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #12 Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010 task: ffff88007dfe69a0 ti: ffff88010f7b6000 task.ti: ffff88010f7b6000 RIP: 0010:[<ffffffff8125ce69>] [<ffffffff8125ce69>] ext4_orphan_add+0x299/0x2b0 RSP: 0018:ffff88010f7b7cf8 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff8800966d3020 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88007dfe70b8 RDI: 0000000000000001 RBP: ffff88010f7b7d40 R08: ffff880126a3c4e0 R09: ffff88010f7b7ca0 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801271fd668 R13: ffff8800966d2f78 R14: ffff88011d7089f0 R15: ffff88007dfe69a0 FS: 00007f70441a3740(0000) GS:ffff88012a800000(0000) knlGS:00000000f77c96c0 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000002834000 CR3: 0000000107964000 CR4: 00000000000007e0 DR0: 0000000000780000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Stack: 0000000000002000 00000020810b6dde 0000000000000000 ffff88011d46db00 ffff8800966d3020 ffff88011d7089f0 ffff88009c7f4c10 ffff88010f7b7f2c ffff88007dfe69a0 ffff88010f7b7da8 ffffffff8125cfac ffff880100000004 Call Trace: [<ffffffff8125cfac>] ext4_tmpfile+0x12c/0x180 [<ffffffff811cba78>] path_openat+0x238/0x700 [<ffffffff8100afc4>] ? native_sched_clock+0x24/0x80 [<ffffffff811cc647>] do_filp_open+0x47/0xa0 [<ffffffff811db73f>] ? __alloc_fd+0xaf/0x200 [<ffffffff811ba2e4>] do_sys_open+0x124/0x210 [<ffffffff81010725>] ? syscall_trace_enter+0x25/0x290 [<ffffffff811ba3ee>] SyS_open+0x1e/0x20 [<ffffffff816ca8d4>] tracesys+0xdd/0xe2 [<ffffffff81001001>] ? start_thread_common.constprop.6+0x1/0xa0 Code: 04 00 00 00 89 04 24 31 c0 e8 c4 77 04 00 e9 43 fe ff ff 66 25 00 d0 66 3d 00 80 0f 84 0e fe ff ff 83 7b 48 00 0f 84 04 fe ff ff <0f> 0b 49 8b 8c 24 50 07 00 00 e9 88 fe ff ff 0f 1f 84 00 00 00 Here we couldn't call clear_nlink() directly because in d_tmpfile() we will call inode_dec_link_count() to decrease ->i_nlink. So this commit tries to call d_tmpfile() before ext4_orphan_add() to fix this problem. Change-Id: I04dca79854fc9b4932df853251e28419721aabf5 Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Tested-by: Darrick J. Wong <darrick.wong@oracle.com> Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Zheng Liu	9854e7f3c0	ext3: fix a BUG when opening a file with O_TMPFILE flag When we try to open a file with O_TMPFILE flag, we will trigger a bug. The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and this check always fails because we set ->i_nlink = 1 in inode_init_always(). We can use the following program to trigger it: int main(int argc, char *argv[]) { int fd; fd = open(argv[1], O_TMPFILE, 0666); if (fd < 0) { perror("open "); return -1; } close(fd); return 0; } The oops message looks like this: kernel: kernel BUG at fs/ext3/namei.c:1992! kernel: invalid opcode: 0000 [#1] SMP kernel: Modules linked in: ext4 jbd2 crc16 cpufreq_ondemand ipv6 dm_mirror dm_region_hash dm_log dm_mod parport_pc parport serio_raw sg dcdbas pcspkr i2c_i801 ehci_pci ehci_hcd button acpi_cpufreq mperf e1000e ptp pps_core ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core ext3 jbd sd_mod ahci libahci libata scsi_mod uhci_hcd kernel: CPU: 0 PID: 2882 Comm: tst_tmpfile Not tainted 3.11.0-rc1+ #4 kernel: Hardware name: Dell Inc. OptiPlex 780 /0V4W66, BIOS A05 08/11/2010 kernel: task: ffff880112d30050 ti: ffff8801124d4000 task.ti: ffff8801124d4000 kernel: RIP: 0010:[<ffffffffa00db5ae>] [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3] kernel: RSP: 0018:ffff8801124d5cc8 EFLAGS: 00010202 kernel: RAX: 0000000000000000 RBX: ffff880111510128 RCX: ffff8801114683a0 kernel: RDX: 0000000000000000 RSI: ffff880111510128 RDI: ffff88010fcf65a8 kernel: RBP: ffff8801124d5d18 R08: 0080000000000000 R09: ffffffffa00d3b7f kernel: R10: ffff8801114683a0 R11: ffff8801032a2558 R12: 0000000000000000 kernel: R13: ffff88010fcf6800 R14: ffff8801032a2558 R15: ffff8801115100d8 kernel: FS: 00007f5d172b5700(0000) GS:ffff880117c00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b kernel: CR2: 00007f5d16df15d0 CR3: 0000000110b1d000 CR4: 00000000000407f0 kernel: Stack: kernel: 000000000000000c ffff8801048a7dc8 ffff8801114685a8 ffffffffa00b80d7 kernel: ffff8801124d5e38 ffff8801032a2558 ffff88010ce24d68 0000000000000000 kernel: ffff88011146b300 ffff8801124d5d44 ffff8801124d5d78 ffffffffa00db7e1 kernel: Call Trace: kernel: [<ffffffffa00b80d7>] ? journal_start+0x8c/0xbd [jbd] kernel: [<ffffffffa00db7e1>] ext3_tmpfile+0xb2/0x13b [ext3] kernel: [<ffffffff821076f8>] path_openat+0x11f/0x5e7 kernel: [<ffffffff821c86b4>] ? list_del+0x11/0x30 kernel: [<ffffffff82065fa2>] ? __dequeue_entity+0x33/0x38 kernel: [<ffffffff82107cd5>] do_filp_open+0x3f/0x8d kernel: [<ffffffff82112532>] ? __alloc_fd+0x50/0x102 kernel: [<ffffffff820f9296>] do_sys_open+0x13b/0x1cd kernel: [<ffffffff820f935c>] SyS_open+0x1e/0x20 kernel: [<ffffffff82398c02>] system_call_fastpath+0x16/0x1b kernel: Code: 39 c7 0f 85 67 01 00 00 0f b7 03 25 00 f0 00 00 3d 00 40 00 00 74 18 3d 00 80 00 00 74 11 3d 00 a0 00 00 74 0a 83 7b 48 00 74 04 <0f> 0b eb fe 49 8b 85 50 03 00 00 4c 89 f6 48 c7 c7 c0 99 0e a0 kernel: RIP [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3] kernel: RSP <ffff8801124d5cc8> Here we couldn't call clear_nlink() directly because in d_tmpfile() we will call inode_dec_link_count() to decrease ->i_nlink. So this commit tries to call d_tmpfile() before ext4_orphan_add() to fix this problem. Change-Id: I7c71cb75eaa579fd85d37dd8b1d22cb843d48361 Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Al Viro	cfb042a7c1	allow O_TMPFILE to work with O_WRONLY Change-Id: If1758bafed5fe780665a899fa456417680f3a24c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Al Viro	2d7a35567b	Safer ABI for O_TMPFILE [suggested by Rasmus Villemoes] make O_DIRECTORY \| O_RDWR part of O_TMPFILE; that will fail on old kernels in a lot more cases than what I came up with. And make sure O_CREAT doesn't get there... Change-Id: I90b6ad396a8053eadd5cb32501f55cbb1d4be2db Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Miklos Szeredi	0765c88d0a	vfs: improve i_op->atomic_open() documentation Fix documentation of ->atomic_open() and related functions: finish_open() and finish_no_open(). Also add details that seem to be unclear and a source of bugs (some of which are fixed in the following series). Cc-ing maintainers of all filesystems implementing ->atomic_open(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Sage Weil <sage@inktank.com> Cc: Steve French <sfrench@samba.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: Ic3734901961cb69079189f7d4ded66af5a88d8f2	2018-12-07 22:28:48 +04:00
Al Viro	2813fb51b6	ext4: ->tmpfile() support very similar to ext3 counterpart... Change-Id: Ia6d57ae72f19f17b3ea8dc3ebb5016aa4d7bda5d Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Al Viro	3b63bffd0f	ext3 ->tmpfile() support In this case we do need a bit more than usual, due to orphan list handling. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I3a2da2b3f9bde5ac5a8158005a3068a6a67b7a83	2018-12-07 22:28:48 +04:00
Al Viro	ca7d77149e	allow the temp files created by open() to be linked to O_TMPFILE \| O_CREAT => linkat() with AT_SYMLINK_FOLLOW and /proc/self/fd/<n> as oldpath (i.e. flink()) will create a link O_TMPFILE \| O_CREAT \| O_EXCL => ENOENT on attempt to link those guys Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I1c10dfd653cb48f4e7a42344337601210779178a	2018-12-07 22:28:48 +04:00
Al Viro	b390b8b86f	[O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now... Change-Id: I6d19ad586df0185978a651a2e4ff126800e34570 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Eric W. Biederman	e8f9b710c1	proc: Use nd_jump_link in proc_ns_follow_link Update proc_ns_follow_link to use nd_jump_link instead of just manually updating nd.path.dentry. This fixes the BUG_ON(nd->inode != parent->d_inode) reported by Dave Jones and reproduced trivially with mkdir /proc/self/ns/uts/a. Sigh it looks like the VFS change to require use of nd_jump_link happend while proc_ns_follow_link was baking and since the common case of proc_ns_follow_link continued to work without problems the need for making this change was overlooked. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Change-Id: I465f73b64069aca5b059bad28bfef098dddc1b99	2018-12-07 22:28:48 +04:00
Linus Torvalds	4dda4639cb	vfs: don't BUG_ON() if following a /proc fd pseudo-symlink results in a symlink It's "normal" - it can happen if the file descriptor you followed was opened with O_NOFOLLOW. Reported-by: Dave Jones <davej@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: Ic8bcf2195ef87b424c2121691ca8fe78c6f8eb73	2018-12-07 22:28:48 +04:00
Al Viro	ca9186553a	lookup_one_len: don't accept . and .. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I4a5861290d8d890898cabbe0d109e47bde8aa5ce	2018-12-07 22:28:48 +04:00
Linus Torvalds	7321097bd6	VFS: don't do protected {sym,hard}links by default In commit `800179c9b8` ("This adds symlink and hardlink restrictions to the Linux VFS"), the new link protections were enabled by default, in the hope that no actual application would care, despite it being technically against legacy UNIX (and documented POSIX) behavior. However, it does turn out to break some applications. It's rare, and it's unfortunate, but it's unacceptable to break existing systems, so we'll have to default to legacy behavior. In particular, it has broken the way AFD distributes files, see http://www.dwd.de/AFD/ along with some legacy scripts. Distributions can end up setting this at initrd time or in system scripts: if you have security problems due to link attacks during your early boot sequence, you have bigger problems than some kernel sysctl setting. Do: echo 1 > /proc/sys/fs/protected_symlinks echo 1 > /proc/sys/fs/protected_hardlinks to re-enable the link protections. Alternatively, we may at some point introduce a kernel config option that sets these kinds of "more secure but not traditional" behavioural options automatically. Reported-by: Nick Bowler <nbowler@elliptictech.com> Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # v3.6 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: I0f626d9487972c6dcae2dd98d80f72c2e7727087	2018-12-07 22:28:48 +04:00
Jeff Layton	ea5bb91a72	vfs: embed struct filename inside of names_cache allocation if possible In the common case where a name is much smaller than PATH_MAX, an extra allocation for struct filename is unnecessary. Before allocating a separate one, try to embed the struct filename inside the buffer first. If it turns out that that's not long enough, then fall back to allocating a separate struct filename and redoing the copy. Change-Id: I57df0c4e642cc7a76efaa621ba1ce10e717447ff Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Al Viro	3d35b21eb3	use can_lookup() instead of direct checks of ->i_op->lookup a couple of places got missed back when Linus has introduced that one... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I47ad6735f70d32e54a1ca9b15fa43b2fbcc6b999	2018-12-07 22:28:48 +04:00
Jeff Layton	dde1c69f9b	vfs: make path_openat take a struct filename pointer ...and fix up the callers. For do_file_open_root, just declare a struct filename on the stack and fill out the .name field. For do_filp_open, make it also take a struct filename pointer, and fix up its callers to call it appropriately. For filp_open, add a variant that takes a struct filename pointer and turn filp_open into a wrapper around it. Change-Id: Ibeb0479a22019e78b22990406d54c4ebed76a567 Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Jeff Layton	fecca794b8	vfs: turn do_path_lookup into wrapper around struct filename variant ...and make the user_path callers use that variant instead. Change-Id: I2d162b8859702febd366a4920b896b26bacf5136 Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Jeff Layton	3df0a6646d	vfs: define struct filename and have getname() return it getname() is intended to copy pathname strings from userspace into a kernel buffer. The result is just a string in kernel space. It would however be quite helpful to be able to attach some ancillary info to the string. For instance, we could attach some audit-related info to reduce the amount of audit-related processing needed. When auditing is enabled, we could also call getname() on the string more than once and not need to recopy it from userspace. This patchset converts the getname()/putname() interfaces to return a struct instead of a string. For now, the struct just tracks the string in kernel space and the original userland pointer for it. Later, we'll add other information to the struct as it becomes convenient. Change-Id: Ib690c3dd4d56624f0ddb081e1c1d4f23c2dd0cd1 Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Jeff Layton	aa0c13bbbe	vfs: unexport getname and putname symbols I see no callers in module code. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I88117f368a130770b6e4d4686cadde6723c1d7fc	2018-12-07 22:28:48 +04:00
Arnd Bergmann	c273793a85	vfs: bogus warnings in fs/namei.c The follow_link() function always initializes its *p argument, or returns an error, but when building with 'gcc -s', the compiler gets confused by the __always_inline attribute to the function and can no longer detect where the cookie was initialized. The solution is to always initialize the pointer from follow_link, even in the error path. When building with -O2, this has zero impact on generated code and adds a single instruction in the error path for a -Os build on ARM. Without this patch, building with gcc-4.6 through gcc-4.8 and CONFIG_CC_OPTIMIZE_FOR_SIZE results in: fs/namei.c: In function 'link_path_walk': fs/namei.c:649:24: warning: 'cookie' may be used uninitialized in this function [-Wuninitialized] fs/namei.c:1544:9: note: 'cookie' was declared here fs/namei.c: In function 'path_lookupat': fs/namei.c:649:24: warning: 'cookie' may be used uninitialized in this function [-Wuninitialized] fs/namei.c:1934:10: note: 'cookie' was declared here fs/namei.c: In function 'path_openat': fs/namei.c:649:24: warning: 'cookie' may be used uninitialized in this function [-Wuninitialized] fs/namei.c:2899:9: note: 'cookie' was declared here Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: Ib640b0c8b111da37b389ceb24f468497ad97622e	2018-12-07 22:28:48 +04:00
Sasha Levin	276d16ddf7	fs: prevent use after free in auditing when symlink following was denied Commit "fs: add link restriction audit reporting" has added auditing of failed attempts to follow symlinks. Unfortunately, the auditing was being done after the struct path structure was released earlier. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: Id6639dd23f00eb29ee19c8c7c714769ba25efca7	2018-12-07 22:28:48 +04:00
Al Viro	2378a18866	namei.c: fix BS comment get_write_access() is needed for nfsd, not binfmt_aout (the latter has no business doing anything of that kind, of course) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I210f8b92bdd26966b4ca47f000b58433a8f8eca6	2018-12-07 22:28:48 +04:00
Sage Weil	8c29257456	vfs: fix propagation of atomic_open create error on negative dentry If ->atomic_open() returns -ENOENT, we take care to return the create error (e.g., EACCES), if any. Do the same when ->atomic_open() returns 1 and provides a negative dentry. This fixes a regression where an unprivileged open O_CREAT fails with ENOENT instead of EACCES, introduced with the new atomic_open code. It is tested by the open/08.t test in the pjd posix test suite, and was observed on top of fuse (backed by ceph-fuse). Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Change-Id: Ie92bf84be4469484b005d0ea9b9886a0bd36d922	2018-12-07 22:28:48 +04:00
Miklos Szeredi	77b0dd77b7	vfs: pass right create mode to may_o_create() Pass the umask-ed create mode to may_o_create() instead of the original one. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com> Change-Id: Ie873439e8135f579c91dba57e88665e96d646ae4	2018-12-07 22:28:48 +04:00
Miklos Szeredi	c261fc42d5	vfs: atomic_open(): fix create mode usage Don't mask S_ISREG off the create mode before passing to ->atomic_open(). Other methods (->create, ->mknod) also get the complete file mode and filesystems expect it. Reported-by: Steve <steveamigauk@yahoo.co.uk> Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com> Change-Id: Idd21534c4124f2c7ade8b9afbd40b6fa303dbc4d	2018-12-07 22:28:48 +04:00
Jan Kara	e667844e5a	fs: Push mnt_want_write() outside of i_mutex Currently, mnt_want_write() is sometimes called with i_mutex held and sometimes without it. This isn't really a problem because mnt_want_write() is a non-blocking operation (essentially has a trylock semantics) but when the function starts to handle also frozen filesystems, it will get a full lock semantics and thus proper lock ordering has to be established. So move all mnt_want_write() calls outside of i_mutex. One non-trivial case needing conversion is kern_path_create() / user_path_create() which didn't include mnt_want_write() but now needs to because it acquires i_mutex. Because there are virtual file systems which don't bother with freeze / remount-ro protection we actually provide both versions of the function - one which calls mnt_want_write() and one which does not. [AV: scratch the previous, mnt_want_write() has been moved to kern_path_create() by now] Change-Id: I460255fabb9bfcebe6974aabdcd0b5dca1856a9e Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Al Viro	d6a5fcecf4	simplify lookup_open()/atomic_open() - do the temporary mnt_want_write() early The write ref to vfsmount taken in lookup_open()/atomic_open() is going to be dropped; we take the one to stay in dentry_open(). Just grab the temporary in caller if it looks like we are going to need it (create/truncate/writable open) and pass (by value) "has it succeeded" flag. Instead of doing mnt_want_write() inside, check that flag and treat "false" as "mnt_want_write() has just failed". mnt_want_write() is cheap and the things get considerably simpler and more robust that way - we get it and drop it in the same function, to start with, rather than passing a "has something in the guts of really scary functions taken it" back to caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: Icda3799935abd688cbad95d4a1f22563b1f653d5	2018-12-07 22:28:48 +04:00
Al Viro	3e7ea88625	fix O_EXCL handling for devices O_EXCL without O_CREAT has different semantics; it's "fail if already opened", not "fail if already exists". commit `71574865` broke that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I59e7ab80df02e7fff2f4f9118d78921f60399a02	2018-12-07 22:28:48 +04:00
Kees Cook	2f549f9575	fs: add link restriction audit reporting Adds audit messages for unexpected link restriction violations so that system owners will have some sort of potentially actionable information about misbehaving processes. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I4a6ef885b0680e1d554e32b7cc3506f8e0ba0b8a	2018-12-07 22:28:48 +04:00
Kees Cook	ec7215ac09	fs: add link restrictions This adds symlink and hardlink restrictions to the Linux VFS. Symlinks: A long-standing class of security issues is the symlink-based time-of-check-time-of-use race, most commonly seen in world-writable directories like /tmp. The common method of exploitation of this flaw is to cross privilege boundaries when following a given symlink (i.e. a root process follows a symlink belonging to another user). For a likely incomplete list of hundreds of examples across the years, please see: http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp The solution is to permit symlinks to only be followed when outside a sticky world-writable directory, or when the uid of the symlink and follower match, or when the directory owner matches the symlink's owner. Some pointers to the history of earlier discussion that I could find: 1996 Aug, Zygo Blaxell http://marc.info/?l=bugtraq&m=87602167419830&w=2 1996 Oct, Andrew Tridgell http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html 1997 Dec, Albert D Cahalan http://lkml.org/lkml/1997/12/16/4 2005 Feb, Lorenzo Hernández García-Hierro http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html 2010 May, Kees Cook https://lkml.org/lkml/2010/5/30/144 Past objections and rebuttals could be summarized as: - Violates POSIX. - POSIX didn't consider this situation and it's not useful to follow a broken specification at the cost of security. - Might break unknown applications that use this feature. - Applications that break because of the change are easy to spot and fix. Applications that are vulnerable to symlink ToCToU by not having the change aren't. Additionally, no applications have yet been found that rely on this behavior. - Applications should just use mkstemp() or O_CREATE\|O_EXCL. - True, but applications are not perfect, and new software is written all the time that makes these mistakes; blocking this flaw at the kernel is a single solution to the entire class of vulnerability. - This should live in the core VFS. - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135) - This should live in an LSM. - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188) Hardlinks: On systems that have user-writable directories on the same partition as system files, a long-standing class of security issues is the hardlink-based time-of-check-time-of-use race, most commonly seen in world-writable directories like /tmp. The common method of exploitation of this flaw is to cross privilege boundaries when following a given hardlink (i.e. a root process follows a hardlink created by another user). Additionally, an issue exists where users can "pin" a potentially vulnerable setuid/setgid file so that an administrator will not actually upgrade a system fully. The solution is to permit hardlinks to only be created when the user is already the existing file's owner, or if they already have read/write access to the existing file. Many Linux users are surprised when they learn they can link to files they have no access to, so this change appears to follow the doctrine of "least surprise". Additionally, this change does not violate POSIX, which states "the implementation may require that the calling process has permission to access the existing file"[1]. This change is known to break some implementations of the "at" daemon, though the version used by Fedora and Ubuntu has been fixed[2] for a while. Otherwise, the change has been undisruptive while in use in Ubuntu for the last 1.5 years. [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html [2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279 This patch is based on the patches in Openwall and grsecurity, along with suggestions from Al Viro. I have added a sysctl to enable the protected behavior, and documentation. Change-Id: Ic4872c58e8a0672147c73b13175ea143e19915ba Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Jeff Layton	87b37ef17f	vfs: don't let do_last pass negative dentry to audit_inode I can reliably reproduce the following panic by simply setting an audit rule on a recent 3.5.0+ kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 IP: [<ffffffff810d1250>] audit_copy_inode+0x10/0x90 PGD 7acd9067 PUD 7b8fb067 PMD 0 Oops: 0000 [#86] SMP Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc tpm_bios btrfs zlib_deflate libcrc32c kvm_amd kvm joydev virtio_net pcspkr i2c_piix4 floppy virtio_balloon microcode virtio_blk cirrus drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan] CPU 0 Pid: 1286, comm: abrt-dump-oops Tainted: G D 3.5.0+ #1 Bochs Bochs RIP: 0010:[<ffffffff810d1250>] [<ffffffff810d1250>] audit_copy_inode+0x10/0x90 RSP: 0018:ffff88007aebfc38 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88003692d860 RCX: 00000000000038c4 RDX: 0000000000000000 RSI: ffff88006baf5d80 RDI: ffff88003692d860 RBP: ffff88007aebfc68 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: ffff880036d30f00 R14: ffff88006baf5d80 R15: ffff88003692d800 FS: 00007f7562634740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 000000003643d000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process abrt-dump-oops (pid: 1286, threadinfo ffff88007aebe000, task ffff880079614530) Stack: ffff88007aebfdf8 ffff88007aebff28 ffff88007aebfc98 ffffffff81211358 ffff88003692d860 0000000000000000 ffff88007aebfcc8 ffffffff810d4968 ffff88007aebfcc8 ffff8800000038c4 0000000000000000 0000000000000000 Call Trace: [<ffffffff81211358>] ? ext4_lookup+0xe8/0x160 [<ffffffff810d4968>] __audit_inode+0x118/0x2d0 [<ffffffff811955a9>] do_last+0x999/0xe80 [<ffffffff81191fe8>] ? inode_permission+0x18/0x50 [<ffffffff81171efa>] ? kmem_cache_alloc_trace+0x11a/0x130 [<ffffffff81195b4a>] path_openat+0xba/0x420 [<ffffffff81196111>] do_filp_open+0x41/0xa0 [<ffffffff811a24bd>] ? alloc_fd+0x4d/0x120 [<ffffffff811855cd>] do_sys_open+0xed/0x1c0 [<ffffffff810d40cc>] ? __audit_syscall_entry+0xcc/0x300 [<ffffffff811856c1>] sys_open+0x21/0x30 [<ffffffff81611ca9>] system_call_fastpath+0x16/0x1b RSP <ffff88007aebfc38> CR2: 0000000000000040 The problem is that do_last is passing a negative dentry to audit_inode. The comments on lookup_open note that it can pass back a negative dentry if O_CREAT is not set. This patch fixes the oops, but I'm not clear on whether there's a better approach. Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I92efc8c98013979b58e7eabcfc242c7143b5f928	2018-12-07 22:28:48 +04:00
Al Viro	579400a5b2	pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp. One side effect - attempt to create a cross-device link on a read-only fs fails with EROFS instead of EXDEV now. Makes more sense, POSIX allows, etc. Change-Id: I264b03a230dbd310f3b3671d2da06ceb2930179b Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Al Viro	9d58c3048f	mknod: take sanity checks on mode into the very beginning Note that applying umask can't affect their results. While that affects errno in cases like mknod("/no_such_directory/a", 030000) yielding -EINVAL (due to impossible mode_t) instead of -ENOENT (due to inexistent directory), IMO that makes a lot more sense, POSIX allows to return either and any software that relies on getting -ENOENT instead of -EINVAL in that case deserves everything it gets. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I1abb3e8ad247f3f48bde931d70e6f546126c62d7	2018-12-07 22:28:48 +04:00
Al Viro	2c1bd80538	new helper: done_path_create() releases what needs to be released after {kern,user}_path_create() Change-Id: If7fa7455e2ba8a6f4f4c4d2db502a38b4a60d7c7 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Al Viro	8c0b229166	tidy up namei.c a bit locking/unlocking for rcu walk taken to a couple of inline helpers Change-Id: I19f7f437641bb56f186f5d4c197425886f3625ca Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Al Viro	4aec9404b1	unobfuscate follow_up() a bit really convoluted test in there has grown up during struct mount introduction; what it checks is that we'd reached the root of mount tree. Change-Id: Ia48bdc985ae689345cfd409d8c81eb52fca6e014	2018-12-07 22:28:48 +04:00
Al Viro	82dd82dd48	use __lookup_hash() in kern_path_parent() No need to bother with lookup_one_len() here - it's an overkill Signed-off-by Al Viro <viro@zeniv.linux.org.uk> Change-Id: I733256aed797e9c0ac52f9c7cbc17b40e5b151fe	2018-12-07 22:28:48 +04:00
Christoph Hellwig	0dae24aa48	fs: add nd_jump_link Add a helper that abstracts out the jump to an already parsed struct path from ->follow_link operation from procfs. Not only does this clean up the code by moving the two sides of this game into a single helper, but it also prepares for making struct nameidata private to namei.c Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: If2392e9a3db44877f3976b543b12d3402cd29c22	2018-12-07 22:28:48 +04:00
Christoph Hellwig	049005ce30	fs: move path_put on failure out of ->follow_link Currently the non-nd_set_link based versions of ->follow_link are expected to do a path_put(&nd->path) on failure. This calling convention is unexpected, undocumented and doesn't match what the nd_set_link-based instances do. Move the path_put out of the only non-nd_set_link based ->follow_link instance into the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I6e06cf2be5425e752622a33eb63308bced33b0bb	2018-12-07 22:28:48 +04:00
David Howells	97733c93ab	VFS: Fix the banner comment on lookup_open() Since commit 197e37d9, the banner comment on lookup_open() no longer matches what the function returns. It used to return a struct file pointer or NULL and now it returns an integer and is passed the struct file pointer it is to use amongst its arguments. Update the comment to reflect this. Also add a banner comment to atomic_open(). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: Ia49cbec8cd15bd0b4af0b44bb16d79faa80947e0	2018-12-07 22:28:48 +04:00
Al Viro	2023b60cc3	don't pass nameidata * to vfs_create() all we want is a boolean flag, same as the method gets now Change-Id: I0cbe220b96bbbec6d50228cac774a0439f6a29f2 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:48 +04:00
Al Viro	dcb9cda2ea	don't pass nameidata to ->create() boolean "does it have to be exclusive?" flag is passed instead; Local filesystem should just ignore it - the object is guaranteed not to be there yet. Change-Id: I25efea9892458f6f64070c62bd1adb5194dcd8c1 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:28:00 +04:00
Al Viro	4ff32315d8	fs/namei.c: don't pass nameidata to __lookup_hash() and lookup_real() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: Ib083d752c2295101e759ccd2fe17b01ddaaefaf2	2018-12-07 22:26:31 +04:00
Al Viro	66c4da2876	stop passing nameidata to ->lookup() Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: Id5a9a96c3202f724156c32fb266190334e7dbe48	2018-12-07 22:26:28 +04:00
Al Viro	fe5cfc12d0	fs/namei.c: don't pass namedata to lookup_dcache() just the flags... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: Ie885bc825aa80573489a60d3cbd6ab4d3975ea7e	2018-12-07 22:26:13 +04:00
Al Viro	f033032252	fs/namei.c: don't pass nameidata to d_revalidate() since the method wrapped by it doesn't need that anymore... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I2d0b8680f4ff4dd4d46e0e9b4673370081929137	2018-12-07 22:26:13 +04:00
Al Viro	559bdce534	stop passing nameidata * to ->d_revalidate() Just the lookup flags. Die, bastard, die... Change-Id: Ie1e6aa84316f14bd9f0a2d297bd5eb32c92c84fd Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:26:05 +04:00
Al Viro	5f47e78fd6	fs/namei.c: get do_last() and friends return int Same conventions as for ->atomic_open(). Trimmed the forest of labels a bit, while we are at it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I94a25b547d3caaf3c20e2b6fbe4183ac5e1b87d7	2018-12-07 22:20:38 +04:00
Al Viro	a716ffb5f1	fs/nfs/dir.c: switch to passing nd->flags instead of nd wherever possible Change-Id: I747e6ec144891850ac9c7e57f09ca51dee6306ab Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:20:38 +04:00
Al Viro	b667b7b5a3	nfs_lookup_verify_inode() - nd is always non-NULL here Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I71d8ef74d63b7d2b2af28c7a6c70a10ffedcc3c1	2018-12-07 22:20:38 +04:00
Al Viro	10af5b2c04	switch nfs_lookup_check_intent() away from nameidata just pass the flags Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I5954ce57c51db893b6153d4edcef37b72b158ad5	2018-12-07 22:20:38 +04:00
Al Viro	80c89c609f	make finish_no_open() return int namely, 1 ;-) That's what we want to return from ->atomic_open() instances after finish_no_open(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: Id629fb7d43cca5a4ca91802ba13b61aa95288d47	2018-12-07 22:20:38 +04:00
Al Viro	812f0dc61c	kill struct opendata Just pass struct file . Methods are happier that way... There's no need to return struct file from finish_open() now, so let it return int. Next: saner prototypes for parts in namei.c Change-Id: I984f0f992330c959a2f9703d9e7647ef340e2845 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:20:38 +04:00
Al Viro	1ae100dc44	kill opendata->{mnt,dentry} ->filp->f_path is there for purpose... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I6d70a12b7a7541c95d4b812543cfe4b7933ae3fe	2018-12-07 22:20:38 +04:00
Al Viro	cb28cf9441	don't modify od->filp at all make put_filp() conditional on flag set by finish_open() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I79833cd7a54d635bad80c6fca31eb55631e96c8b	2018-12-07 22:20:38 +04:00
Al Viro	19cbdb4013	make ->atomic_open() return int Change of calling conventions: old new NULL 1 file 0 ERR_PTR(-ve) -ve Caller knows that struct file *; no need to return it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I883d67181a0100447a2e077ed537ee393e862e0b	2018-12-07 22:20:38 +04:00
Al Viro	e465d5dd30	->atomic_open() prototype change - pass int * instead of bool * ... and let finish_open() report having opened the file via that sucker. Next step: don't modify od->filp at all. [AV: FILE_CREATE was already used by cifs; Miklos' fix folded] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I6ea0f871fab215a2901710392abbda88c80008c1	2018-12-07 22:20:38 +04:00
Miklos Szeredi	7589e33d50	ceph: implement i_op->atomic_open() Add an ->atomic_open implementation which replaces the atomic lookup+open+create operation implemented via ->lookup and ->create operations. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I9cd73db22147a760ee2f69b498aacd16689908b1	2018-12-07 22:20:38 +04:00
Miklos Szeredi	80bfa0c48b	ceph: remove unused arg from ceph_lookup_open() What was the purpose of this? Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I5f994a7aa9edc51f6e7ec4f746bf51332d5b496b	2018-12-07 22:20:38 +04:00
Miklos Szeredi	810978e1b8	9p: implement i_op->atomic_open() Add an ->atomic_open implementation which replaces the atomic open+create operation implemented via ->create. No functionality is changed. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I0c8c0998fcf940f603963a876ef2be825babf6a7	2018-12-07 22:20:38 +04:00
Miklos Szeredi	37d50c6e2b	nfs: don't use intents for checking atomic open is_atomic_open() is now only used by nfs4_lookup_revalidate() to check whether it's okay to skip normal revalidation. It does a racy check for mount read-onlyness and falls back to normal revalidation if the open would fail. This makes little sense now that this function isn't used for determining whether to actually open the file or not. The d_mountpoint() check still makes sense since it is an indication that we might be following a mount and so open may not revalidate the dentry. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: Ibad58603956e406c63b4b7c63243502eabe6febf	2018-12-07 22:20:38 +04:00
Miklos Szeredi	9cd79dd256	nfs: don't use nd->intent.open.flags Instead check LOOKUP_EXCL in nd->flags, which is basically what the open intent flags were used for. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I9afc0a255a45c8d976efdd17ab991b71fc3c41f3	2018-12-07 22:20:38 +04:00
Miklos Szeredi	d5abf4af6d	nfs: clean up ->create in nfs_rpc_ops Don't pass nfs_open_context() to ->create(). Only the NFS4 implementation needed that and only because it wanted to return an open file using open intents. That task has been replaced by ->atomic_open so it is not necessary anymore to pass the context to the create rpc operation. Despite nfs4_proc_create apparently being okay with a NULL context it Oopses somewhere down the call chain. So allocate a context here. Change-Id: I1241a81129cb80a7f06338969ea95f28e10d40f0 Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:20:38 +04:00
Miklos Szeredi	e785a439fe	nfs: implement i_op->atomic_open() Replace NFS4 specific ->lookup implementation with ->atomic_open impelementation and use the generic nfs_lookup for other lookups. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I287cb7db22925d56e7f37c8ae8869086e9e17841	2018-12-07 22:20:38 +04:00
Miklos Szeredi	ae55c0a967	nfs: don't open in ->d_revalidate NFSv4 can't do reliable opens in d_revalidate, since it cannot know whether a mount needs to be followed or not. It does check d_mountpoint() on the dentry, which can result in a weird error if the VFS found that the mount does not in fact need to be followed, e.g.: # mount --bind /mnt/nfs /mnt/nfs-clone # echo something > /mnt/nfs/tmp/bar # echo x > /tmp/file # mount --bind /tmp/file /mnt/nfs-clone/tmp/bar # cat /mnt/nfs/tmp/bar cat: /mnt/nfs/tmp/bar: Not a directory Which should, by any sane filesystem, result in "something" being printed. So instead do the open in f_op->open() and in the unlikely case that the cached dentry turned out to be invalid, drop the dentry and return EOPENSTALE to let the VFS retry. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I680d2732dd85ae175a2ad9142f42bb3db16dc533	2018-12-07 22:20:38 +04:00
Miklos Szeredi	02e5874d07	fuse: implement i_op->atomic_open() Add an ->atomic_open implementation which replaces the atomic open+create operation implemented via ->create. No functionality is changed. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I5d06b0b21d17a68854b2b2b22a15d25b75e07724	2018-12-07 22:20:38 +04:00
Miklos Szeredi	14b04ec2cd	cifs: implement i_op->atomic_open() Add an ->atomic_open implementation which replaces the atomic lookup+open+create operation implemented via ->lookup and ->create operations. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Steve French <sfrench@samba.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I8b39432768b5b6336ffb643026e565f1aefc40f5	2018-12-07 22:20:38 +04:00
Miklos Szeredi	93c19dc2c5	vfs: move O_DIRECT check to common code Perform open_check_o_direct() in a common place in do_last after opening the file. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: Icce292d477d2cb452acedd08ed35df5207294c07	2018-12-07 22:20:38 +04:00
Miklos Szeredi	ac77be7485	vfs: do_last(): clean up retry Move the lookup retry logic to the bottom of the function to make the normal case simpler to read. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I6f0d88aae8d935c1138b3170bf171a69d453ab32	2018-12-07 22:20:38 +04:00
Miklos Szeredi	6e8fd291fd	vfs: do_last(): clean up bool Consistently use bool for boolean values in do_last(). Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I25241ab09fa97c0b27c9006c34691faf2ff53989	2018-12-07 22:20:38 +04:00
Miklos Szeredi	5fcdc5c28f	vfs: do_last(): clean up labels Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I446f4ad62d8289b719b4de4402981839e5ad2dad	2018-12-07 22:20:38 +04:00
Miklos Szeredi	103f069454	vfs: do_last(): clean up error handling Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I1e699d2bc81792a4ce6cd52043fac881bef6757f	2018-12-07 22:20:38 +04:00
Miklos Szeredi	3e35ade5af	vfs: remove open intents from nameidata All users of open intents have been converted to use ->atomic_{open,create}. This patch gets rid of nd->intent.open and related infrastructure. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Change-Id: I9143c542eaa8f34e899e647d1e39fa289816639a	2018-12-07 22:20:38 +04:00
Miklos Szeredi	5bb6f9f985	vfs: add i_op->atomic_open() Add a new inode operation which is called on the last component of an open. Using this the filesystem can look up, possibly create and open the file in one atomic operation. If it cannot perform this (e.g. the file type turned out to be wrong) it may signal this by returning NULL instead of an open struct file pointer. i_op->atomic_open() is only called if the last component is negative or needs lookup. Handling cached positive dentries here doesn't add much value: these can be opened using f_op->open(). If the cached file turns out to be invalid, the open can be retried, this time using ->atomic_open() with a fresh dentry. For now leave the old way of using open intents in lookup and revalidate in place. This will be removed once all the users are converted. David Howells noticed that if ->atomic_open() opens the file but does not create it, handle_truncate() will be called on it even if it is not a regular file. Fix this by checking the file type in this case too. Change-Id: Ic8ccf5b34eb678bec8ab12ea2f5aebbf79f8473b Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:20:38 +04:00
Miklos Szeredi	73532754fb	vfs: lookup_open(): expand lookup_hash() Copy __lookup_hash() into lookup_open(). The next patch will insert the atomic open call just before the real lookup. Change-Id: I0764bfb794f152e95f3592a3c8171f3935ab7101 Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:20:38 +04:00
Miklos Szeredi	366e3dfec9	vfs: add lookup_open() Split out lookup + maybe create from do_last(). This is the part under i_mutex protection. The function is called lookup_open() and returns a filp even though the open part is not used yet. Change-Id: If0cfeff6b9a91306c1e3ad98ad7168e50308aea6 Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2018-12-07 22:20:38 +04:00
Linus Torvalds	6027c8669b	vfs: be even more careful about dentry RCU name lookups Miklos Szeredi points out that we need to also worry about memory odering when doing the dentry name comparison asynchronously with RCU. In particular, doing a rename can do a memcpy() of one dentry name over another, and we want to make sure that any unlocked reader will always see the proper terminating NUL character, so that it won't ever run off the allocation. Rather than having to be extra careful with the name copy or at lookup time for each character, this resolves the issue by making sure that all names that are inlined in the dentry always have a NUL character at the end of the name allocation. If we do that at dentry allocation time, we know that no future name copy will ever change that final NUL to anything else, so there are no memory ordering issues. So even if a concurrent rename ends up overwriting the NUL character that terminates the original name, we always know that there is one final NUL at the end, and there is no worry about the lockless RCU lookup traversing the name too far. The out-of-line allocations are never copied over, so we can just make sure that we write the name (with terminating NULL) and do a write barrier before we expose the name to anything else by setting it in the dentry. Reported-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: I5890f1c8d9d207dcf508aa17b373a6d7e6d4e586	2018-12-07 22:20:38 +04:00
Linus Torvalds	89d83ed09c	vfs: make it possible to access the dentry hash/len as one 64-bit entry This allows comparing hash and len in one operation on 64-bit architectures. Right now only __d_lookup_rcu() takes advantage of this, since that is the case we care most about. The use of anonymous struct/unions hides the alternate 64-bit approach from most users, the exception being a few cases where we initialize a 'struct qstr' with a static initializer. This makes the problematic cases use a new QSTR_INIT() helper function for that (but initializing just the name pointer with a "{ .name = xyzzy }" initializer remains valid, as does just copying another qstr structure). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: I710810eb8717563264428e39c99e62599d31907e	2018-12-07 22:20:38 +04:00
Linus Torvalds	e9214261df	vfs: move dentry name length comparison from dentry_cmp() into callers All callers do want to check the dentry length, but some of them can check the length and the hash together, so doing it in dentry_cmp() can be counter-productive. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: I19bda91a21594b0150f8eecef0f2738984c3b13e	2018-12-07 22:20:38 +04:00
Linus Torvalds	d8b7256cdc	vfs: do the careful dentry name access for all dentry_cmp cases Commit `12f8ad4b05` ("vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces") did the careful ACCESS_ONCE() of the dentry name only for the word-at-a-time case, even though the issue is generic. Admittedly I don't really see gcc ever reloading the value in the middle of the loop, so the ACCESS_ONCE() protects us from a fairly theoretical issue. But better safe than sorry. Also, this consolidates the common parts of the word-at-a-time and bytewise logic, which includes checking the length. We'll be changing that later. Change-Id: Icd5773ff1910d0509e250eb65a469e99d0145795 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-12-07 22:20:15 +04:00

1 2 3 4 5 ...

27882 commits