Commit graph

27882 commits

Author SHA1 Message Date
Jeff Layton
87b37ef17f vfs: don't let do_last pass negative dentry to audit_inode
I can reliably reproduce the following panic by simply setting an audit
rule on a recent 3.5.0+ kernel:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
 IP: [<ffffffff810d1250>] audit_copy_inode+0x10/0x90
 PGD 7acd9067 PUD 7b8fb067 PMD 0
 Oops: 0000 [#86] SMP
 Modules linked in: nfs nfs_acl auth_rpcgss fscache lockd sunrpc tpm_bios btrfs zlib_deflate libcrc32c kvm_amd kvm joydev virtio_net pcspkr i2c_piix4 floppy virtio_balloon microcode virtio_blk cirrus drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
 CPU 0
 Pid: 1286, comm: abrt-dump-oops Tainted: G      D      3.5.0+ #1 Bochs Bochs
 RIP: 0010:[<ffffffff810d1250>]  [<ffffffff810d1250>] audit_copy_inode+0x10/0x90
 RSP: 0018:ffff88007aebfc38  EFLAGS: 00010282
 RAX: 0000000000000000 RBX: ffff88003692d860 RCX: 00000000000038c4
 RDX: 0000000000000000 RSI: ffff88006baf5d80 RDI: ffff88003692d860
 RBP: ffff88007aebfc68 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
 R13: ffff880036d30f00 R14: ffff88006baf5d80 R15: ffff88003692d800
 FS:  00007f7562634740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000040 CR3: 000000003643d000 CR4: 00000000000006f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process abrt-dump-oops (pid: 1286, threadinfo ffff88007aebe000, task ffff880079614530)
 Stack:
  ffff88007aebfdf8 ffff88007aebff28 ffff88007aebfc98 ffffffff81211358
  ffff88003692d860 0000000000000000 ffff88007aebfcc8 ffffffff810d4968
  ffff88007aebfcc8 ffff8800000038c4 0000000000000000 0000000000000000
 Call Trace:
  [<ffffffff81211358>] ? ext4_lookup+0xe8/0x160
  [<ffffffff810d4968>] __audit_inode+0x118/0x2d0
  [<ffffffff811955a9>] do_last+0x999/0xe80
  [<ffffffff81191fe8>] ? inode_permission+0x18/0x50
  [<ffffffff81171efa>] ? kmem_cache_alloc_trace+0x11a/0x130
  [<ffffffff81195b4a>] path_openat+0xba/0x420
  [<ffffffff81196111>] do_filp_open+0x41/0xa0
  [<ffffffff811a24bd>] ? alloc_fd+0x4d/0x120
  [<ffffffff811855cd>] do_sys_open+0xed/0x1c0
  [<ffffffff810d40cc>] ? __audit_syscall_entry+0xcc/0x300
  [<ffffffff811856c1>] sys_open+0x21/0x30
  [<ffffffff81611ca9>] system_call_fastpath+0x16/0x1b
  RSP <ffff88007aebfc38>
 CR2: 0000000000000040

The problem is that do_last is passing a negative dentry to audit_inode.
The comments on lookup_open note that it can pass back a negative dentry
if O_CREAT is not set.

This patch fixes the oops, but I'm not clear on whether there's a better
approach.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I92efc8c98013979b58e7eabcfc242c7143b5f928
2018-12-07 22:28:48 +04:00
Al Viro
579400a5b2 pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp.
One side effect - attempt to create a cross-device link on a read-only fs fails
with EROFS instead of EXDEV now.  Makes more sense, POSIX allows, etc.

Change-Id: I264b03a230dbd310f3b3671d2da06ceb2930179b
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:28:48 +04:00
Al Viro
9d58c3048f mknod: take sanity checks on mode into the very beginning
Note that applying umask can't affect their results.  While
that affects errno in cases like
	mknod("/no_such_directory/a", 030000)
yielding -EINVAL (due to impossible mode_t) instead of
-ENOENT (due to inexistent directory), IMO that makes a lot
more sense, POSIX allows to return either and any software
that relies on getting -ENOENT instead of -EINVAL in that
case deserves everything it gets.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I1abb3e8ad247f3f48bde931d70e6f546126c62d7
2018-12-07 22:28:48 +04:00
Al Viro
2c1bd80538 new helper: done_path_create()
releases what needs to be released after {kern,user}_path_create()

Change-Id: If7fa7455e2ba8a6f4f4c4d2db502a38b4a60d7c7
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:28:48 +04:00
Al Viro
8c0b229166 tidy up namei.c a bit
locking/unlocking for rcu walk taken to a couple of inline helpers

Change-Id: I19f7f437641bb56f186f5d4c197425886f3625ca
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:28:48 +04:00
Al Viro
4aec9404b1 unobfuscate follow_up() a bit
really convoluted test in there has grown up during struct mount
introduction; what it checks is that we'd reached the root of
mount tree.

Change-Id: Ia48bdc985ae689345cfd409d8c81eb52fca6e014
2018-12-07 22:28:48 +04:00
Al Viro
82dd82dd48 use __lookup_hash() in kern_path_parent()
No need to bother with lookup_one_len() here - it's an overkill

Signed-off-by Al Viro <viro@zeniv.linux.org.uk>

Change-Id: I733256aed797e9c0ac52f9c7cbc17b40e5b151fe
2018-12-07 22:28:48 +04:00
Christoph Hellwig
0dae24aa48 fs: add nd_jump_link
Add a helper that abstracts out the jump to an already parsed struct path
from ->follow_link operation from procfs.  Not only does this clean up
the code by moving the two sides of this game into a single helper, but
it also prepares for making struct nameidata private to namei.c

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: If2392e9a3db44877f3976b543b12d3402cd29c22
2018-12-07 22:28:48 +04:00
Christoph Hellwig
049005ce30 fs: move path_put on failure out of ->follow_link
Currently the non-nd_set_link based versions of ->follow_link are expected
to do a path_put(&nd->path) on failure.  This calling convention is unexpected,
undocumented and doesn't match what the nd_set_link-based instances do.

Move the path_put out of the only non-nd_set_link based ->follow_link
instance into the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I6e06cf2be5425e752622a33eb63308bced33b0bb
2018-12-07 22:28:48 +04:00
David Howells
97733c93ab VFS: Fix the banner comment on lookup_open()
Since commit 197e37d9, the banner comment on lookup_open() no longer matches
what the function returns.  It used to return a struct file pointer or NULL and
now it returns an integer and is passed the struct file pointer it is to use
amongst its arguments.  Update the comment to reflect this.

Also add a banner comment to atomic_open().

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Ia49cbec8cd15bd0b4af0b44bb16d79faa80947e0
2018-12-07 22:28:48 +04:00
Al Viro
2023b60cc3 don't pass nameidata * to vfs_create()
all we want is a boolean flag, same as the method gets now

Change-Id: I0cbe220b96bbbec6d50228cac774a0439f6a29f2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:28:48 +04:00
Al Viro
dcb9cda2ea don't pass nameidata to ->create()
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.

Change-Id: I25efea9892458f6f64070c62bd1adb5194dcd8c1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:28:00 +04:00
Al Viro
4ff32315d8 fs/namei.c: don't pass nameidata to __lookup_hash() and lookup_real()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Ib083d752c2295101e759ccd2fe17b01ddaaefaf2
2018-12-07 22:26:31 +04:00
Al Viro
66c4da2876 stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Id5a9a96c3202f724156c32fb266190334e7dbe48
2018-12-07 22:26:28 +04:00
Al Viro
fe5cfc12d0 fs/namei.c: don't pass namedata to lookup_dcache()
just the flags...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Ie885bc825aa80573489a60d3cbd6ab4d3975ea7e
2018-12-07 22:26:13 +04:00
Al Viro
f033032252 fs/namei.c: don't pass nameidata to d_revalidate()
since the method wrapped by it doesn't need that anymore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I2d0b8680f4ff4dd4d46e0e9b4673370081929137
2018-12-07 22:26:13 +04:00
Al Viro
559bdce534 stop passing nameidata * to ->d_revalidate()
Just the lookup flags.  Die, bastard, die...

Change-Id: Ie1e6aa84316f14bd9f0a2d297bd5eb32c92c84fd
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:26:05 +04:00
Al Viro
5f47e78fd6 fs/namei.c: get do_last() and friends return int
Same conventions as for ->atomic_open().  Trimmed the
forest of labels a bit, while we are at it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I94a25b547d3caaf3c20e2b6fbe4183ac5e1b87d7
2018-12-07 22:20:38 +04:00
Al Viro
a716ffb5f1 fs/nfs/dir.c: switch to passing nd->flags instead of nd wherever possible
Change-Id: I747e6ec144891850ac9c7e57f09ca51dee6306ab
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:20:38 +04:00
Al Viro
b667b7b5a3 nfs_lookup_verify_inode() - nd is *always* non-NULL here
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I71d8ef74d63b7d2b2af28c7a6c70a10ffedcc3c1
2018-12-07 22:20:38 +04:00
Al Viro
10af5b2c04 switch nfs_lookup_check_intent() away from nameidata
just pass the flags

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I5954ce57c51db893b6153d4edcef37b72b158ad5
2018-12-07 22:20:38 +04:00
Al Viro
80c89c609f make finish_no_open() return int
namely, 1 ;-)  That's what we want to return from ->atomic_open()
instances after finish_no_open().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Id629fb7d43cca5a4ca91802ba13b61aa95288d47
2018-12-07 22:20:38 +04:00
Al Viro
812f0dc61c kill struct opendata
Just pass struct file *.  Methods are happier that way...
There's no need to return struct file * from finish_open() now,
so let it return int.  Next: saner prototypes for parts in
namei.c

Change-Id: I984f0f992330c959a2f9703d9e7647ef340e2845
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:20:38 +04:00
Al Viro
1ae100dc44 kill opendata->{mnt,dentry}
->filp->f_path is there for purpose...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I6d70a12b7a7541c95d4b812543cfe4b7933ae3fe
2018-12-07 22:20:38 +04:00
Al Viro
cb28cf9441 don't modify od->filp at all
make put_filp() conditional on flag set by finish_open()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I79833cd7a54d635bad80c6fca31eb55631e96c8b
2018-12-07 22:20:38 +04:00
Al Viro
19cbdb4013 make ->atomic_open() return int
Change of calling conventions:
old		new
NULL		1
file		0
ERR_PTR(-ve)	-ve

Caller *knows* that struct file *; no need to return it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I883d67181a0100447a2e077ed537ee393e862e0b
2018-12-07 22:20:38 +04:00
Al Viro
e465d5dd30 ->atomic_open() prototype change - pass int * instead of bool *
... and let finish_open() report having opened the file via that sucker.
Next step: don't modify od->filp at all.

[AV: FILE_CREATE was already used by cifs; Miklos' fix folded]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I6ea0f871fab215a2901710392abbda88c80008c1
2018-12-07 22:20:38 +04:00
Miklos Szeredi
7589e33d50 ceph: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic lookup+open+create
operation implemented via ->lookup and ->create operations.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I9cd73db22147a760ee2f69b498aacd16689908b1
2018-12-07 22:20:38 +04:00
Miklos Szeredi
80bfa0c48b ceph: remove unused arg from ceph_lookup_open()
What was the purpose of this?

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I5f994a7aa9edc51f6e7ec4f746bf51332d5b496b
2018-12-07 22:20:38 +04:00
Miklos Szeredi
810978e1b8 9p: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic open+create
operation implemented via ->create.  No functionality is changed.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I0c8c0998fcf940f603963a876ef2be825babf6a7
2018-12-07 22:20:38 +04:00
Miklos Szeredi
37d50c6e2b nfs: don't use intents for checking atomic open
is_atomic_open() is now only used by nfs4_lookup_revalidate() to check whether
it's okay to skip normal revalidation.

It does a racy check for mount read-onlyness and falls back to normal
revalidation if the open would fail.  This makes little sense now that this
function isn't used for determining whether to actually open the file or not.

The d_mountpoint() check still makes sense since it is an indication that we
might be following a mount and so open may not revalidate the dentry.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Ibad58603956e406c63b4b7c63243502eabe6febf
2018-12-07 22:20:38 +04:00
Miklos Szeredi
9cd79dd256 nfs: don't use nd->intent.open.flags
Instead check LOOKUP_EXCL in nd->flags, which is basically what the open intent
flags were used for.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I9afc0a255a45c8d976efdd17ab991b71fc3c41f3
2018-12-07 22:20:38 +04:00
Miklos Szeredi
d5abf4af6d nfs: clean up ->create in nfs_rpc_ops
Don't pass nfs_open_context() to ->create().  Only the NFS4 implementation
needed that and only because it wanted to return an open file using open
intents.  That task has been replaced by ->atomic_open so it is not necessary
anymore to pass the context to the create rpc operation.

Despite nfs4_proc_create apparently being okay with a NULL context it Oopses
somewhere down the call chain.  So allocate a context here.

Change-Id: I1241a81129cb80a7f06338969ea95f28e10d40f0
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:20:38 +04:00
Miklos Szeredi
e785a439fe nfs: implement i_op->atomic_open()
Replace NFS4 specific ->lookup implementation with ->atomic_open impelementation
and use the generic nfs_lookup for other lookups.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I287cb7db22925d56e7f37c8ae8869086e9e17841
2018-12-07 22:20:38 +04:00
Miklos Szeredi
ae55c0a967 nfs: don't open in ->d_revalidate
NFSv4 can't do reliable opens in d_revalidate, since it cannot know whether a
mount needs to be followed or not.  It does check d_mountpoint() on the dentry,
which can result in a weird error if the VFS found that the mount does not in
fact need to be followed, e.g.:

  # mount --bind /mnt/nfs /mnt/nfs-clone
  # echo something > /mnt/nfs/tmp/bar
  # echo x > /tmp/file
  # mount --bind /tmp/file /mnt/nfs-clone/tmp/bar
  # cat  /mnt/nfs/tmp/bar
  cat: /mnt/nfs/tmp/bar: Not a directory

Which should, by any sane filesystem, result in "something" being printed.

So instead do the open in f_op->open() and in the unlikely case that the cached
dentry turned out to be invalid, drop the dentry and return EOPENSTALE to let
the VFS retry.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I680d2732dd85ae175a2ad9142f42bb3db16dc533
2018-12-07 22:20:38 +04:00
Miklos Szeredi
02e5874d07 fuse: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic open+create
operation implemented via ->create.  No functionality is changed.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I5d06b0b21d17a68854b2b2b22a15d25b75e07724
2018-12-07 22:20:38 +04:00
Miklos Szeredi
14b04ec2cd cifs: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic lookup+open+create
operation implemented via ->lookup and ->create operations.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Steve French <sfrench@samba.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I8b39432768b5b6336ffb643026e565f1aefc40f5
2018-12-07 22:20:38 +04:00
Miklos Szeredi
93c19dc2c5 vfs: move O_DIRECT check to common code
Perform open_check_o_direct() in a common place in do_last after opening the
file.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Icce292d477d2cb452acedd08ed35df5207294c07
2018-12-07 22:20:38 +04:00
Miklos Szeredi
ac77be7485 vfs: do_last(): clean up retry
Move the lookup retry logic to the bottom of the function to make the normal
case simpler to read.

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I6f0d88aae8d935c1138b3170bf171a69d453ab32
2018-12-07 22:20:38 +04:00
Miklos Szeredi
6e8fd291fd vfs: do_last(): clean up bool
Consistently use bool for boolean values in do_last().

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I25241ab09fa97c0b27c9006c34691faf2ff53989
2018-12-07 22:20:38 +04:00
Miklos Szeredi
5fcdc5c28f vfs: do_last(): clean up labels
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I446f4ad62d8289b719b4de4402981839e5ad2dad
2018-12-07 22:20:38 +04:00
Miklos Szeredi
103f069454 vfs: do_last(): clean up error handling
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I1e699d2bc81792a4ce6cd52043fac881bef6757f
2018-12-07 22:20:38 +04:00
Miklos Szeredi
3e35ade5af vfs: remove open intents from nameidata
All users of open intents have been converted to use ->atomic_{open,create}.

This patch gets rid of nd->intent.open and related infrastructure.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I9143c542eaa8f34e899e647d1e39fa289816639a
2018-12-07 22:20:38 +04:00
Miklos Szeredi
5bb6f9f985 vfs: add i_op->atomic_open()
Add a new inode operation which is called on the last component of an open.
Using this the filesystem can look up, possibly create and open the file in one
atomic operation.  If it cannot perform this (e.g. the file type turned out to
be wrong) it may signal this by returning NULL instead of an open struct file
pointer.

i_op->atomic_open() is only called if the last component is negative or needs
lookup.  Handling cached positive dentries here doesn't add much value: these
can be opened using f_op->open().  If the cached file turns out to be invalid,
the open can be retried, this time using ->atomic_open() with a fresh dentry.

For now leave the old way of using open intents in lookup and revalidate in
place.  This will be removed once all the users are converted.

David Howells noticed that if ->atomic_open() opens the file but does not create
it, handle_truncate() will be called on it even if it is not a regular file.
Fix this by checking the file type in this case too.

Change-Id: Ic8ccf5b34eb678bec8ab12ea2f5aebbf79f8473b
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:20:38 +04:00
Miklos Szeredi
73532754fb vfs: lookup_open(): expand lookup_hash()
Copy __lookup_hash() into lookup_open().  The next patch will insert the atomic
open call just before the real lookup.

Change-Id: I0764bfb794f152e95f3592a3c8171f3935ab7101
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:20:38 +04:00
Miklos Szeredi
366e3dfec9 vfs: add lookup_open()
Split out lookup + maybe create from do_last().  This is the part under i_mutex
protection.

The function is called lookup_open() and returns a filp even though the open
part is not used yet.

Change-Id: If0cfeff6b9a91306c1e3ad98ad7168e50308aea6
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:20:38 +04:00
Linus Torvalds
6027c8669b vfs: be even more careful about dentry RCU name lookups
Miklos Szeredi points out that we need to also worry about memory
odering when doing the dentry name comparison asynchronously with RCU.

In particular, doing a rename can do a memcpy() of one dentry name over
another, and we want to make sure that any unlocked reader will always
see the proper terminating NUL character, so that it won't ever run off
the allocation.

Rather than having to be extra careful with the name copy or at lookup
time for each character, this resolves the issue by making sure that all
names that are inlined in the dentry always have a NUL character at the
end of the name allocation.  If we do that at dentry allocation time, we
know that no future name copy will ever change that final NUL to
anything else, so there are no memory ordering issues.

So even if a concurrent rename ends up overwriting the NUL character
that terminates the original name, we always know that there is one
final NUL at the end, and there is no worry about the lockless RCU
lookup traversing the name too far.

The out-of-line allocations are never copied over, so we can just make
sure that we write the name (with terminating NULL) and do a write
barrier before we expose the name to anything else by setting it in the
dentry.

Reported-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I5890f1c8d9d207dcf508aa17b373a6d7e6d4e586
2018-12-07 22:20:38 +04:00
Linus Torvalds
89d83ed09c vfs: make it possible to access the dentry hash/len as one 64-bit entry
This allows comparing hash and len in one operation on 64-bit
architectures.  Right now only __d_lookup_rcu() takes advantage of this,
since that is the case we care most about.

The use of anonymous struct/unions hides the alternate 64-bit approach
from most users, the exception being a few cases where we initialize a
'struct qstr' with a static initializer.  This makes the problematic
cases use a new QSTR_INIT() helper function for that (but initializing
just the name pointer with a "{ .name = xyzzy }" initializer remains
valid, as does just copying another qstr structure).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I710810eb8717563264428e39c99e62599d31907e
2018-12-07 22:20:38 +04:00
Linus Torvalds
e9214261df vfs: move dentry name length comparison from dentry_cmp() into callers
All callers do want to check the dentry length, but some of them can
check the length and the hash together, so doing it in dentry_cmp() can
be counter-productive.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I19bda91a21594b0150f8eecef0f2738984c3b13e
2018-12-07 22:20:38 +04:00
Linus Torvalds
d8b7256cdc vfs: do the careful dentry name access for all dentry_cmp cases
Commit 12f8ad4b05 ("vfs: clean up __d_lookup_rcu() and dentry_cmp()
interfaces") did the careful ACCESS_ONCE() of the dentry name only for
the word-at-a-time case, even though the issue is generic.

Admittedly I don't really see gcc ever reloading the value in the middle
of the loop, so the ACCESS_ONCE() protects us from a fairly theoretical
issue. But better safe than sorry.

Also, this consolidates the common parts of the word-at-a-time and
bytewise logic, which includes checking the length.  We'll be changing
that later.

Change-Id: Icd5773ff1910d0509e250eb65a469e99d0145795
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-07 22:20:15 +04:00
Miklos Szeredi
25bd1f6a36 vfs: do_last(): common slow lookup
Make the slow lookup part of O_CREAT and non-O_CREAT opens common.

This allows atomic_open to be hooked into the slow lookup part.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I1e3c97737246f53fb4e33d2df5586e0b422aa30e
2018-12-07 22:12:51 +04:00
Miklos Szeredi
ef9b34cffd vfs: do_last(): separate O_CREAT specific code
Check O_CREAT on the slow lookup paths where necessary.  This allows the rest to
be shared with plain open.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I498c7daabc652b940d2618f95580a97ed16fe129
2018-12-07 22:12:51 +04:00
Miklos Szeredi
8b85f578e2 vfs: do_last(): inline lookup_slow()
Copy lookup_slow() into do_last().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I3612576e8f753f9482e7e4cb294c146ddd94462d
2018-12-07 22:12:51 +04:00
Al Viro
c79698cc6c namei.c: let follow_link() do put_link() on failure
no need for kludgy "set cookie to ERR_PTR(...) because we failed
before we did actual ->follow_link() and want to suppress put_link()",
no pointless check in put_link() itself.

Callers checked if follow_link() has failed anyway; might as well
break out of their loops if that happened, without bothering
to call put_link() first.

[AV: folded fixes from hch]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I240bcdcbf8a89b925716e69610def8c341dd2419
2018-12-07 22:12:51 +04:00
Miklos Szeredi
d1070d4ee7 vfs: retry last component if opening stale dentry
NFS optimizes away d_revalidates for last component of open.  This means that
open itself can find the dentry stale.

This patch allows the filesystem to return EOPENSTALE and the VFS will retry the
lookup on just the last component if possible.

If the lookup was done using RCU mode, including the last component, then this
is not possible since the parent dentry is lost.  In this case fall back to
non-RCU lookup.  Currently this is not used since NFS will always leave RCU
mode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Ie1f7465757dc9086ef0ffefe22c969ef3c6ddedb
2018-12-07 22:12:51 +04:00
Miklos Szeredi
9d86708f6c vfs: do_last() common post lookup
Now the post lookup code can be shared between O_CREAT and plain opens since
they are essentially the same.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Ia366d92c0e4edc183bfa23218676083872ef10f0
2018-12-07 22:12:51 +04:00
Miklos Szeredi
70497cfc0f vfs: do_last(): add audit_inode before open
This allows this code to be shared between O_CREAT and plain opens.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I07469a80d749749e2b161ef500afd98895b33e6a
2018-12-07 22:12:51 +04:00
Miklos Szeredi
17c45ceea1 vfs: do_last(): only return EISDIR for O_CREAT
This allows this code to be shared between O_CREAT and plain opens.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I74a1b2a53fe009245bfce52dfc17f7628ac6d9c0
2018-12-07 22:12:51 +04:00
Miklos Szeredi
5f53e97907 vfs: do_last(): check LOOKUP_DIRECTORY
Check for ENOTDIR before finishing open.  This allows this code to be shared
between O_CREAT and plain opens.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Ib5bcfc2548d45b8c632bb11aecb3f5618887d3d1
2018-12-07 22:12:51 +04:00
Miklos Szeredi
228e8884b4 vfs: do_last(): make ENOENT exit RCU safe
This will allow this code to be used in RCU mode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I72145c85cd20934d00bc9cbcb20efe42a636d592
2018-12-07 22:12:51 +04:00
Miklos Szeredi
725142ea33 vfs: make follow_link check RCU safe
This will allow this code to be used in RCU mode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I16990161f7e02dcdfd8979cb5594fe4a8207fca1
2018-12-07 22:12:51 +04:00
Miklos Szeredi
ae607d85d3 vfs: do_last(): use inode variable
Use helper variable instead of path->dentry->d_inode before complete_walk().
This will allow this code to be used in RCU mode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I8fd1525e9d0efd992451664c0fb545bbd4bb40d3
2018-12-07 22:12:51 +04:00
Miklos Szeredi
4447ee76a6 vfs: do_last(): inline walk_component()
Copy walk_component() into do_lookup().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I1568bb6b976521d4d01df892689e8e977b84af8f
2018-12-07 22:12:51 +04:00
Miklos Szeredi
d45ea2d979 vfs: do_last(): make exit RCU safe
Allow returning from do_last() with LOOKUP_RCU still set on the "out:" and
"exit:" labels.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Ic63718547582cf09a2c0bbc86dcc9084bd080ffc
2018-12-07 22:12:51 +04:00
Miklos Szeredi
12f1e7dad9 vfs: split do_lookup()
Split do_lookup() into two functions:

  lookup_fast() - does cached lookup without i_mutex
  lookup_slow() - does lookup with i_mutex

Both follow managed dentries.

The new functions are needed by atomic_open.

Change-Id: Ic32255a88ff82a92622a20135eb034ad3fa1d5a7
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:12:51 +04:00
Linus Torvalds
d671fd6a09 vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces
The calling conventions for __d_lookup_rcu() and dentry_cmp() are
annoying in different ways, and there is actually one single underlying
reason for both of the annoyances.

The fundamental reason is that we do the returned dentry sequence number
check inside __d_lookup_rcu() instead of doing it in the caller.  This
results in two annoyances:

 - __d_lookup_rcu() now not only needs to return the dentry and the
   sequence number that goes along with the lookup, it also needs to
   return the inode pointer that was validated by that sequence number
   check.

 - and because we did the sequence number check early (to validate the
   name pointer and length) we also couldn't just pass the dentry itself
   to dentry_cmp(), we had to pass the counted string that contained the
   name.

So that sequence number decision caused two separate ugly calling
conventions.

Both of these problems would be solved if we just did the sequence
number check in the caller instead.  There's only one caller, and that
caller already has to do the sequence number check for the parent
anyway, so just do that.

That allows us to stop returning the dentry->d_inode in that in-out
argument (pointer-to-pointer-to-inode), so we can make the inode
argument just a regular input inode pointer.  The caller can just load
the inode from dentry->d_inode, and then do the sequence number check
after that to make sure that it's synchronized with the name we looked
up.

And it allows us to just pass in the dentry to dentry_cmp(), which is
what all the callers really wanted.  Sure, dentry_cmp() has to be a bit
careful about the dentry (which is not stable during RCU lookup), but
that's actually very simple.

And now that dentry_cmp() can clearly see that the first string argument
is a dentry, we can use the direct word access for that, instead of the
careful unaligned zero-padding.  The dentry name is always properly
aligned, since it is a single path component that is either embedded
into the dentry itself, or was allocated with kmalloc() (see __d_alloc).

Finally, this also uninlines the nasty slow-case for dentry comparisons:
that one *does* need to do a sequence number check, since it will call
in to the low-level filesystems, and we want to give those a stable
inode pointer and path component length/start arguments.  Doing an extra
sequence check for that slow case is not a problem, though.

Change-Id: If75355452c45e51c72838fa50e914df017b42531
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-07 22:12:51 +04:00
Linus Torvalds
51f831d729 VFS: clean up and simplify getname_flags()
This removes a number of silly games around strncpy_from_user() in
do_getname(), and removes that helper function entirely.  We instead
make getname_flags() just use strncpy_from_user() properly directly.

Removing the wrapper function simplifies things noticeably, mostly
because we no longer play the unnecessary games with segments (x86
strncpy_from_user() no longer needs the hack), but also because the
empty path handling is just much more obvious.  The return value of
"strncpy_to_user()" is much more obvious than checking an odd error
return case from do_getname().

[ non-x86 architectures were notified of this change several weeks ago,
  since it is possible that they have copied the old broken x86
  strncpy_from_user. But nobody reacted, so .. See

    http://www.spinics.net/lists/linux-arch/msg17313.html

  for details ]

Cc: linux-arch@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic049b87071c10330b1cd3f864dd0c4c5d98464df
2018-12-07 22:12:51 +04:00
Eric W. Biederman
068f467ec8 vfs: Don't allow a user namespace root to make device nodes
Safely making device nodes in a container is solvable but simply
having the capability in a user namespace is not sufficient to make
this work.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Change-Id: I4eb0afd78bb4a8b106dca3002c11ae81caae9e1d
2018-12-07 22:12:51 +04:00
Miklos Szeredi
86e4719bc3 vfs: nameidata_to_filp(): don't throw away file on error
If open fails, don't put the file.  This allows it to be reused if open needs to
be retried.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I78695bdf3ab30e2f313f1fb5ec79c9cd572f4c55
2018-12-07 22:12:51 +04:00
Miklos Szeredi
9355cee272 vfs: nameidata_to_filp(): inline __dentry_open()
Copy __dentry_open() into nameidata_to_filp().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: If1d3ff701de987e426c3a4c813efff3d1d0db181
2018-12-07 22:12:51 +04:00
Miklos Szeredi
1e3571fc93 vfs: do_dentry_open(): don't put filp
Move put_filp() out to __dentry_open(), the only caller now.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Ica706ac4611a66cd6e637650bd8143148fa95e44
2018-12-07 22:12:51 +04:00
Miklos Szeredi
9b645714fb vfs: split __dentry_open()
Split __dentry_open() into two functions:

  do_dentry_open() - does most of the actual work, doesn't put file on failure
  open_check_o_direct() - after a successful open, checks direct_IO method

This will allow i_op->atomic_open to do just the file initialization and leave
the direct_IO checking to the VFS.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: Ieb8a4e7c4ec9a12636c184d757a5f7a49c5e75df
2018-12-07 22:12:51 +04:00
Andrea Arcangeli
1fd1850bf6 fs/exec: fix use after free in execve
"file" can be already freed if bprm->file is NULL after
search_binary_handler() return. binfmt_script will do exactly that for
example. If the VM reuses the file after fput run(), this will result in
a use ater free.

So obtain d_is_su before search_binary_handler() runs.

This should explain this crash:

[25333.009554] Unable to handle kernel NULL pointer dereference at virtual address 00000185
[..]
[25333.009918] [2:             am:21861] PC is at do_execve+0x354/0x474

Change-Id: I2a8a814d1c0aa75625be83cb30432cf13f1a0681
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
2018-02-16 20:15:06 -07:00
Daniel Rosenberg
b660e27533 ANDROID: sdcardfs: Fix missing break on default_normal
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 64672411
Change-Id: I98796df95dc9846adb77a11f49a1a254fb1618b1
2018-01-13 17:25:53 +03:00
Daniel Rosenberg
783ca29469 ANDROID: sdcardfs: Add default_normal option
The default_normal option causes mounts with the gid set to
AID_SDCARD_RW to have user specific gids, as in the normal case.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I9619b8ac55f41415df943484dc8db1ea986cef6f
Bug: 64672411
2018-01-13 17:25:30 +03:00
Daniel Rosenberg
4ac97e3645 ANDROID: sdcardfs: notify lower file of opens
fsnotify_open is not called within dentry_open,
so we need to call it ourselves.

Change-Id: Ia7f323b3d615e6ca5574e114e8a5d7973fb4c119
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 70706497
2018-01-13 17:25:26 +03:00
Al Viro
b7f0468bb9 BACKPORT: dentry name snapshots
commit 49d31c2f389acfe83417083e1208422b4091cd9e upstream.

take_dentry_name_snapshot() takes a safe snapshot of dentry name;
if the name is a short one, it gets copied into caller-supplied
structure, otherwise an extra reference to external name is grabbed
(those are never modified).  In either case the pointer to stable
string is stored into the same structure.

dentry must be held by the caller of take_dentry_name_snapshot(),
but may be freely dropped afterwards - the snapshot will stay
until destroyed by release_dentry_name_snapshot().

Intended use:
	struct name_snapshot s;

	take_dentry_name_snapshot(&s, dentry);
	...
	access s.name
	...
	release_dentry_name_snapshot(&s);

Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name
to pass down with event.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[carnil: backport 4.9: adjust context]
[bwh: Backported to 3.16:
 - External names are not ref-counted, so copy them
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ghackmann@google.com: backported to 3.10: adjust context]
Signed-off-by: Greg Hackmann <ghackmann@google.com>

Change-Id: I612e687cbffa1a03107331a6b3f00911ffbebd8e
Bug: 63689921
2018-01-13 17:13:38 +03:00
Shaohua Li
df680e4101 swap: make each swap partition have one address_space
When I use several fast SSD to do swap, swapper_space.tree_lock is
heavily contended.  This makes each swap partition have one
address_space to reduce the lock contention.  There is an array of
address_space for swap.  The swap entry type is the index to the array.

In my test with 3 SSD, this increases the swapout throughput 20%.

[akpm@linux-foundation.org: revert unneeded change to  __add_to_swap_cache]
Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Change-Id: I8503ace83342398bf7be3d2216616868cca65311
2018-01-01 22:02:05 +03:00
Jan Kara
24f88dea2a fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE writeback
sync_file_range(2) is documented to issue writeback only for pages that
are not currently being written.  After all the system call has been
created for userspace to be able to issue background writeout and so
waiting for in-flight IO is undesirable there.  However commit
ee53a891f4 ("mm: do_sync_mapping_range integrity fix") switched
do_sync_mapping_range() and thus sync_file_range() to issue writeback in
WB_SYNC_ALL mode since do_sync_mapping_range() was used by other code
relying on WB_SYNC_ALL semantics.

These days do_sync_mapping_range() went away and we can switch
sync_file_range(2) back to issuing WB_SYNC_NONE writeback.  That should
help PostgreSQL avoid large latency spikes when flushing data in the
background.

Andres measured a 20% increase in transactions per second on an SSD disk.

Signed-off-by: Jan Kara <jack@suse.com>
Reported-by: Andres Freund <andres@anarazel.de>
Tested-By: Andres Freund <andres@anarazel.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-31 13:02:49 +03:00
Minchan Kim
cb7925d8da BACKPORT: mm: /proc/pid/smaps:: show proportional swap share of the mapping
We want to know per-process workingset size for smart memory management
on userland and we use swap(ex, zram) heavily to maximize memory
efficiency so workingset includes swap as well as RSS.

On such system, if there are lots of shared anonymous pages, it's really
hard to figure out exactly how many each process consumes memory(ie, rss
+ wap) if the system has lots of shared anonymous memory(e.g, android).

This patch introduces SwapPss field on /proc/<pid>/smaps so we can get
more exact workingset size per process.

Bongkyu tested it. Result is below.

1. 50M used swap
SwapTotal: 461976 kB
SwapFree: 411192 kB

$ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
48236
$ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
141184

2. 240M used swap
SwapTotal: 461976 kB
SwapFree: 216808 kB

$ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
230315
$ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
1387744

[akpm@linux-foundation.org: simplify kunmap_atomic() call]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Bongkyu Kim <bongkyu.kim@lge.com>
Tested-by: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 26190646
Change-Id: Idf92d682fdef432bdd66e530a7e7cdff8f375db1
Signed-off-by: Thierry Strudel <tstrudel@google.com>
2017-12-27 22:48:40 +03:00
JP Abgrall
16c096a3f5 ext4: Add support for FIDTRIM, a best-effort ioctl for deep discard trim
* What
This provides an interface for issuing an FITRIM which uses the
secure discard instead of just a discard.
Only the eMMC command is "secure", and not how the FS uses it:
due to the fact that the FS might reassign a region somewhere else,
the original deleted data will not be affected by the "trim" which only
handles un-used regions.
So we'll just call it "deep discard", and note that this is a
"best effort" cleanup.

* Why
Once in a while, We want to be able to cleanup most of the unused blocks
after erasing a bunch of files.
We don't want to constantly secure-discard via a mount option.

From an eMMC spec perspective, it tells the device to really get rid of
all the data for the specified blocks and not just put them back into the
pool of free ones (unlike the normal TRIM). The eMMC spec says the
secure trim handling must make sure the data (and metadata) is not available
anymore. A simple TRIM doesn't clear the data, it just puts blocks in the
free pool.
JEDEC Standard No. 84-A441
  7.6.9 Secure Erase
  7.6.10 Secure Trim

From an FS perspective, it is acceptable to leave some data behind.
 - directory entries related to deleted files
 - databases entries related to deleted files
 - small-file data stored in inode extents
 - blocks held by the FS waiting to be re-used (mitigated by sync).
 - blocks reassigned by the FS prior to FIDTRIM.

Change-Id: I676a1404a80130d93930c84898360f2e6fb2f81e
Signed-off-by: Geremy Condra <gcondra@google.com>
Signed-off-by: JP Abgrall <jpa@google.com>
2017-12-27 22:40:01 +03:00
Artem Borisov
d7992e6feb Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1
All bluetooth-related changes were omitted because of our ancient incompatible bt stack.

Change-Id: I96440b7be9342a9c1adc9476066272b827776e64
2017-12-27 17:13:15 +03:00
Michael Kerrisk
a467c2b9c8 PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND
As discussed in
http://thread.gmane.org/gmane.linux.kernel/1249726/focus=1288990,
the capability introduced in 4d7e30d989
to govern EPOLLWAKEUP seems misnamed: this capability is about governing
the ability to suspend the system, not using a particular API flag
(EPOLLWAKEUP). We should make the name of the capability more general
to encourage reuse in related cases. (Whether or not this capability
should also be used to govern the use of /sys/power/wake_lock is a
question that needs to be separately resolved.)

This patch renames the capability to CAP_BLOCK_SUSPEND. In order to ensure
that the old capability name doesn't make it out into the wild, could you
please apply and push up the tree to ensure that it is incorporated
for the 3.5 release.

Change-Id: Id7abf9a14f0a4b21c02eee057aff48687326c750
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2017-12-15 16:47:14 +03:00
Rafael J. Wysocki
60bf2e03ee epoll: Fix user space breakage related to EPOLLWAKEUP
Commit 4d7e30d (epoll: Add a flag, EPOLLWAKEUP, to prevent
suspend while epoll events are ready) caused some applications to
malfunction, because they set the bit corresponding to the new
EPOLLWAKEUP flag in their eventpoll flags and they don't have the
new CAP_EPOLLWAKEUP capability.

To prevent that from happening, change epoll_ctl() to clear
EPOLLWAKEUP in epds.events if the caller doesn't have the
CAP_EPOLLWAKEUP capability instead of failing and returning an
error code, which allows the affected applications to function
normally.

Change-Id: I266634be5e16d3390fd1c62686a215af215c8d51
Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2017-12-15 16:47:02 +03:00
Arve Hjønnevåg
bf3ef86fe9 epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready
When an epoll_event, that has the EPOLLWAKEUP flag set, is ready, a
wakeup_source will be active to prevent suspend. This can be used to
handle wakeup events from a driver that support poll, e.g. input, if
that driver wakes up the waitqueue passed to epoll before allowing
suspend.

Change-Id: I522bfcf488bb4817336e682b22bdfb1e0beaf3e4
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2017-12-15 16:46:49 +03:00
Paul Keith
e675a50f40 sdcardfs: Backport and use some 3.10 hlist/hash macros
* Fixes NPD when accessing /config/sdcardfs/packages_gid.list

Change-Id: I4b628ffab5e8a83642439661f97f720946f31daf
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
2017-10-06 10:28:58 +03:00
Andrew Ruder
07b074b39e fs/super.c: sync ro remount after blocking writers
Move sync_filesystem() after sb_prepare_remount_readonly().  If writers
sneak in anywhere from sync_filesystem() to sb_prepare_remount_readonly()
it can cause inodes to be dirtied and writeback to occur well after
sys_mount() has completely successfully.

This was spotted by corrupted ubifs filesystems on reboot, but appears
that it can cause issues with any filesystem using writeback.

CRs-Fixed: 627559
Change-Id: Ib417b59d39210aab2de4e5ae48b18129e8bc3e26
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Richard Weinberger <richard@nod.at>
Co-authored-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Ruder <andrew.ruder@elecsyscorp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Git-commit: 807612db2f
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
2017-10-06 10:28:36 +03:00
Daniel Rosenberg
f0226e8e93 ANDROID: sdcardfs: Add missing break
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 63245673
Change-Id: I5fc596420301045895e5a9a7e297fd05434babf9
2017-09-22 19:12:39 +03:00
Daniel Rosenberg
b58229a421 ANDROID: Sdcardfs: Move gid derivation under flag
This moves the code to adjust the gid/uid of lower filesystem
files under the mount flag derive_gid.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I44eaad4ef67c7fcfda3b6ea3502afab94442610c
Bug: 63245673
2017-09-22 19:12:39 +03:00
Daniel Rosenberg
7a97e952ea ANDROID: mnt: Fix freeing of mount data
Fix double free on error paths

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I1c25a175e87e5dd5cafcdcf9d78bf4c0dc3f88ef
Bug: 65386954
Fixes: aa6d3ace42f9 ("mnt: Add filesystem private data to mount points")
2017-09-22 19:12:38 +03:00
Jaegeuk Kim
fc4f69b077 ANDROID: sdcardfs: override credential for ioctl to lower fs
Otherwise, lower_fs->ioctl() fails due to inode_owner_or_capable().

Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Bug: 63260873
Change-Id: I623a6c7c5f8a3cbd7ec73ef89e18ddb093c43805
2017-09-22 19:12:38 +03:00
Andrew Chant
09d9a821ee sdcardfs: limit stacking depth
Limit filesystem stacking to prevent stack overflow.

Bug: 32761463
Change-Id: I8b1462b9c0d6c7f00cf110724ffb17e7f307c51e
Signed-off-by: Andrew Chant <achant@google.com>
CVE-2014-9922
2017-09-22 19:12:37 +03:00
Miklos Szeredi
511ae95677 BACKPORT: fs: limit filesystem stacking depth
Add a simple read-only counter to super_block that indicates how deep this
is in the stack of filesystems.  Previously ecryptfs was the only stackable
filesystem and it explicitly disallowed multiple layers of itself.

Overlayfs, however, can be stacked recursively and also may be stacked
on top of ecryptfs or vice versa.

To limit the kernel stack usage we must limit the depth of the
filesystem stack.  Initially the limit is set to 2.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

(cherry picked from commit 69c433ed2e)

Bug: 32761463
Change-Id: I69b2fba2112db2ece09a1bf61a44f8fc4db00820
CVE-2014-9922
2017-09-22 19:12:37 +03:00
Daniel Rosenberg
b4b6b9d276 ANDROID: sdcardfs: Remove unnecessary lock
The mmap_sem lock does not appear to be protecting
anything, and has been removed in Samsung's more
recent versions of sdcardfs.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I76ff3e33002716b8384fc8be368028ed63dffe4e
Bug: 63785372
2017-09-22 19:12:37 +03:00
Gao Xiang
aa7652a14a ANDROID: sdcardfs: use mount_nodev and fix a issue in sdcardfs_kill_sb
Use the VFS mount_nodev instead of customized mount_nodev_with_options
and fix generic_shutdown_super to kill_anon_super because of set_anon_super

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Change-Id: Ibe46647aa2ce49d79291aa9d0295e9625cfccd80
2017-09-22 19:12:36 +03:00
Greg Hackmann
b429963b29 ANDROID: sdcardfs: remove dead function open_flags_to_access_mode()
smatch warns about the suspicious formatting in the last line of
open_flags_to_access_mode().  It turns out the only caller was deleted
over a year ago by "ANDROID: sdcardfs: Bring up to date with Android M
permissions:", so we can "fix" the function's formatting by deleting it.

Change-Id: Id85946f3eb01722eef35b1815f405a6fda3aa4ff
Signed-off-by: Greg Hackmann <ghackmann@google.com>
2017-09-22 19:12:36 +03:00
Daniel Rosenberg
72c554da0c ANDROID: sdcardfs: d_splice_alias can return error values
We must check that d_splice_alias was successful before using its
output.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 62390017
Change-Id: Ifda0a052fb3f67e35c635a4e5e907876c5400978
2017-09-22 19:12:35 +03:00
Daniel Rosenberg
89a658a855 ANDROID: mnt: Fix next_descendent
next_descendent did not properly handle the case
where the initial mount had no slaves. In this case,
we would look for the next slave, but since don't
have a master, the check for wrapping around to the
start of the list will always fail. Instead, we check
for this case, and ensure that we end the iteration
when we come back to the root.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 62094374
Change-Id: I43dfcee041aa3730cb4b9a1161418974ef84812e
2017-09-22 19:12:35 +03:00
Daniel Rosenberg
443df072f0 ANDROID: sdcardfs: Check for NULL in revalidate
If the inode is in the process of being evicted,
the top value may be NULL.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 38502532
Change-Id: I0b9d04aab621e0398d44d1c5dc53293106aa5f89
2017-09-22 19:12:35 +03:00
Dmitry Shmidt
684ee76f0e ANDROID: sdcardfs: Add linux/kref.h include
Change-Id: I8be0f6fc7aa6dc1d639d2d22b230783c68574389
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2017-09-22 19:12:34 +03:00
Daniel Rosenberg
8472923d21 ANDROID: sdcardfs: Move top to its own struct
Move top, and the associated data, to its own struct.
This way, we can properly track refcounts on top
without interfering with the inode's accounting.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 38045152
Change-Id: I1968e480d966c3f234800b72e43670ca11e1d3fd
2017-09-22 19:12:34 +03:00
Gao Xiang
1f52ba5673 ANDROID: sdcardfs: fix sdcardfs_destroy_inode for the inode RCU approach
According to the following commits,
fs: icache RCU free inodes
vfs: fix the stupidity with i_dentry in inode destructors

sdcardfs_destroy_inode should be fixed for the fast path safety.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Change-Id: I84f43c599209d23737c7e28b499dd121cb43636d
2017-09-22 19:12:33 +03:00
Daniel Roseberg
62b650471f ANDROID: sdcardfs: Don't iput if we didn't igrab
If we fail to get top, top is either NULL, or igrab found
that we're in the process of freeing that inode, and did
not grab it. Either way, we didn't grab it, and have no
business putting it.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 38117720
Change-Id: Ie2f587483b9abb5144263156a443e89bc69b767b
2017-09-22 19:12:33 +03:00
Daniel Rosenberg
e43a6b01eb ANDROID: sdcardfs: Call lower fs's revalidate
We should be calling the lower filesystem's revalidate
inside of sdcardfs's revalidate, as wrapfs does.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35766959
Change-Id: I939d1c4192fafc1e21678aeab43fe3d588b8e2f4
2017-09-22 19:12:33 +03:00
Daniel Rosenberg
88598fa8ae ANDROID: sdcardfs: Avoid setting GIDs outside of valid ranges
When setting up the ownership of files on the lower filesystem,
ensure that these values are in reasonable ranges for apps. If
they aren't, default to AID_MEDIA_RW

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 37516160
Change-Id: I0bec76a61ac72aff0b993ab1ad04be8382178a00
2017-09-22 19:12:32 +03:00
Daniel Rosenberg
e293feb047 Revert "Revert "Android: sdcardfs: Don't do d_add for lower fs""
This reverts commit ffa75fdb9c408f49b9622b6d55752ed99ff61488.

Turns out we just needed the right hash.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 37231161
Change-Id: I6a6de7f7df99ad42b20fa062913b219f64020c31
2017-09-22 19:12:32 +03:00
Daniel Rosenberg
24cce439d5 ANDROID: sdcardfs: Use filesystem specific hash
We weren't accounting for FS specific hash functions,
causing us to miss negative dentries for any FS that
had one.

Similar to a patch from esdfs
commit 75bd25a9476d ("esdfs: support lower's own hash")

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I32d1ba304d728e0ca2648cacfb4c2e441ae63608
2017-09-22 19:12:32 +03:00
Daniel Rosenberg
b380b6ee1c Revert "Android: sdcardfs: Don't do d_add for lower fs"
This reverts commit 60df9f12992bc067216078ae756066c5d7c74d87.

This change caused issues for sdcardfs on top of vfat

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: Ie56a91fda582af27921cc1a9de7ae19a9a988f2a
2017-09-22 19:12:31 +03:00
Daniel Rosenberg
d3c7b5afda Android: sdcardfs: Don't complain in fixup_lower_ownership
Not all filesystems support changing the owner of a file.
We shouldn't complain if it doesn't happen.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 37488099
Change-Id: I403e44ab7230f176e6df82f6adb4e5c82ce57f33
2017-09-22 19:12:31 +03:00
Daniel Rosenberg
cbfba1be5a Android: sdcardfs: Don't do d_add for lower fs
For file based encryption, ext4 explicitly does not
create negative dentries for encrypted files. If you
force one over it, the decrypted file will be hidden
until the cache is cleared. Instead, just fail out.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 37231161
Change-Id: Id2a9708dfa75e1c22f89915c529789caadd2ca4b
2017-09-22 19:12:30 +03:00
Daniel Rosenberg
7727a9e52e ANDROID: sdcardfs: ->iget fixes
Adapted from wrapfs
commit 8c49eaa0sb9c ("Wrapfs: ->iget fixes")

Change where we igrab/iput to ensure we always hold a valid lower_inode.
Return ENOMEM (not EACCES) if iget5_locked returns NULL.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35766959

Change-Id: Id8d4e0c0cbc685a0a77685ce73c923e9a3ddc094
2017-09-22 19:12:30 +03:00
Daniel Rosenberg
d2f7577f20 Android: sdcardfs: Change cache GID value
Change-Id: Ieb955dd26493da26a458bc20fbbe75bca32b094f
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 37193650
2017-09-22 19:12:29 +03:00
Daniel Rosenberg
c3da3c511e ANDROID: sdcardfs: Directly pass lower file for mmap
Instead of relying on a copy hack, pass the lower file
as private data. This lets the kernel find the vma
mapping for pages used by the file, allowing pages
used by mapping to be reclaimed.

This is adapted from following esdfs patches
commit 0647e638d: ("esdfs: store lower file in vm_file for mmap")
commit 064850866: ("esdfs: keep a counter for mmaped file")

Change-Id: I75b74d1e5061db1b8c13be38d184e118c0851a1a
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:29 +03:00
Daniel Rosenberg
7fc65bd919 ANDROID: sdcardfs: update module info
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I958c7c226d4e9265fea8996803e5b004fb33d8ad
2017-09-22 19:12:29 +03:00
Daniel Rosenberg
5dc3989ffd ANDROID: sdcardfs: use d_splice_alias
adapted from wrapfs
commit 9671770ff8b9 ("Wrapfs: use d_splice_alias")

Refactor interpose code to allow lookup to use d_splice_alias.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35766959
Change-Id: Icf51db8658202c48456724275b03dc77f73f585b
2017-09-22 19:12:28 +03:00
Daniel Rosenberg
dde08eb9d7 ANDROID: sdcardfs: fix ->llseek to update upper and lower offset
Adapted from wrapfs
commit 1d1d23a47baa ("Wrapfs: fix ->llseek to update upper and lower
offsets")

Fixes bug: xfstests generic/257. f_pos consistently is required by and
only by dir_ops->wrapfs_readdir, main_ops is not affected.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Mengyang Li <li.mengyang@stonybrook.edu>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35766959
Change-Id: I360a1368ac37ea8966910a58972b81504031d437
2017-09-22 19:12:28 +03:00
Daniel Rosenberg
4923777bb0 ANDROID: sdcardfs: copy lower inode attributes in ->ioctl
Adapted from wrapfs
commit fbc9c6f83ea6 ("Wrapfs: copy lower inode attributes in ->ioctl")
commit e97d8e26cc9e ("Wrapfs: use file_inode helper")

Some ioctls (e.g., EXT2_IOC_SETFLAGS) can change inode attributes, so copy
them from lower inode.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35766959
Change-Id: I0f12684b9dbd4088b4a622c7ea9c03087f40e572
2017-09-22 19:12:28 +03:00
Daniel Rosenberg
21af28ae07 ANDROID: sdcardfs: remove unnecessary call to do_munmap
Adapted from wrapfs
commit 5be6de9ecf02 ("Wrapfs: use vm_munmap in ->mmap")
commit 2c9f6014a8bb ("Wrapfs: remove unnecessary call
to vm_unmap in ->mmap")

Code is unnecessary and causes deadlocks in newer kernels.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35766959
Change-Id: Ia252d60c60799d7e28fc5f1f0f5b5ec2430a2379
2017-09-22 19:12:27 +03:00
Daniel Rosenberg
2761faa286 ANDROID: sdcardfs: Fix style issues in macros
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35331000
Change-Id: I89c4035029dc2236081a7685c55cac595d9e7ebf
2017-09-22 19:12:27 +03:00
Daniel Rosenberg
b3fd5e6086 ANDROID: sdcardfs: Use seq_puts over seq_printf
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35331000
Change-Id: I3795ec61ce61e324738815b1ce3b0e09b25d723f
2017-09-22 19:12:26 +03:00
Daniel Rosenberg
3a6d093272 ANDROID: sdcardfs: Use to kstrout
Switch from deprecated simple_strtoul to kstrout

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35331000
Change-Id: If18bd133b4d2877f71e58b58fc31371ff6613ed5
2017-09-22 19:12:26 +03:00
Daniel Rosenberg
a1f2d9d927 ANDROID: sdcardfs: Use pr_[...] instead of printk
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35331000
Change-Id: Ibc635ec865750530d32b87067779f681fe58a003
2017-09-22 19:12:26 +03:00
Daniel Rosenberg
def7e34e8b ANDROID: sdcardfs: remove unneeded null check
As pointed out by checkpatch, these functions already
handle null inputs, so the checks are not needed.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35331000
Change-Id: I189342f032dfcefee36b27648bb512488ad61d20
2017-09-22 19:12:24 +03:00
Daniel Rosenberg
1ff2bf9007 ANDROID: sdcardfs: Fix style issues with comments
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35331000
Change-Id: I8791ef7eac527645ecb9407908e7e5ece35b8f80
2017-09-22 19:12:23 +03:00
Daniel Rosenberg
1fb0168abb ANDROID: sdcardfs: Fix formatting
This fixes various spacing and bracket related issues
pointed out by checkpatch.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35331000
Change-Id: I6e248833a7a04e3899f3ae9462d765cfcaa70c96
2017-09-22 19:12:23 +03:00
Daniel Rosenberg
9a544b4fd9 ANDROID: sdcardfs: correct order of descriptors
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35331000
Change-Id: Ia6d16b19c8c911f41231d2a12be0740057edfacf
2017-09-22 19:12:23 +03:00
Daniel Rosenberg
cc90c372fa ANDROID: sdcardfs: Fix gid issue
We were already calculating most of these values,
and erroring out because the check was confused by this.
Instead of recalculating, adjust it as needed.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 36160015
Change-Id: I9caf3e2fd32ca2e37ff8ed71b1d392f1761bc9a9
2017-09-22 19:12:22 +03:00
Daniel Rosenberg
183d00676a ANDROID: sdcardfs: Use tabs instead of spaces in multiuser.h
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35331000
Change-Id: Ic7801914a7dd377e270647f81070020e1f0bab9b
2017-09-22 19:12:22 +03:00
Daniel Rosenberg
a79d11e62a ANDROID: sdcardfs: Remove uninformative prints
At best these prints do not provide useful information, and
at worst, some allow userspace to abuse the kernel log.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 36138424
Change-Id: I812c57cc6a22b37262935ab77f48f3af4c36827e
2017-09-22 19:12:21 +03:00
Daniel Rosenberg
527954f2c9 ANDROID: sdcardfs: move path_put outside of spinlock
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35643557
Change-Id: Ib279ebd7dd4e5884d184d67696a93e34993bc1ef
2017-09-22 19:12:21 +03:00
Daniel Rosenberg
9fc6dbe472 ANDROID: sdcardfs: Use case insensitive hash function
Case insensitive comparisons don't help us much if
we hash to different buckets...

Signed-off-by: Daniel Rosenberg <drosen@google.com>
bug: 36004503
Change-Id: I91e00dbcd860a709cbd4f7fd7fc6d855779f3285
2017-09-22 19:12:21 +03:00
Daniel Rosenberg
63cf0c300a ANDROID: sdcardfs: declare MODULE_ALIAS_FS
From commit ee616b78aa87 ("Wrapfs: declare MODULE_ALIAS_FS")

Signed-off-by: Daniel Rosenberg <drosen@google.com>
bug: 35766959
Change-Id: Ia4728ab49d065b1d2eb27825046f14b97c328cba
2017-09-22 19:12:20 +03:00
Eric W. Biederman
5c1997410b fs: Limit sys_mount to only request filesystem modules.
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Change-Id: I623b13dbdb44bb9ba7481f29575e1ca4ad8102f4
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
2017-09-22 19:12:20 +03:00
Daniel Rosenberg
628b9661d7 ANDROID: sdcardfs: Get the blocksize from the lower fs
This changes sdcardfs to be more in line with the
getattr in wrapfs, which calls the lower fs's getattr
to get the block size

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 34723223
Change-Id: I1c9e16604ba580a8cdefa17f02dcc489d7351aed
2017-09-22 19:12:19 +03:00
Daniel Rosenberg
30cc539e4c ANDROID: sdcardfs: Use d_invalidate instead of drop_recurisve
drop_recursive did not properly remove stale dentries.
Instead, we use the vfs's d_invalidate, which does the proper cleanup.

Additionally, remove the no longer used drop_recursive, and
fixup_top_recursive that that are no longer used.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: Ibff61b0c34b725b024a050169047a415bc90f0d8
2017-09-22 19:12:19 +03:00
Daniel Rosenberg
6b5418751d ANDROID: sdcardfs: Switch to internal case insensitive compare
There were still a few places where we called into a case
insensitive lookup that was not defined by sdcardfs.
Moving them all to the same place will allow us to switch
the implementation in the future.

Additionally, the check in fixup_perms_recursive did not
take into account the length of both strings, causing
extraneous matches when the name we were looking for was
a prefix of the child name.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I45ce768cd782cb4ea1ae183772781387c590ecc2
2017-09-22 19:12:18 +03:00
Daniel Rosenberg
280a7d21b7 ANDROID: sdcardfs: Use spin_lock_nested
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 36007653
Change-Id: I805d5afec797669679853fb2bb993ee38e6276e4
2017-09-22 19:12:18 +03:00
Daniel Rosenberg
9917de6f14 ANDROID: sdcardfs: Replace get/put with d_lock
dput cannot be called with a spin_lock. Instead,
we protect our accesses by holding the d_lock.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35643557
Change-Id: I22cf30856d75b5616cbb0c223724f5ab866b5114
2017-09-22 19:12:17 +03:00
Daniel Rosenberg
c3be309216 ANDROID: sdcardfs: rate limit warning print
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35848445
Change-Id: Ida72ea0ece191b2ae4a8babae096b2451eb563f6
2017-09-22 19:12:17 +03:00
Daniel Rosenberg
74b45b5a54 ANDROID: sdcardfs: support direct-IO (DIO) operations
This comes from the wrapfs patch
2e346c83b26e Wrapfs: support direct-IO (DIO) operations

Signed-off-by: Li Mengyang <li.mengyang@stonybrook.edu>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 34133558
Change-Id: I3fd779c510ab70d56b1d918f99c20421b524cdc4
2017-09-22 19:12:17 +03:00
Daniel Rosenberg
7bc8a0524c ANDROID: sdcardfs: implement vm_ops->page_mkwrite
This comes from the wrapfs patch
3dfec0ffe5e2 Wrapfs: implement vm_ops->page_mkwrite

Some file systems (e.g., ext4) require it.  Reported by Ted Ts'o.

Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 34133558
Change-Id: I1a389b2422c654a6d3046bb8ec3e20511aebfa8e
2017-09-22 19:12:16 +03:00
Daniel Rosenberg
d782165c3b ANDROID: sdcardfs: Don't bother deleting freelist
There is no point deleting entries from dlist, as
that is a temporary list on the stack from which
contains only entries that are being deleted.

Not all code paths set up dlist, so those that
don't were performing invalid accesses in
hash_del_rcu. As an additional means to prevent
any other issue, we null out the list entries when
we allocate from the cache.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35666680
Change-Id: Ibb1e28c08c3a600c29418d39ba1c0f3db3bf31e5
2017-09-22 19:12:16 +03:00
Daniel Rosenberg
4a7fc6483f ANDROID: sdcardfs: Add missing path_put
"ANDROID: sdcardfs: Add GID Derivation to sdcardfs" introduced
an unbalanced pat_get, leading to storage space not being freed
after deleting a file until rebooting. This adds the missing path_put.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 34691169
Change-Id: Ia7ef97ec2eca2c555cc06b235715635afc87940e
2017-09-22 19:12:15 +03:00
Daniel Rosenberg
2f8e9489d3 ANDROID: sdcardfs: Fix incorrect hash
This adds back the hash calculation removed as part of
the previous patch, as it is in fact necessary.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35307857
Change-Id: Ie607332bcf2c5d2efdf924e4060ef3f576bf25dc
2017-09-22 19:12:15 +03:00
Daniel Rosenberg
f9c56b73bd ANDROID: sdcardfs: Switch strcasecmp for internal call
This moves our uses of strcasecmp over to an internal call so we can
easily change implementations later if we so desire. Additionally,
we leverage qstr's where appropriate to save time on comparisons.

Change-Id: I32fdc4fd0cd3b7b735dcfd82f60a2516fd8272a5
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:15 +03:00
Al Viro
acf4f74449 constify d_lookup() arguments
Change-Id: I48ac8c9d7a63530b753b9d7b316e9222edeb5876
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-22 19:12:14 +03:00
Al Viro
44fc4d2f9a constify __d_lookup() arguments
Change-Id: I74c489fee16eb019d9d32572d867d6b54bf6cc91
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-22 19:12:14 +03:00
Daniel Rosenberg
2659b94b05 ANDROID: sdcardfs: switch to full_name_hash and qstr
Use the kernel's string hash function instead of rolling
our own. Additionally, save a bit of calculation by using
the qstr struct in place of strings.

Change-Id: I0bbeb5ec2a9233f40135ad632e6f22c30ffa95c1
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:13 +03:00
Daniel Rosenberg
b134627d5d ANDROID: sdcardfs: Add GID Derivation to sdcardfs
This changes sdcardfs to modify the user and group in the
underlying filesystem depending on its usage. Ownership is
set by Android user, and package, as well as if the file is
under obb or cache. Other files can be labeled by extension.
Those values are set via the configfs interace.

To add an entry,
mkdir -p [configfs root]/sdcardfs/extensions/[gid]/[ext]

Bug: 34262585
Change-Id: I4e030ce84f094a678376349b1a96923e5076a0f4
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:13 +03:00
Daniel Rosenberg
ce26b95540 ANDROID: sdcardfs: Remove redundant operation
We call get_derived_permission_new unconditionally, so we don't need
to call update_derived_permission_lock, which does the same thing.

Change-Id: I0748100828c6af806da807241a33bf42be614935
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2017-09-22 19:12:13 +03:00