We were already calculating most of these values,
and erroring out because the check was confused by this.
Instead of recalculating, adjust it as needed.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 36160015
Change-Id: I9caf3e2fd32ca2e37ff8ed71b1d392f1761bc9a9
At best these prints do not provide useful information, and
at worst, some allow userspace to abuse the kernel log.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 36138424
Change-Id: I812c57cc6a22b37262935ab77f48f3af4c36827e
Case insensitive comparisons don't help us much if
we hash to different buckets...
Signed-off-by: Daniel Rosenberg <drosen@google.com>
bug: 36004503
Change-Id: I91e00dbcd860a709cbd4f7fd7fc6d855779f3285
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.
A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.
Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.
Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives. Allowing simple, safe,
well understood work-arounds to known problematic software.
This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work. While writing this patch I saw a handful of such
cases. The most significant being autofs that lives in the module
autofs4.
This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.
After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module. The common pattern in the kernel is to call request_module()
without regards to the users permissions. In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted. In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.
Change-Id: I623b13dbdb44bb9ba7481f29575e1ca4ad8102f4
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
This changes sdcardfs to be more in line with the
getattr in wrapfs, which calls the lower fs's getattr
to get the block size
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 34723223
Change-Id: I1c9e16604ba580a8cdefa17f02dcc489d7351aed
drop_recursive did not properly remove stale dentries.
Instead, we use the vfs's d_invalidate, which does the proper cleanup.
Additionally, remove the no longer used drop_recursive, and
fixup_top_recursive that that are no longer used.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: Ibff61b0c34b725b024a050169047a415bc90f0d8
There were still a few places where we called into a case
insensitive lookup that was not defined by sdcardfs.
Moving them all to the same place will allow us to switch
the implementation in the future.
Additionally, the check in fixup_perms_recursive did not
take into account the length of both strings, causing
extraneous matches when the name we were looking for was
a prefix of the child name.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I45ce768cd782cb4ea1ae183772781387c590ecc2
dput cannot be called with a spin_lock. Instead,
we protect our accesses by holding the d_lock.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35643557
Change-Id: I22cf30856d75b5616cbb0c223724f5ab866b5114
This comes from the wrapfs patch
2e346c83b26e Wrapfs: support direct-IO (DIO) operations
Signed-off-by: Li Mengyang <li.mengyang@stonybrook.edu>
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 34133558
Change-Id: I3fd779c510ab70d56b1d918f99c20421b524cdc4
This comes from the wrapfs patch
3dfec0ffe5e2 Wrapfs: implement vm_ops->page_mkwrite
Some file systems (e.g., ext4) require it. Reported by Ted Ts'o.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 34133558
Change-Id: I1a389b2422c654a6d3046bb8ec3e20511aebfa8e
There is no point deleting entries from dlist, as
that is a temporary list on the stack from which
contains only entries that are being deleted.
Not all code paths set up dlist, so those that
don't were performing invalid accesses in
hash_del_rcu. As an additional means to prevent
any other issue, we null out the list entries when
we allocate from the cache.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35666680
Change-Id: Ibb1e28c08c3a600c29418d39ba1c0f3db3bf31e5
"ANDROID: sdcardfs: Add GID Derivation to sdcardfs" introduced
an unbalanced pat_get, leading to storage space not being freed
after deleting a file until rebooting. This adds the missing path_put.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 34691169
Change-Id: Ia7ef97ec2eca2c555cc06b235715635afc87940e
This adds back the hash calculation removed as part of
the previous patch, as it is in fact necessary.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 35307857
Change-Id: Ie607332bcf2c5d2efdf924e4060ef3f576bf25dc
This moves our uses of strcasecmp over to an internal call so we can
easily change implementations later if we so desire. Additionally,
we leverage qstr's where appropriate to save time on comparisons.
Change-Id: I32fdc4fd0cd3b7b735dcfd82f60a2516fd8272a5
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Use the kernel's string hash function instead of rolling
our own. Additionally, save a bit of calculation by using
the qstr struct in place of strings.
Change-Id: I0bbeb5ec2a9233f40135ad632e6f22c30ffa95c1
Signed-off-by: Daniel Rosenberg <drosen@google.com>
This changes sdcardfs to modify the user and group in the
underlying filesystem depending on its usage. Ownership is
set by Android user, and package, as well as if the file is
under obb or cache. Other files can be labeled by extension.
Those values are set via the configfs interace.
To add an entry,
mkdir -p [configfs root]/sdcardfs/extensions/[gid]/[ext]
Bug: 34262585
Change-Id: I4e030ce84f094a678376349b1a96923e5076a0f4
Signed-off-by: Daniel Rosenberg <drosen@google.com>
We call get_derived_permission_new unconditionally, so we don't need
to call update_derived_permission_lock, which does the same thing.
Change-Id: I0748100828c6af806da807241a33bf42be614935
Signed-off-by: Daniel Rosenberg <drosen@google.com>
This allows you to hide the existence of a package from
a user by adding them to an exclude list. If a user
creates that package's folder and is on the exclude list,
they will not see that package's id.
Bug: 34542611
Change-Id: I9eb82e0bf2457d7eb81ee56153b9c7d2f6646323
Signed-off-by: Daniel Rosenberg <drosen@google.com>
This refactors the configfs code to be more easily extended.
It will allow additional files to be added easily.
Bug: 34542611
Bug: 34262585
Change-Id: I73c9b0ae5ca7eb27f4ebef3e6807f088b512d539
Signed-off-by: Daniel Rosenberg <drosen@google.com>
This modifies the permission checks in setattr to
allow for non-owners to modify the timestamp of
files to things other than the current time.
This still requires write access, as enforced by
the permission call, but relaxes the requirement
that the caller must be the owner, allowing those
with group permissions to change it as well.
Bug: 11118565
Change-Id: Ied31f0cce2797675c7ef179eeb4e088185adcbad
Signed-off-by: Daniel Rosenberg <drosen@google.com>
propagate_remount was not accounting for the slave mounts
of other slave mounts, leading to some namespaces not
recieving the remount information.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 33731928
Change-Id: Idc9e8c2ed126a4143229fc23f10a959c2d0a3854
Don't use lookup_one_len so we can grab the spinlock that
protects d_subdirs.
Bug: 30954918
Change-Id: I0c6a393252db7beb467e0d563739a3a14e1b5115
Signed-off-by: Daniel Rosenberg <drosen@google.com>
The current mainline has copies propagated to *all* nodes, then
tears down the copies we made for nodes that do not contain
counterparts of the desired mountpoint. That sets the right
propagation graph for the copies (at teardown time we move
the slaves of removed node to a surviving peer or directly
to master), but we end up paying a fairly steep price in
useless allocations. It's fairly easy to create a situation
where N calls of mount(2) create exactly N bindings, with
O(N^2) vfsmounts allocated and freed in process.
Fortunately, it is possible to avoid those allocations/freeings.
The trick is to create copies in the right order and find which
one would've eventually become a master with the current algorithm.
It turns out to be possible in O(nodes getting propagation) time
and with no extra allocations at all.
One part is that we need to make sure that eventual master will be
created before its slaves, so we need to walk the propagation
tree in a different order - by peer groups. And iterate through
the peers before dealing with the next group.
Another thing is finding the (earlier) copy that will be a master
of one we are about to create; to do that we are (temporary) marking
the masters of mountpoints we are attaching the copies to.
Either we are in a peer of the last mountpoint we'd dealt with,
or we have the following situation: we are attaching to mountpoint M,
the last copy S_0 had been attached to M_0 and there are sequences
S_0...S_n, M_0...M_n such that S_{i+1} is a master of S_{i},
S_{i} mounted on M{i} and we need to create a slave of the first S_{k}
such that M is getting propagation from M_{k}. It means that the master
of M_{k} will be among the sequence of masters of M. On the
other hand, the nearest marked node in that sequence will either
be the master of M_{k} or the master of M_{k-1} (the latter -
in the case if M_{k-1} is a slave of something M gets propagation
from, but in a wrong peer group).
So we go through the sequence of masters of M until we find
a marked one (P). Let N be the one before it. Then we go through
the sequence of masters of S_0 until we find one (say, S) mounted
on a node D that has P as master and check if D is a peer of N.
If it is, S will be the master of new copy, if not - the master of S
will be.
That's it for the hard part; the rest is fairly simple. Iterator
is in next_group(), handling of one prospective mountpoint is
propagate_one().
It seems to survive all tests and gives a noticably better performance
than the current mainline for setups that are seriously using shared
subtrees.
Change-Id: I45648e8a405544f768c5956711bdbdf509e2705a
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If the dest_mnt is not shared, propagate_mnt() does nothing -
there's no mounts to propagate to and thus no copies to create.
Might as well don't bother calling it in that case.
Change-Id: Id94af8ad288bf9bfc6ffb5570562bbc2dc2e0d87
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sdcardfs uses the same magic value as wrapfs.
This should not be the case. As it is entirely
in memory, the value can be changed without any
loss of compatibility.
Change-Id: I24200b805d5e6d32702638be99e47d50d7f2f746
Signed-off-by: Daniel Rosenberg <drosen@google.com>
This switches sdcardfs over to using permission2.
Instead of mounting several sdcardfs instances onto
the same underlaying directory, you bind mount a
single mount several times, and remount with the
options you want. These are stored in the private
mount data, allowing you to maintain the same tree,
but have different permissions for different mount
points.
Warning functions have been added for permission,
as it should never be called, and the correct
behavior is unclear.
Change-Id: I841b1d70ec60cf2b866fa48edeb74a0b0f8334f5
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Adds support for mount2, remount2, and the functions
to allocate/clone/copy the private data
The next patch will switch over to actually using it.
Change-Id: I8a43da26021d33401f655f0b2784ead161c575e3
Signed-off-by: Daniel Rosenberg <drosen@google.com>
This allows filesystems to use their mount private data to
influence the permssions they use in setattr2. It has
been separated into a new call to avoid disrupting current
setattr users.
Change-Id: I19959038309284448f1b7f232d579674ef546385
Signed-off-by: Daniel Rosenberg <drosen@google.com>
This allows filesystems to use their mount private data to
influence the permssions they return in permission2. It has
been separated into a new call to avoid disrupting current
permission users.
Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Now we pass the vfsmount when mounting and remounting.
This allows the filesystem to actually set up the mount
specific data, although we can't quite do anything with
it yet. show_options is expanded to include data that
lives with the mount.
To avoid changing existing filesystems, these have
been added as new vfs functions.
Change-Id: If80670bfad9f287abb8ac22457e1b034c9697097
Signed-off-by: Daniel Rosenberg <drosen@google.com>
This starts to add private data associated directly
to mount points. The intent is to give filesystems
a sense of where they have come from, as a means of
letting a filesystem take different actions based on
this information.
Change-Id: Ie769d7b3bb2f5972afe05c1bf16cf88c91647ab2
Signed-off-by: Daniel Rosenberg <drosen@google.com>
This removes a deadlock under low memory conditions.
filp_open can call lookup_slow, which will attempt to
lock the parent.
Change-Id: I940643d0793f5051d1e79a56f4da2fa8ca3d8ff7
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Adding packages to the package list and moving files
takes a large amount of locks, and is currently a
heavy operation. This adds a 'top' field to the
inode_info, which points to the inode for the top
most directory whose owner you would like to match.
On permission checks and get_attr, we look up the
owner based on the information at top. When we change
a package mapping, we need only modify the information
in the corresponding top inode_info's. When renaming,
we must ensure top is set correctly in all children.
This happens when an app specific folder gets moved
outside of the folder for that app.
Change-Id: Ib749c60b568e9a45a46f8ceed985c1338246ec6c
Signed-off-by: Daniel Rosenberg <drosen@google.com>
This fixes a bug where the first lookup of a
file or folder created under a different view
would not be case insensitive. It will now
search through for a case insensitive match
if the initial lookup fails.
Bug:28024488
Change-Id: I4ff9ce297b9f2f9864b47540e740fd491c545229
Signed-off-by: Daniel Rosenberg <drosen@google.com>
The mode on files created on the lower fs should
not be affected by the umask of the calling
task's fs_struct. Instead, we create a copy
and modify it as needed. This also lets us avoid
the string shenanigans around .nomedia files.
Bug: 27992761
Change-Id: Ia3a6e56c24c6e19b3b01c1827e46403bb71c2f4c
Signed-off-by: Daniel Rosenberg <drosen@google.com>
List_for_each_entry has the property that the first argument is always
bound to a real list element, never NULL, so testing dentry is not needed.
Generated by: scripts/coccinelle/iterators/itnull.cocci
Change-Id: I51033a2649eb39451862b35b6358fe5cfe25c5f5
Cc: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Guenter Roeck <groeck@chromium.org>