android_kernel_google_msm

mirror of https://github.com/followmsi/android_kernel_google_msm.git synced 2024-11-06 23:17:41 +00:00

Author	SHA1	Message	Date
Daniel Rosenberg	8472923d21	ANDROID: sdcardfs: Move top to its own struct Move top, and the associated data, to its own struct. This way, we can properly track refcounts on top without interfering with the inode's accounting. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 38045152 Change-Id: I1968e480d966c3f234800b72e43670ca11e1d3fd	2017-09-22 19:12:34 +03:00
Gao Xiang	1f52ba5673	ANDROID: sdcardfs: fix sdcardfs_destroy_inode for the inode RCU approach According to the following commits, fs: icache RCU free inodes vfs: fix the stupidity with i_dentry in inode destructors sdcardfs_destroy_inode should be fixed for the fast path safety. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Change-Id: I84f43c599209d23737c7e28b499dd121cb43636d	2017-09-22 19:12:33 +03:00
Daniel Roseberg	62b650471f	ANDROID: sdcardfs: Don't iput if we didn't igrab If we fail to get top, top is either NULL, or igrab found that we're in the process of freeing that inode, and did not grab it. Either way, we didn't grab it, and have no business putting it. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 38117720 Change-Id: Ie2f587483b9abb5144263156a443e89bc69b767b	2017-09-22 19:12:33 +03:00
Daniel Rosenberg	e43a6b01eb	ANDROID: sdcardfs: Call lower fs's revalidate We should be calling the lower filesystem's revalidate inside of sdcardfs's revalidate, as wrapfs does. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35766959 Change-Id: I939d1c4192fafc1e21678aeab43fe3d588b8e2f4	2017-09-22 19:12:33 +03:00
Daniel Rosenberg	88598fa8ae	ANDROID: sdcardfs: Avoid setting GIDs outside of valid ranges When setting up the ownership of files on the lower filesystem, ensure that these values are in reasonable ranges for apps. If they aren't, default to AID_MEDIA_RW Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 37516160 Change-Id: I0bec76a61ac72aff0b993ab1ad04be8382178a00	2017-09-22 19:12:32 +03:00
Daniel Rosenberg	e293feb047	Revert "Revert "Android: sdcardfs: Don't do d_add for lower fs"" This reverts commit ffa75fdb9c408f49b9622b6d55752ed99ff61488. Turns out we just needed the right hash. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 37231161 Change-Id: I6a6de7f7df99ad42b20fa062913b219f64020c31	2017-09-22 19:12:32 +03:00
Daniel Rosenberg	24cce439d5	ANDROID: sdcardfs: Use filesystem specific hash We weren't accounting for FS specific hash functions, causing us to miss negative dentries for any FS that had one. Similar to a patch from esdfs commit 75bd25a9476d ("esdfs: support lower's own hash") Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: I32d1ba304d728e0ca2648cacfb4c2e441ae63608	2017-09-22 19:12:32 +03:00
Daniel Rosenberg	b380b6ee1c	Revert "Android: sdcardfs: Don't do d_add for lower fs" This reverts commit 60df9f12992bc067216078ae756066c5d7c74d87. This change caused issues for sdcardfs on top of vfat Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: Ie56a91fda582af27921cc1a9de7ae19a9a988f2a	2017-09-22 19:12:31 +03:00
Daniel Rosenberg	d3c7b5afda	Android: sdcardfs: Don't complain in fixup_lower_ownership Not all filesystems support changing the owner of a file. We shouldn't complain if it doesn't happen. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 37488099 Change-Id: I403e44ab7230f176e6df82f6adb4e5c82ce57f33	2017-09-22 19:12:31 +03:00
Daniel Rosenberg	cbfba1be5a	Android: sdcardfs: Don't do d_add for lower fs For file based encryption, ext4 explicitly does not create negative dentries for encrypted files. If you force one over it, the decrypted file will be hidden until the cache is cleared. Instead, just fail out. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 37231161 Change-Id: Id2a9708dfa75e1c22f89915c529789caadd2ca4b	2017-09-22 19:12:30 +03:00
Daniel Rosenberg	7727a9e52e	ANDROID: sdcardfs: ->iget fixes Adapted from wrapfs commit 8c49eaa0sb9c ("Wrapfs: ->iget fixes") Change where we igrab/iput to ensure we always hold a valid lower_inode. Return ENOMEM (not EACCES) if iget5_locked returns NULL. Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35766959 Change-Id: Id8d4e0c0cbc685a0a77685ce73c923e9a3ddc094	2017-09-22 19:12:30 +03:00
Daniel Rosenberg	d2f7577f20	Android: sdcardfs: Change cache GID value Change-Id: Ieb955dd26493da26a458bc20fbbe75bca32b094f Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 37193650	2017-09-22 19:12:29 +03:00
Daniel Rosenberg	c3da3c511e	ANDROID: sdcardfs: Directly pass lower file for mmap Instead of relying on a copy hack, pass the lower file as private data. This lets the kernel find the vma mapping for pages used by the file, allowing pages used by mapping to be reclaimed. This is adapted from following esdfs patches commit 0647e638d: ("esdfs: store lower file in vm_file for mmap") commit 064850866: ("esdfs: keep a counter for mmaped file") Change-Id: I75b74d1e5061db1b8c13be38d184e118c0851a1a Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:29 +03:00
Daniel Rosenberg	7fc65bd919	ANDROID: sdcardfs: update module info Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: I958c7c226d4e9265fea8996803e5b004fb33d8ad	2017-09-22 19:12:29 +03:00
Daniel Rosenberg	5dc3989ffd	ANDROID: sdcardfs: use d_splice_alias adapted from wrapfs commit 9671770ff8b9 ("Wrapfs: use d_splice_alias") Refactor interpose code to allow lookup to use d_splice_alias. Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35766959 Change-Id: Icf51db8658202c48456724275b03dc77f73f585b	2017-09-22 19:12:28 +03:00
Daniel Rosenberg	dde08eb9d7	ANDROID: sdcardfs: fix ->llseek to update upper and lower offset Adapted from wrapfs commit 1d1d23a47baa ("Wrapfs: fix ->llseek to update upper and lower offsets") Fixes bug: xfstests generic/257. f_pos consistently is required by and only by dir_ops->wrapfs_readdir, main_ops is not affected. Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Mengyang Li <li.mengyang@stonybrook.edu> Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35766959 Change-Id: I360a1368ac37ea8966910a58972b81504031d437	2017-09-22 19:12:28 +03:00
Daniel Rosenberg	4923777bb0	ANDROID: sdcardfs: copy lower inode attributes in ->ioctl Adapted from wrapfs commit fbc9c6f83ea6 ("Wrapfs: copy lower inode attributes in ->ioctl") commit e97d8e26cc9e ("Wrapfs: use file_inode helper") Some ioctls (e.g., EXT2_IOC_SETFLAGS) can change inode attributes, so copy them from lower inode. Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35766959 Change-Id: I0f12684b9dbd4088b4a622c7ea9c03087f40e572	2017-09-22 19:12:28 +03:00
Daniel Rosenberg	21af28ae07	ANDROID: sdcardfs: remove unnecessary call to do_munmap Adapted from wrapfs commit 5be6de9ecf02 ("Wrapfs: use vm_munmap in ->mmap") commit 2c9f6014a8bb ("Wrapfs: remove unnecessary call to vm_unmap in ->mmap") Code is unnecessary and causes deadlocks in newer kernels. Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35766959 Change-Id: Ia252d60c60799d7e28fc5f1f0f5b5ec2430a2379	2017-09-22 19:12:27 +03:00
Daniel Rosenberg	2761faa286	ANDROID: sdcardfs: Fix style issues in macros Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35331000 Change-Id: I89c4035029dc2236081a7685c55cac595d9e7ebf	2017-09-22 19:12:27 +03:00
Daniel Rosenberg	b3fd5e6086	ANDROID: sdcardfs: Use seq_puts over seq_printf Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35331000 Change-Id: I3795ec61ce61e324738815b1ce3b0e09b25d723f	2017-09-22 19:12:26 +03:00
Daniel Rosenberg	3a6d093272	ANDROID: sdcardfs: Use to kstrout Switch from deprecated simple_strtoul to kstrout Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35331000 Change-Id: If18bd133b4d2877f71e58b58fc31371ff6613ed5	2017-09-22 19:12:26 +03:00
Daniel Rosenberg	a1f2d9d927	ANDROID: sdcardfs: Use pr_[...] instead of printk Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35331000 Change-Id: Ibc635ec865750530d32b87067779f681fe58a003	2017-09-22 19:12:26 +03:00
Daniel Rosenberg	def7e34e8b	ANDROID: sdcardfs: remove unneeded null check As pointed out by checkpatch, these functions already handle null inputs, so the checks are not needed. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35331000 Change-Id: I189342f032dfcefee36b27648bb512488ad61d20	2017-09-22 19:12:24 +03:00
Daniel Rosenberg	1ff2bf9007	ANDROID: sdcardfs: Fix style issues with comments Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35331000 Change-Id: I8791ef7eac527645ecb9407908e7e5ece35b8f80	2017-09-22 19:12:23 +03:00
Daniel Rosenberg	1fb0168abb	ANDROID: sdcardfs: Fix formatting This fixes various spacing and bracket related issues pointed out by checkpatch. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35331000 Change-Id: I6e248833a7a04e3899f3ae9462d765cfcaa70c96	2017-09-22 19:12:23 +03:00
Daniel Rosenberg	9a544b4fd9	ANDROID: sdcardfs: correct order of descriptors Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35331000 Change-Id: Ia6d16b19c8c911f41231d2a12be0740057edfacf	2017-09-22 19:12:23 +03:00
Daniel Rosenberg	cc90c372fa	ANDROID: sdcardfs: Fix gid issue We were already calculating most of these values, and erroring out because the check was confused by this. Instead of recalculating, adjust it as needed. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 36160015 Change-Id: I9caf3e2fd32ca2e37ff8ed71b1d392f1761bc9a9	2017-09-22 19:12:22 +03:00
Daniel Rosenberg	183d00676a	ANDROID: sdcardfs: Use tabs instead of spaces in multiuser.h Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35331000 Change-Id: Ic7801914a7dd377e270647f81070020e1f0bab9b	2017-09-22 19:12:22 +03:00
Daniel Rosenberg	a79d11e62a	ANDROID: sdcardfs: Remove uninformative prints At best these prints do not provide useful information, and at worst, some allow userspace to abuse the kernel log. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 36138424 Change-Id: I812c57cc6a22b37262935ab77f48f3af4c36827e	2017-09-22 19:12:21 +03:00
Daniel Rosenberg	527954f2c9	ANDROID: sdcardfs: move path_put outside of spinlock Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35643557 Change-Id: Ib279ebd7dd4e5884d184d67696a93e34993bc1ef	2017-09-22 19:12:21 +03:00
Daniel Rosenberg	9fc6dbe472	ANDROID: sdcardfs: Use case insensitive hash function Case insensitive comparisons don't help us much if we hash to different buckets... Signed-off-by: Daniel Rosenberg <drosen@google.com> bug: 36004503 Change-Id: I91e00dbcd860a709cbd4f7fd7fc6d855779f3285	2017-09-22 19:12:21 +03:00
Daniel Rosenberg	63cf0c300a	ANDROID: sdcardfs: declare MODULE_ALIAS_FS From commit ee616b78aa87 ("Wrapfs: declare MODULE_ALIAS_FS") Signed-off-by: Daniel Rosenberg <drosen@google.com> bug: 35766959 Change-Id: Ia4728ab49d065b1d2eb27825046f14b97c328cba	2017-09-22 19:12:20 +03:00
Eric W. Biederman	5c1997410b	fs: Limit sys_mount to only request filesystem modules. Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Change-Id: I623b13dbdb44bb9ba7481f29575e1ca4ad8102f4 Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>	2017-09-22 19:12:20 +03:00
Daniel Rosenberg	628b9661d7	ANDROID: sdcardfs: Get the blocksize from the lower fs This changes sdcardfs to be more in line with the getattr in wrapfs, which calls the lower fs's getattr to get the block size Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 34723223 Change-Id: I1c9e16604ba580a8cdefa17f02dcc489d7351aed	2017-09-22 19:12:19 +03:00
Daniel Rosenberg	30cc539e4c	ANDROID: sdcardfs: Use d_invalidate instead of drop_recurisve drop_recursive did not properly remove stale dentries. Instead, we use the vfs's d_invalidate, which does the proper cleanup. Additionally, remove the no longer used drop_recursive, and fixup_top_recursive that that are no longer used. Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: Ibff61b0c34b725b024a050169047a415bc90f0d8	2017-09-22 19:12:19 +03:00
Daniel Rosenberg	6b5418751d	ANDROID: sdcardfs: Switch to internal case insensitive compare There were still a few places where we called into a case insensitive lookup that was not defined by sdcardfs. Moving them all to the same place will allow us to switch the implementation in the future. Additionally, the check in fixup_perms_recursive did not take into account the length of both strings, causing extraneous matches when the name we were looking for was a prefix of the child name. Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: I45ce768cd782cb4ea1ae183772781387c590ecc2	2017-09-22 19:12:18 +03:00
Daniel Rosenberg	280a7d21b7	ANDROID: sdcardfs: Use spin_lock_nested Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 36007653 Change-Id: I805d5afec797669679853fb2bb993ee38e6276e4	2017-09-22 19:12:18 +03:00
Daniel Rosenberg	9917de6f14	ANDROID: sdcardfs: Replace get/put with d_lock dput cannot be called with a spin_lock. Instead, we protect our accesses by holding the d_lock. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35643557 Change-Id: I22cf30856d75b5616cbb0c223724f5ab866b5114	2017-09-22 19:12:17 +03:00
Daniel Rosenberg	c3be309216	ANDROID: sdcardfs: rate limit warning print Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35848445 Change-Id: Ida72ea0ece191b2ae4a8babae096b2451eb563f6	2017-09-22 19:12:17 +03:00
Daniel Rosenberg	74b45b5a54	ANDROID: sdcardfs: support direct-IO (DIO) operations This comes from the wrapfs patch 2e346c83b26e Wrapfs: support direct-IO (DIO) operations Signed-off-by: Li Mengyang <li.mengyang@stonybrook.edu> Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 34133558 Change-Id: I3fd779c510ab70d56b1d918f99c20421b524cdc4	2017-09-22 19:12:17 +03:00
Daniel Rosenberg	7bc8a0524c	ANDROID: sdcardfs: implement vm_ops->page_mkwrite This comes from the wrapfs patch 3dfec0ffe5e2 Wrapfs: implement vm_ops->page_mkwrite Some file systems (e.g., ext4) require it. Reported by Ted Ts'o. Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 34133558 Change-Id: I1a389b2422c654a6d3046bb8ec3e20511aebfa8e	2017-09-22 19:12:16 +03:00
Daniel Rosenberg	d782165c3b	ANDROID: sdcardfs: Don't bother deleting freelist There is no point deleting entries from dlist, as that is a temporary list on the stack from which contains only entries that are being deleted. Not all code paths set up dlist, so those that don't were performing invalid accesses in hash_del_rcu. As an additional means to prevent any other issue, we null out the list entries when we allocate from the cache. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35666680 Change-Id: Ibb1e28c08c3a600c29418d39ba1c0f3db3bf31e5	2017-09-22 19:12:16 +03:00
Daniel Rosenberg	4a7fc6483f	ANDROID: sdcardfs: Add missing path_put "ANDROID: sdcardfs: Add GID Derivation to sdcardfs" introduced an unbalanced pat_get, leading to storage space not being freed after deleting a file until rebooting. This adds the missing path_put. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 34691169 Change-Id: Ia7ef97ec2eca2c555cc06b235715635afc87940e	2017-09-22 19:12:15 +03:00
Daniel Rosenberg	2f8e9489d3	ANDROID: sdcardfs: Fix incorrect hash This adds back the hash calculation removed as part of the previous patch, as it is in fact necessary. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 35307857 Change-Id: Ie607332bcf2c5d2efdf924e4060ef3f576bf25dc	2017-09-22 19:12:15 +03:00
Daniel Rosenberg	f9c56b73bd	ANDROID: sdcardfs: Switch strcasecmp for internal call This moves our uses of strcasecmp over to an internal call so we can easily change implementations later if we so desire. Additionally, we leverage qstr's where appropriate to save time on comparisons. Change-Id: I32fdc4fd0cd3b7b735dcfd82f60a2516fd8272a5 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:15 +03:00
Al Viro	acf4f74449	constify d_lookup() arguments Change-Id: I48ac8c9d7a63530b753b9d7b316e9222edeb5876 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-09-22 19:12:14 +03:00
Al Viro	44fc4d2f9a	constify __d_lookup() arguments Change-Id: I74c489fee16eb019d9d32572d867d6b54bf6cc91 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-09-22 19:12:14 +03:00
Daniel Rosenberg	2659b94b05	ANDROID: sdcardfs: switch to full_name_hash and qstr Use the kernel's string hash function instead of rolling our own. Additionally, save a bit of calculation by using the qstr struct in place of strings. Change-Id: I0bbeb5ec2a9233f40135ad632e6f22c30ffa95c1 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:13 +03:00
Daniel Rosenberg	b134627d5d	ANDROID: sdcardfs: Add GID Derivation to sdcardfs This changes sdcardfs to modify the user and group in the underlying filesystem depending on its usage. Ownership is set by Android user, and package, as well as if the file is under obb or cache. Other files can be labeled by extension. Those values are set via the configfs interace. To add an entry, mkdir -p [configfs root]/sdcardfs/extensions/[gid]/[ext] Bug: 34262585 Change-Id: I4e030ce84f094a678376349b1a96923e5076a0f4 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:13 +03:00
Daniel Rosenberg	ce26b95540	ANDROID: sdcardfs: Remove redundant operation We call get_derived_permission_new unconditionally, so we don't need to call update_derived_permission_lock, which does the same thing. Change-Id: I0748100828c6af806da807241a33bf42be614935 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:13 +03:00
Daniel Rosenberg	230896c581	ANDROID: sdcardfs: add support for user permission isolation This allows you to hide the existence of a package from a user by adding them to an exclude list. If a user creates that package's folder and is on the exclude list, they will not see that package's id. Bug: 34542611 Change-Id: I9eb82e0bf2457d7eb81ee56153b9c7d2f6646323 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:12 +03:00
Daniel Rosenberg	9e78e3b970	ANDROID: sdcardfs: Refactor configfs interface This refactors the configfs code to be more easily extended. It will allow additional files to be added easily. Bug: 34542611 Bug: 34262585 Change-Id: I73c9b0ae5ca7eb27f4ebef3e6807f088b512d539 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:12 +03:00
Daniel Rosenberg	26117e4ce4	ANDROID: sdcardfs: Allow non-owners to touch This modifies the permission checks in setattr to allow for non-owners to modify the timestamp of files to things other than the current time. This still requires write access, as enforced by the permission call, but relaxes the requirement that the caller must be the owner, allowing those with group permissions to change it as well. Bug: 11118565 Change-Id: Ied31f0cce2797675c7ef179eeb4e088185adcbad Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:11 +03:00
Daniel Rosenberg	fdfefc2e98	ANDROID: mnt: remount should propagate to slaves of slaves propagate_remount was not accounting for the slave mounts of other slave mounts, leading to some namespaces not recieving the remount information. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 33731928 Change-Id: Idc9e8c2ed126a4143229fc23f10a959c2d0a3854	2017-09-22 19:12:11 +03:00
Daniel Rosenberg	cd769ec65b	ANDROID: sdcardfs: Fix locking issue with permision fix up Don't use lookup_one_len so we can grab the spinlock that protects d_subdirs. Bug: 30954918 Change-Id: I0c6a393252db7beb467e0d563739a3a14e1b5115 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:11 +03:00
Daniel Rosenberg	ac0146e438	ANDROID: vfs: Missed updating truncate to truncate2 Bug: 30954918 Change-Id: I8163d3f86dd7aadb2ab3fc11816754f331986f05 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:10 +03:00
Al Viro	c4d2a199dd	BACKPORT: smarter propagate_mnt() The current mainline has copies propagated to all nodes, then tears down the copies we made for nodes that do not contain counterparts of the desired mountpoint. That sets the right propagation graph for the copies (at teardown time we move the slaves of removed node to a surviving peer or directly to master), but we end up paying a fairly steep price in useless allocations. It's fairly easy to create a situation where N calls of mount(2) create exactly N bindings, with O(N^2) vfsmounts allocated and freed in process. Fortunately, it is possible to avoid those allocations/freeings. The trick is to create copies in the right order and find which one would've eventually become a master with the current algorithm. It turns out to be possible in O(nodes getting propagation) time and with no extra allocations at all. One part is that we need to make sure that eventual master will be created before its slaves, so we need to walk the propagation tree in a different order - by peer groups. And iterate through the peers before dealing with the next group. Another thing is finding the (earlier) copy that will be a master of one we are about to create; to do that we are (temporary) marking the masters of mountpoints we are attaching the copies to. Either we are in a peer of the last mountpoint we'd dealt with, or we have the following situation: we are attaching to mountpoint M, the last copy S_0 had been attached to M_0 and there are sequences S_0...S_n, M_0...M_n such that S_{i+1} is a master of S_{i}, S_{i} mounted on M{i} and we need to create a slave of the first S_{k} such that M is getting propagation from M_{k}. It means that the master of M_{k} will be among the sequence of masters of M. On the other hand, the nearest marked node in that sequence will either be the master of M_{k} or the master of M_{k-1} (the latter - in the case if M_{k-1} is a slave of something M gets propagation from, but in a wrong peer group). So we go through the sequence of masters of M until we find a marked one (P). Let N be the one before it. Then we go through the sequence of masters of S_0 until we find one (say, S) mounted on a node D that has P as master and check if D is a peer of N. If it is, S will be the master of new copy, if not - the master of S will be. That's it for the hard part; the rest is fairly simple. Iterator is in next_group(), handling of one prospective mountpoint is propagate_one(). It seems to survive all tests and gives a noticably better performance than the current mainline for setups that are seriously using shared subtrees. Change-Id: I45648e8a405544f768c5956711bdbdf509e2705a Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-09-22 19:12:10 +03:00
Al Viro	bd0f854d88	BACKPORT: don't bother with propagate_mnt() unless the target is shared If the dest_mnt is not shared, propagate_mnt() does nothing - there's no mounts to propagate to and thus no copies to create. Might as well don't bother calling it in that case. Change-Id: Id94af8ad288bf9bfc6ffb5570562bbc2dc2e0d87 Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-09-22 19:12:09 +03:00
Daniel Rosenberg	0ccedbc84f	sdcardfs: Use per mount permissions This switches sdcardfs over to using permission2. Instead of mounting several sdcardfs instances onto the same underlaying directory, you bind mount a single mount several times, and remount with the options you want. These are stored in the private mount data, allowing you to maintain the same tree, but have different permissions for different mount points. Warning functions have been added for permission, as it should never be called, and the correct behavior is unclear. Change-Id: I841b1d70ec60cf2b866fa48edeb74a0b0f8334f5 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:08 +03:00
Daniel Rosenberg	3edd0f78ef	sdcardfs: Add gid and mask to private mount data Adds support for mount2, remount2, and the functions to allocate/clone/copy the private data The next patch will switch over to actually using it. Change-Id: I8a43da26021d33401f655f0b2784ead161c575e3 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:08 +03:00
Daniel Rosenberg	8967b8cde8	sdcardfs: User new permission2 functions Change-Id: Ic7e0fb8fdcebb31e657b079fe02ac834c4a50db9 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:08 +03:00
Daniel Rosenberg	1620d1d7d4	vfs: Add setattr2 for filesystems with per mount permissions This allows filesystems to use their mount private data to influence the permssions they use in setattr2. It has been separated into a new call to avoid disrupting current setattr users. Change-Id: I19959038309284448f1b7f232d579674ef546385 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:07 +03:00
Daniel Rosenberg	19a3f7c232	vfs: Add permission2 for filesystems with per mount permissions This allows filesystems to use their mount private data to influence the permssions they return in permission2. It has been separated into a new call to avoid disrupting current permission users. Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:07 +03:00
Daniel Rosenberg	2686aceaa7	vfs: Allow filesystems to access their private mount data Now we pass the vfsmount when mounting and remounting. This allows the filesystem to actually set up the mount specific data, although we can't quite do anything with it yet. show_options is expanded to include data that lives with the mount. To avoid changing existing filesystems, these have been added as new vfs functions. Change-Id: If80670bfad9f287abb8ac22457e1b034c9697097 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:06 +03:00
Daniel Rosenberg	ccbd24c7a0	mnt: Add filesystem private data to mount points This starts to add private data associated directly to mount points. The intent is to give filesystems a sense of where they have come from, as a means of letting a filesystem take different actions based on this information. Change-Id: Ie769d7b3bb2f5972afe05c1bf16cf88c91647ab2 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:06 +03:00
Daniel Rosenberg	313cdb2651	ANDROID: sdcardfs: Fix backport issue for 3.10 Don't use make_kuid/make_guid Bug: 30954918 Change-Id: I56de640771872aeeae5a69c42bf2ce8a5cfa413f Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:05 +03:00
Daniel Rosenberg	87b619e0cb	sdcardfs: Move directory unlock before touch This removes a deadlock under low memory conditions. filp_open can call lookup_slow, which will attempt to lock the parent. Change-Id: I940643d0793f5051d1e79a56f4da2fa8ca3d8ff7 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:05 +03:00
alvin_liang	b94d192e3e	sdcardfs: fix external storage exporting incorrect uid Symptom: App cannot write into per-app folder Root Cause: sdcardfs exports incorrect uid Solution: fix uid Project: All Note: Test done by RD: passed Change-Id: Iff64f6f40ba4c679f07f4426d3db6e6d0db7e3ca	2017-09-22 19:12:04 +03:00
Daniel Rosenberg	00fb55c68d	sdcardfs: Added top to sdcardfs_inode_info Adding packages to the package list and moving files takes a large amount of locks, and is currently a heavy operation. This adds a 'top' field to the inode_info, which points to the inode for the top most directory whose owner you would like to match. On permission checks and get_attr, we look up the owner based on the information at top. When we change a package mapping, we need only modify the information in the corresponding top inode_info's. When renaming, we must ensure top is set correctly in all children. This happens when an app specific folder gets moved outside of the folder for that app. Change-Id: Ib749c60b568e9a45a46f8ceed985c1338246ec6c Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:04 +03:00
Daniel Rosenberg	853f8e0523	sdcardfs: Switch package list to RCU Switched the package id hashmap to use RCU. Change-Id: I9fdcab279009005bf28536247d11e13babab0b93 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:03 +03:00
Daniel Rosenberg	b7ae873970	sdcardfs: Fix locking for permission fix up Iterating over d_subdirs requires taking d_lock. Removed several unneeded locks. Change-Id: I5b1588e54c7e6ee19b756d6705171c7f829e2650 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:03 +03:00
Daniel Rosenberg	82f20f9741	sdcardfs: Check for other cases on path lookup This fixes a bug where the first lookup of a file or folder created under a different view would not be case insensitive. It will now search through for a case insensitive match if the initial lookup fails. Bug:28024488 Change-Id: I4ff9ce297b9f2f9864b47540e740fd491c545229 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:03 +03:00
Daniel Rosenberg	feccf66542	sdcardfs: override umask on mkdir and create The mode on files created on the lower fs should not be affected by the umask of the calling task's fs_struct. Instead, we create a copy and modify it as needed. This also lets us avoid the string shenanigans around .nomedia files. Bug: 27992761 Change-Id: Ia3a6e56c24c6e19b3b01c1827e46403bb71c2f4c Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:02 +03:00
Julia Lawall	b46375c331	ANDROID: sdcardfs: fix itnull.cocci warnings List_for_each_entry has the property that the first argument is always bound to a real list element, never NULL, so testing dentry is not needed. Generated by: scripts/coccinelle/iterators/itnull.cocci Change-Id: I51033a2649eb39451862b35b6358fe5cfe25c5f5 Cc: Daniel Rosenberg <drosen@google.com> Signed-off-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Guenter Roeck <groeck@chromium.org>	2017-09-22 19:12:02 +03:00
Daniel Rosenberg	ccf7c04945	sdcardfs: Truncate packages_gid.list on overflow packages_gid.list was improperly returning the wrong count. Use scnprintf instead, and inform the user that the list was truncated if it is. Bug: 30013843 Change-Id: Ida2b2ef7cd86dd87300bfb4c2cdb6bfe2ee1650d Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:01 +03:00
Daniel Rosenberg	611237fca8	vfs: change d_canonical_path to take two paths bug: 23904372 Change-Id: I4a686d64b6de37decf60019be1718e1d820193e6 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:01 +03:00
Daniel Rosenberg	f5fea6a938	fuse: Add support for d_canonical_path Allows FUSE to report to inotify that it is acting as a layered filesystem. The userspace component returns a string representing the location of the underlying file. If the string cannot be resolved into a path, the top level path is returned instead. bug: 23904372 Change-Id: Iabdca0bbedfbff59e9c820c58636a68ef9683d9f Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:12:00 +03:00
Daniel Rosenberg	10db9d59bf	sdcardfs: remove unneeded __init and __exit Change-Id: I2a2d45d52f891332174c3000e8681c5167c1564f	2017-09-22 19:12:00 +03:00
Daniel Rosenberg	679046d6e7	sdcardfs: Remove unused code Change-Id: Ie97cba27ce44818ac56cfe40954f164ad44eccf6	2017-09-22 19:12:00 +03:00
Daniel Rosenberg	158b6be502	sdcardfs: remove effectless config option CONFIG_SDCARD_FS_CI_SEARCH only guards a define for LOOKUP_CASE_INSENSITIVE, which is never used in the kernel. Remove both, along with the option matching that supports it. Change-Id: I363a8f31de8ee7a7a934d75300cc9ba8176e2edf Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:11:59 +03:00
Daniel Rosenberg	49e4c3ad85	inotify: Fix erroneous update of bit count Patch "vfs: add d_canonical_path for stacked filesystem support" erroneously updated the ALL_INOTIFY_BITS count. This changes it back Change-Id: Idb04edc736da276159d30f04c40cff9d6b1e070f	2017-09-22 19:11:59 +03:00
Daniel Rosenberg	3bf0ab5c8d	sdcardfs: Add support for d_canonicalize Change-Id: I5d6f0e71b8ca99aec4b0894412f1dfd1cfe12add Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:11:58 +03:00
Daniel Rosenberg	9f8d208e0e	vfs: add d_canonical_path for stacked filesystem support Inotify does not currently know when a filesystem is acting as a wrapper around another fs. This means that inotify watchers will miss any modifications to the base file, as well as any made in a separate stacked fs that points to the same file. d_canonical_path solves this problem by allowing the fs to map a dentry to a path in the lower fs. Inotify can use it to find the appropriate place to watch to be informed of all changes to a file. Change-Id: I09563baffad1711a045e45c1bd0bd8713c2cc0b6 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-09-22 19:11:58 +03:00
Daniel Rosenberg	8e6b0f6c8c	sdcardfs: Bring up to date with Android M permissions: In M, the workings of sdcardfs were changed significantly. This brings sdcardfs into line with the changes. Change-Id: I10e91a84a884c838feef7aa26c0a2b21f02e052e	2017-09-22 19:11:57 +03:00
Daniel Campello	1dbc72eb35	sdcardfs: Changed type-cast in packagelist management Change-Id: Ic8842de2d7274b7a5438938d2febf5d8da867148	2017-09-22 19:11:57 +03:00
fluxi	fd2464db10	sdcardfs: Port to 3.4 Analog port to 3.10 by Daniel Campello <campello@google.com>. Change-Id: I0b05890cdd4332c5cfc2ffdf66a3f3a7890cce35	2017-09-22 19:11:56 +03:00
Daniel Campello	6b980874d5	Included sdcardfs source code for kernel 3.0 Only included the source code as is for kernel 3.0. Following patches take care of porting this file system to version 3.10. Change-Id: I09e76db77cd98a059053ba5b6fd88572a4b75b5b Signed-off-by: Daniel Campello <campello@google.com>	2017-09-22 19:11:56 +03:00
Al Viro	be084ce9f1	move d_rcu from overlapping d_child to overlapping d_alias commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream. Change-Id: I85366e6ce0423ec9620bcc9cd3e7695e81aa1171 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [bwh: Backported to 3.2: - Apply name changes in all the different places we use d_alias and d_child - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> [lizf: Backported to 3.4: - adjust context - need one more name change in debugfs]	2017-09-22 19:11:55 +03:00
Al Viro	f787204b2f	get rid of kern_path_parent() all callers want the same thing, actually - a kinda-sorta analog of kern_path_create(). I.e. they want parent vfsmount/dentry (with ->i_mutex held, to make sure the child dentry is still their child) + the child dentry. Signed-off-by Al Viro <viro@zeniv.linux.org.uk> Change-Id: I58cc7b0a087646516db9af69962447d27fb3ee8b	2017-09-22 19:11:54 +03:00
Andy Lutomirski	b6dd4f6779	UPSTREAM: capabilities: ambient capabilities Credit where credit is due: this idea comes from Christoph Lameter with a lot of valuable input from Serge Hallyn. This patch is heavily based on Christoph's patch. ===== The status quo ===== On Linux, there are a number of capabilities defined by the kernel. To perform various privileged tasks, processes can wield capabilities that they hold. Each task has four capability masks: effective (pE), permitted (pP), inheritable (pI), and a bounding set (X). When the kernel checks for a capability, it checks pE. The other capability masks serve to modify what capabilities can be in pE. Any task can remove capabilities from pE, pP, or pI at any time. If a task has a capability in pP, it can add that capability to pE and/or pI. If a task has CAP_SETPCAP, then it can add any capability to pI, and it can remove capabilities from X. Tasks are not the only things that can have capabilities; files can also have capabilities. A file can have no capabilty information at all [1]. If a file has capability information, then it has a permitted mask (fP) and an inheritable mask (fI) as well as a single effective bit (fE) [2]. File capabilities modify the capabilities of tasks that execve(2) them. A task that successfully calls execve has its capabilities modified for the file ultimately being excecuted (i.e. the binary itself if that binary is ELF or for the interpreter if the binary is a script.) [3] In the capability evolution rules, for each mask Z, pZ represents the old value and pZ' represents the new value. The rules are: pP' = (X & fP) \| (pI & fI) pI' = pI pE' = (fE ? pP' : 0) X is unchanged For setuid binaries, fP, fI, and fE are modified by a moderately complicated set of rules that emulate POSIX behavior. Similarly, if euid == 0 or ruid == 0, then fP, fI, and fE are modified differently (primary, fP and fI usually end up being the full set). For nonroot users executing binaries with neither setuid nor file caps, fI and fP are empty and fE is false. As an extra complication, if you execute a process as nonroot and fE is set, then the "secure exec" rules are in effect: AT_SECURE gets set, LD_PRELOAD doesn't work, etc. This is rather messy. We've learned that making any changes is dangerous, though: if a new kernel version allows an unprivileged program to change its security state in a way that persists cross execution of a setuid program or a program with file caps, this persistent state is surprisingly likely to allow setuid or file-capped programs to be exploited for privilege escalation. ===== The problem ===== Capability inheritance is basically useless. If you aren't root and you execute an ordinary binary, fI is zero, so your capabilities have no effect whatsoever on pP'. This means that you can't usefully execute a helper process or a shell command with elevated capabilities if you aren't root. On current kernels, you can sort of work around this by setting fI to the full set for most or all non-setuid executable files. This causes pP' = pI for nonroot, and inheritance works. No one does this because it's a PITA and it isn't even supported on most filesystems. If you try this, you'll discover that every nonroot program ends up with secure exec rules, breaking many things. This is a problem that has bitten many people who have tried to use capabilities for anything useful. ===== The proposed change ===== This patch adds a fifth capability mask called the ambient mask (pA). pA does what most people expect pI to do. pA obeys the invariant that no bit can ever be set in pA if it is not set in both pP and pI. Dropping a bit from pP or pI drops that bit from pA. This ensures that existing programs that try to drop capabilities still do so, with a complication. Because capability inheritance is so broken, setting KEEPCAPS, using setresuid to switch to nonroot uids, and then calling execve effectively drops capabilities. Therefore, setresuid from root to nonroot conditionally clears pA unless SECBIT_NO_SETUID_FIXUP is set. Processes that don't like this can re-add bits to pA afterwards. The capability evolution rules are changed: pA' = (file caps or setuid or setgid ? 0 : pA) pP' = (X & fP) \| (pI & fI) \| pA' pI' = pI pE' = (fE ? pP' : pA') X is unchanged If you are nonroot but you have a capability, you can add it to pA. If you do so, your children get that capability in pA, pP, and pE. For example, you can set pA = CAP_NET_BIND_SERVICE, and your children can automatically bind low-numbered ports. Hallelujah! Unprivileged users can create user namespaces, map themselves to a nonzero uid, and create both privileged (relative to their namespace) and unprivileged process trees. This is currently more or less impossible. Hallelujah! You cannot use pA to try to subvert a setuid, setgid, or file-capped program: if you execute any such program, pA gets cleared and the resulting evolution rules are unchanged by this patch. Users with nonzero pA are unlikely to unintentionally leak that capability. If they run programs that try to drop privileges, dropping privileges will still work. It's worth noting that the degree of paranoia in this patch could possibly be reduced without causing serious problems. Specifically, if we allowed pA to persist across executing non-pA-aware setuid binaries and across setresuid, then, naively, the only capabilities that could leak as a result would be the capabilities in pA, and any attacker already has those capabilities. This would make me nervous, though -- setuid binaries that tried to privilege-separate might fail to do so, and putting CAP_DAC_READ_SEARCH or CAP_DAC_OVERRIDE into pA could have unexpected side effects. (Whether these unexpected side effects would be exploitable is an open question.) I've therefore taken the more paranoid route. We can revisit this later. An alternative would be to require PR_SET_NO_NEW_PRIVS before setting ambient capabilities. I think that this would be annoying and would make granting otherwise unprivileged users minor ambient capabilities (CAP_NET_BIND_SERVICE or CAP_NET_RAW for example) much less useful than it is with this patch. ===== Footnotes ===== [1] Files that are missing the "security.capability" xattr or that have unrecognized values for that xattr end up with has_cap set to false. The code that does that appears to be complicated for no good reason. [2] The libcap capability mask parsers and formatters are dangerously misleading and the documentation is flat-out wrong. fE is not a mask; it's a single bit. This has probably confused every single person who has tried to use file capabilities. [3] Linux very confusingly processes both the script and the interpreter if applicable, for reasons that elude me. The results from thinking about a script's file capabilities and/or setuid bits are mostly discarded. Preliminary userspace code is here, but it needs updating: https://git.kernel.org/cgit/linux/kernel/git/luto/util-linux-playground.git/commit/?h=cap_ambient&id=7f5afbd175d2 Here is a test program that can be used to verify the functionality (from Christoph): /* * Test program for the ambient capabilities. This program spawns a shell * that allows running processes with a defined set of capabilities. * * (C) 2015 Christoph Lameter <cl@linux.com> * Released under: GPL v3 or later. * * * Compile using: * * gcc -o ambient_test ambient_test.o -lcap-ng * * This program must have the following capabilities to run properly: * Permissions for CAP_NET_RAW, CAP_NET_ADMIN, CAP_SYS_NICE * * A command to equip the binary with the right caps is: * * setcap cap_net_raw,cap_net_admin,cap_sys_nice+p ambient_test * * * To get a shell with additional caps that can be inherited by other processes: * * ./ambient_test /bin/bash * * * Verifying that it works: * * From the bash spawed by ambient_test run * * cat /proc/$$/status * * and have a look at the capabilities. / / * Definitions from the kernel header files. These are going to be removed * when the /usr/include files have these defined. / static void set_ambient_cap(int cap) { int rc; capng_get_caps_process(); rc = capng_update(CAPNG_ADD, CAPNG_INHERITABLE, cap); if (rc) { printf("Cannot add inheritable cap\n"); exit(2); } capng_apply(CAPNG_SELECT_CAPS); / Note the two 0s at the end. Kernel checks for these / if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0)) { perror("Cannot set cap"); exit(1); } } int main(int argc, char *argv) { int rc; set_ambient_cap(CAP_NET_RAW); set_ambient_cap(CAP_NET_ADMIN); set_ambient_cap(CAP_SYS_NICE); printf("Ambient_test forking shell\n"); if (execv(argv[1], argv + 1)) perror("Cannot exec"); return 0; } Signed-off-by: Christoph Lameter <cl@linux.com> # Original author Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Aaron Jones <aaronmdjones@gmail.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Andrew G. Morgan <morgan@kernel.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Austin S Hemmelgarn <ahferroin7@gmail.com> Cc: Markku Savela <msa@moth.iki.fi> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 58319057b7847667f0c9585b9de0e8932b0fdb08) Bug: 31038224 Change-Id: I88bc5caa782dc6be23dc7e839ff8e11b9a903f8c Signed-off-by: Jorge Lucangeli Obes <jorgelo@google.com>	2017-09-01 13:38:08 +03:00
Greg Hackmann	84a377cc19	timerfd: support CLOCK_BOOTTIME clock Add CLOCK_BOOTTIME support to timerfd Change-Id: I14dee6d1104f15a05f463a632268ac4564753faf Signed-off-by: Greg Hackmann <ghackmann@google.com>	2017-08-27 19:07:23 +03:00
Todd Poynor	9ae9589197	timerfd: add alarm timers Add support for clocks CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM. Change-Id: Iafc8445d3d7ffb35110c860f1607bf03f1edb895 Signed-off-by: Todd Poynor <toddpoynor@google.com>	2017-08-27 19:07:22 +03:00
John Stultz	1bc537874e	alarmtimers: Squash upstream changes staging: android-alarm: Switch from wakelocks to wakeup sources In their current AOSP tree, the Android in-kernel wakelock infrastructure has been reimplemented in terms of wakeup sources: http://git.linaro.org/gitweb?p=people/jstultz/android.git;a=commitdiff;h=e9911f4efdc55af703b8b3bb8c839e6f5dd173bb The Android alarm driver currently has stubbed out calls to wakelock functionality. So this patch simply converts the stubbed out wakelock calls to wakeup source calls, and removes the empty wakelock macros Greg, would you mind queuing this in staging-next? CC: Colin Cross <ccross@android.com> CC: Arve Hjønnevåg <arve@android.com> CC: Greg KH <gregkh@linuxfoundation.org> CC: Android Kernel Team <kernel-team@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Staging: android: alarm: Rename pr_alarm to alarm_dbg Rename a macro to make it explicit it's for debugging. Use %s: __func__ instead of embedding function names. Coalesce formats, align arguments. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> staging: Android: Fix NULL pointer related warning in alarm-dev.c file Fixes the following sparse warning: drivers/staging/android/alarm-dev.c:259:35: warning: Using plain integer as NULL pointer Cc: Brian Swetland <swetland@google.com> Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> staging: android: alarm: remove unnecessary goto statement Signed-off-by: Devendra Naga <devendra.aaru@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Staging: android: Alarm driver cleanups Little cleanups. Enum value ANDROID_ALARM_TYPE_COUNT was treated as an alarm type within a switch statement. That condition was unreachable though. Signed-off-by: Dae S. Kim <dae@velatum.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> staging: alarm-dev: Drop pre Android 1.0 _OLD ioctls Per Colin's comment: "The "support old userspace code" comment for those two ioctls has been there since pre-Android 1.0. Those apis are not exposed to Android apps, I don't see any problem deleting them." Thus this patch removes the ANDROID_ALARM_SET_OLD and ANDROID_ALARM_SET_AND_WAIT_OLD ioctl compatability logic. Change-Id: I5138aaa3cdbfb758aaef2cc7591cb5340b3640a0 Cc: Serban Constantinescu <serban.constantinescu@arm.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Colin Cross <ccross@google.com> Cc: Android Kernel Team <kernel-team@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> staging: alarm-dev: Refactor alarm-dev ioctl code in prep for compat_ioctl Cleanup the Android alarm-dev driver's ioctl code to refactor it in preparation for compat_ioctl support. Cc: Serban Constantinescu <serban.constantinescu@arm.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Colin Cross <ccross@google.com> Cc: Android Kernel Team <kernel-team@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> staging: alarm-dev: Implement compat_ioctl support Implement compat_ioctl support for the alarm-dev ioctl. Cc: Serban Constantinescu <serban.constantinescu@arm.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Colin Cross <ccross@google.com> Cc: Android Kernel Team <kernel-team@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> staging: alarm-dev: information leak in alarm_ioctl() Smatch complains that if we pass an invalid clock type then "ts" is never set. We need to check for errors earlier, otherwise we end up passing uninitialized stack data to userspace. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> staging: alarm-dev: information leak in alarm_compat_ioctl() If we pass an invalid clock type then "ts" is never set. We need to check for errors earlier, otherwise we end up passing uninitialized stack data to userspace. Reported-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> rtc: alarm: Add power-on alarm feature Android does not support powering up of phone through alarm. Adding shutdown hook in alarm driver which will set alarm while phone is going down so as to power-up the phone after alarm expiration. Change-Id: Ic2611e33ae9c1f8e83f21efdb93e26ac9f9499de Signed-off-by: Matthew Qin <yqin@codeaurora.org> qpnp-rtc: clear alarm register when rtc irq is disabled The rtc alarm register should be cleared when the rtc irq is disabled Change-Id: I97a8bf989ff610093240a6b308a297702da6cb89 Signed-off-by: Xiaocheng Li <lix@codeaurora.org> Signed-off-by: Matthew Qin <yqin@codeaurora.org> alarm : Fix the race conditions in alarm-dev.c There will be race conditions between alarm set and alarm clear if set_power_on_alarm is out of the lock.But set_power_on_alarm can't be put into spin_lock. So add it into mutex_lock. CRs-Fixed: 639115 Change-Id: I4226a95e499211c0d50ff7ce269467a57a410dc7 Signed-off-by: Mao Jinlong <c_jmao@codeaurora.org> switch timerfd_[sg]ettime(2) to fget_light() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> time: Enable alarmtimers Change-Id: I4d1ea553aa707ab0467e00dea86cedc8f6797b78	2017-08-25 20:00:02 +03:00
Jin Qian	9e20025f8b	f2fs: sanity check checkpoint segno and blkoff Make sure segno and blkoff read from raw image are valid. Cc: stable@vger.kernel.org Signed-off-by: Jin Qian <jinqian@google.com> [Jaegeuk Kim: adjust minor coding style] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Change-Id: Ie2505c071233c1a9dec2729fe1ad467689a1b7a2 (cherry picked from commit 15d3042a937c13f5d9244241c7a9c8416ff6e82a)	2017-08-07 18:11:20 -06:00
Jin Qian	46e0dfc447	f2fs: sanity check segment count F2FS uses 4 bytes to represent block address. As a result, supported size of disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments. Signed-off-by: Jin Qian <jinqian@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Change-Id: I16b3cd6279bff1a221781a80b9b34744c9e7098f (cherry picked from commit b9dd46188edc2f0d1f37328637860bb65a771124)	2017-08-07 18:11:13 -06:00
Thomas Gleixner	5a34ec804c	timerfd: Protect the might cancel mechanism proper The handling of the might_cancel queueing is not properly protected, so parallel operations on the file descriptor can race with each other and lead to list corruptions or use after free. Protect the context for these operations with a seperate lock. The wait queue lock cannot be reused for this because that would create a lock inversion scenario vs. the cancel lock. Replacing might_cancel with an atomic (atomic_t or atomic bit) does not help either because it still can race vs. the actual list operation. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "linux-fsdevel@vger.kernel.org" Cc: syzkaller <syzkaller@googlegroups.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701311521430.3457@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Change-Id: I1f2d38a919ceb1ca1c7c9471dece0c1126383912 (cherry picked from commit 1e38da300e1e395a15048b0af1e5305bd91402f6)	2017-08-07 18:11:00 -06:00
Jan Kara	6f25be195e	udf: Check path length when reading symlink Symlink reading code does not check whether the resulting path fits into the page provided by the generic code. This isn't as easy as just checking the symlink size because of various encoding conversions we perform on path. So we have to check whether there is still enough space in the buffer on the fly. Change-Id: Id56d129029eaf2e651cf7236103fb73aa540ae1f CC: stable@vger.kernel.org Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz>	2017-07-10 01:48:57 +03:00
Kees Cook	b48a26fd3d	fs/exec.c: account for argv/envp pointers commit 98da7d08850fb8bdeb395d6368ed15753304aa0c upstream. When limiting the argv/envp strings during exec to 1/4 of the stack limit, the storage of the pointers to the strings was not included. This means that an exec with huge numbers of tiny strings could eat 1/4 of the stack limit in strings and then additional space would be later used by the pointers to the strings. For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721 single-byte strings would consume less than 2MB of stack, the max (8MB / 4) amount allowed, but the pointers to the strings would consume the remaining additional stack space (1677721 * 4 == 6710884). The result (1677721 + 6710884 == 8388605) would exhaust stack space entirely. Controlling this stack exhaustion could result in pathological behavior in setuid binaries (CVE-2017-1000365). [akpm@linux-foundation.org: additional commenting from Kees] Fixes: `b6a2fea393` ("mm: variable length argument support") Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Qualys Security Advisory <qsa@qualys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: I2e01d7be2d52415264ff48c632bfe307008c4e03	2017-07-04 01:21:47 +03:00
Hugh Dickins	45c60e5957	mm: larger stack guard gap, between vmas commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream. Stack guard page is a useful feature to reduce a risk of stack smashing into a different mapping. We have been using a single page gap which is sufficient to prevent having stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses as large as 64kB alloca() in many commonly used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX] which is 256kB or stack strings with MAX_ARG_STRLEN. This will become especially dangerous for suid binaries and the default no limit for the stack size limit because those applications can be tricked to consume a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunatelly. Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size because systems with larger base pages might cap stack allocations in the PAGE_SIZE units) which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot. One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units). Implementation wise, first delete all the old code for stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode. Instead of keeping gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and and the vma tree's subtree_gap support for that. Change-Id: I611023b0bfe1cab7b3e5da13e331a7baaaaf6eb0 Original-patch-by: Oleg Nesterov <oleg@redhat.com> Original-patch-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Hugh Dickins <hughd@google.com> [wt: backport to 4.11: adjust context] [wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide] [wt: backport to 4.4: adjust context ; drop ppc hugetlb_radix changes] [wt: backport to 3.18: adjust context ; no FOLL_POPULATE ; s390 uses generic arch_get_unmapped_area()] [wt: backport to 3.16: adjust context] [wt: backport to 3.10: adjust context ; code logic in PARISC's arch_get_unmapped_area() wasn't found ; code inserted into expand_upwards() and expand_downwards() runs under anon_vma lock; changes for gup.c:faultin_page go to memory.c:__get_user_pages(); included Hugh Dickins' fixes] Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Flex1911 <dedsa2002@gmail.com>	2017-07-02 13:03:27 +03:00
Linus Torvalds	939b2740ab	splice: introduce FMODE_SPLICE_READ and FMODE_SPLICE_WRITE Introduce FMODE_SPLICE_READ and FMODE_SPLICE_WRITE. These modes check whether it is legal to read or write a file using splice. Both get automatically set on regular files and are not checked when a 'struct fileoperations' includes the splice_{read,write} methods. Change-Id: Icb601a7db12d4e07a62d790edaa8a9a5aed3ba2a Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>	2017-06-26 21:30:22 +03:00
Laura Abbott	5d5a14df85	fs: fuse: Add replacment for CMA pages into the LRU cache CMA pages are currently replaced in the FUSE file system since FUSE may hold on to CMA pages for a long time, preventing migration. The replacement page is added to the file cache but not the LRU cache. This may prevent the page from being properly aged and dropped, creating poor performance under tight memory condition. Fix this by adding the new page to the LRU cache after creation. Change-Id: Ib349abf1024d48386b835335f3fbacae040b6241 CRs-Fixed: 586855 Signed-off-by: Laura Abbott <lauraa@codeaurora.org>	2017-06-26 21:01:44 +03:00
Jan Kara	b1a8c88774	BACKPORT: posix_acl: Clear SGID bit when setting file permissions [Partially applied during f2fs inclusion, changes now aligned to upstream] (cherry pick from commit 073931017b49d9458aa351605b43a7e34598caef) When file permissions are modified via chmod(2) and the user is not in the owning group or capable of CAP_FSETID, the setgid bit is cleared in inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file permissions as well as the new ACL, but doesn't clear the setgid bit in a similar way; this allows to bypass the check in chmod(2). Fix that. NB: conflicts resolution included extending the change to all visible users of the near deprecated function posix_acl_equiv_mode replaced with posix_acl_update_mode. We did not resolve the ACL leak in this CL, require additional upstream fixes. References: CVE-2016-7097 Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Bug: 32458736 [haggertk]: Backport to 3.4/msm8974 * convert use of capable_wrt_inode_uidgid to capable Change-Id: I19591ad452cc825ac282b3cfd2daaa72aa9a1ac1	2017-06-26 20:26:17 +03:00
Adrian Salido	21e685af37	fs/proc/array.c: make safe access to group_leader As mentioned in commit `52ee2dfdd4` ("pids: refactor vnr/nr_ns helpers to make them safe"). _nr_ns helpers used to be buggy. The commit addresses most of the helpers but is missing task_tgid_xxx() Without this protection there is a possible use after free reported by kasan instrumented kernel: ================================================================== BUG: KASAN: use-after-free in task_tgid_nr_ns+0x2c/0x44 at addr Read of size 8 by task cat/2472 CPU: 1 PID: 2472 Comm: cat Tainted: ** Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT) Call trace: [<ffffffc00020ad2c>] dump_backtrace+0x0/0x17c [<ffffffc00020aec0>] show_stack+0x18/0x24 [<ffffffc0011573d0>] dump_stack+0x94/0x100 [<ffffffc0003c7dc0>] kasan_report+0x308/0x554 [<ffffffc0003c7518>] __asan_load8+0x20/0x7c [<ffffffc00025a54c>] task_tgid_nr_ns+0x28/0x44 [<ffffffc00046951c>] proc_pid_status+0x444/0x1080 [<ffffffc000460f60>] proc_single_show+0x8c/0xdc [<ffffffc0004081b0>] seq_read+0x2e8/0x6f0 [<ffffffc0003d1420>] vfs_read+0xd8/0x1e0 [<ffffffc0003d1b98>] SyS_read+0x68/0xd4 Accessing group_leader while holding rcu_lock and using the now safe helpers introduced in the commit mentioned, this race condition is addressed. Signed-off-by: Adrian Salido <salidoa@google.com> Change-Id: I4315217922dda375a30a3581c0c1740dda7b531b Bug: 31495866	2017-06-07 13:26:13 -06:00
Tom Marshall	903457eb2c	kernel: Fix potential refcount leak in su check Change-Id: I3d241ae805ba708c18bccfd5e5d6cdcc8a5bc1c8	2017-05-19 18:41:42 -06:00
Tom Marshall	75ec7fa33f	kernel: Only expose su when daemon is running Note: this is for the 3.4 kernel It has been claimed that the PG implementation of 'su' has security vulnerabilities even when disabled. Unfortunately, the people that find these vulnerabilities often like to keep them private so they can profit from exploits while leaving users exposed to malicious hackers. In order to reduce the attack surface for vulnerabilites, it is therefore necessary to make 'su' completely inaccessible when it is not in use (except by the root and system users). Change-Id: Ia7d50ba46c3d932c2b0ca5fc8e9ec69ec9045f85	2017-05-19 18:41:25 -06:00
Ben Hutchings	ade551c944	splice: Apply generic position and size checks to each write We need to check the position and size of file writes against various limits, using generic_write_check(). This was not being done for the splice write path. It was fixed upstream by commit `8d0207652c` ("->splice_write() via ->write_iter()") but we can't apply that. CVE-2014-7822 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Change-Id: Ic7e7bd78d4594c993c9684d32a0ddeaf70165bce (cherry picked from commit 894c6350eaad7e613ae267504014a456e00a3e2a)	2017-04-17 15:41:36 -06:00
Jeff Mahoney	dae4a6db45	ecryptfs: don't allow mmap when the lower fs doesn't support it There are legitimate reasons to disallow mmap on certain files, notably in sysfs or procfs. We shouldn't emulate mmap support on file systems that don't offer support natively. CVE-2016-1583 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: stable@vger.kernel.org [tyhicks: clean up f_op check by using ecryptfs_file_to_lower()] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> (adapted from commit f0fe970df3838c202ef6c07a4c2b36838ef0a88b) Change-Id: I3eb979e9476847834eeea0ecbaf07a53329a7219	2017-03-03 16:49:07 -07:00
Eryu Guan	e0dd30eb33	ext4: validate s_first_meta_bg at mount time Ralf Spenneberg reported that he hit a kernel crash when mounting a modified ext4 image. And it turns out that kernel crashed when calculating fs overhead (ext4_calculate_overhead()), this is because the image has very large s_first_meta_bg (debug code shows it's 842150400), and ext4 overruns the memory in count_overhead() when setting bitmap buffer, which is PAGE_SIZE. ext4_calculate_overhead(): buf = get_zeroed_page(GFP_NOFS); <=== PAGE_SIZE buffer blks = count_overhead(sb, i, buf); count_overhead(): for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400 ext4_set_bit(EXT4_B2C(sbi, s++), buf); <=== buffer overrun count++; } This can be reproduced easily for me by this script: #!/bin/bash rm -f fs.img mkdir -p /mnt/ext4 fallocate -l 16M fs.img mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img debugfs -w -R "ssv first_meta_bg 842150400" fs.img mount -o loop fs.img /mnt/ext4 Fix it by validating s_first_meta_bg first at mount time, and refusing to mount if its value exceeds the largest possible meta_bg number. Reported-by: Ralf Spenneberg <ralf@os-t.de> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> (cherry picked from commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe) (minor backport adapted from cf851ad35fd1e9c7b8ed00741eca613bc1a9c8c8) Change-Id: If183ad4a873705c9a0312087577705298b3586fe	2017-03-03 13:40:24 -07:00
Nick Desaulniers	a8c9068848	BACKPORT: aio: mark AIO pseudo-fs noexec This ensures that do_mmap() won't implicitly make AIO memory mappings executable if the READ_IMPLIES_EXEC personality flag is set. Such behavior is problematic because the security_mmap_file LSM hook doesn't catch this case, potentially permitting an attacker to bypass a W^X policy enforced by SELinux. I have tested the patch on my machine. To test the behavior, compile and run this: #define _GNU_SOURCE #include <unistd.h> #include <sys/personality.h> #include <linux/aio_abi.h> #include <err.h> #include <stdlib.h> #include <stdio.h> #include <sys/syscall.h> int main(void) { personality(READ_IMPLIES_EXEC); aio_context_t ctx = 0; if (syscall(__NR_io_setup, 1, &ctx)) err(1, "io_setup"); char cmd[1000]; sprintf(cmd, "cat /proc/%d/maps \| grep -F '/[aio]'", (int)getpid()); system(cmd); return 0; } In the output, "rw-s" is good, "rwxs" is bad. Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 22f6b4d34fcf039c63a94e7670e0da24f8575a5a) (cherry picked from googlesource commit `bc02d1d9f5`) Bug: 31711619 Change-Id: I9f2872703bef240d6b82320c744529459bb076dc	2017-03-03 13:15:25 -07:00
Jan Kara	bab8f030b0	isofs: Fix infinite looping over CE entries Rock Ridge extensions define so called Continuation Entries (CE) which define where is further space with Rock Ridge data. Corrupted isofs image can contain arbitrarily long chain of these, including a one containing loop and thus causing kernel to end in an infinite loop when traversing these entries. Limit the traversal to 32 entries which should be more than enough space to store all the Rock Ridge data. Reported-by: P J P <ppandit@redhat.com> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> (cherry picked from commit f54e18f1b831c92f6512d2eedb224cd63d607d3d) Change-Id: I62cd59b27ac11fbf0a04b0d02874df7f390338bb	2017-03-03 12:44:58 -07:00
Omar Sandoval	dbe124efba	block: fix use-after-free in sys_ioprio_get() get_task_ioprio() accesses the task->io_context without holding the task lock and thus can race with exit_io_context(), leading to a use-after-free. The reproducer below hits this within a few seconds on my 4-core QEMU VM: int main(int argc, char *argv) { pid_t pid, child; long nproc, i; / ioprio_set(IOPRIO_WHO_PROCESS, 0, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)); / syscall(SYS_ioprio_set, 1, 0, 0x6000); nproc = sysconf(_SC_NPROCESSORS_ONLN); for (i = 0; i < nproc; i++) { pid = fork(); assert(pid != -1); if (pid == 0) { for (;;) { pid = fork(); assert(pid != -1); if (pid == 0) { _exit(0); } else { child = wait(NULL); assert(child == pid); } } } pid = fork(); assert(pid != -1); if (pid == 0) { for (;;) { / ioprio_get(IOPRIO_WHO_PGRP, 0); / syscall(SYS_ioprio_get, 2, 0); } } } for (;;) { / ioprio_get(IOPRIO_WHO_PGRP, 0); */ syscall(SYS_ioprio_get, 2, 0); } return 0; } This gets us KASAN dumps like this: [ 35.526914] ================================================================== [ 35.530009] BUG: KASAN: out-of-bounds in get_task_ioprio+0x7b/0x90 at addr ffff880066f34e6c [ 35.530009] Read of size 2 by task ioprio-gpf/363 [ 35.530009] ============================================================================= [ 35.530009] BUG blkdev_ioc (Not tainted): kasan: bad access detected [ 35.530009] ----------------------------------------------------------------------------- [ 35.530009] Disabling lock debugging due to kernel taint [ 35.530009] INFO: Allocated in create_task_io_context+0x2b/0x370 age=0 cpu=0 pid=360 [ 35.530009] ___slab_alloc+0x55d/0x5a0 [ 35.530009] __slab_alloc.isra.20+0x2b/0x40 [ 35.530009] kmem_cache_alloc_node+0x84/0x200 [ 35.530009] create_task_io_context+0x2b/0x370 [ 35.530009] get_task_io_context+0x92/0xb0 [ 35.530009] copy_process.part.8+0x5029/0x5660 [ 35.530009] _do_fork+0x155/0x7e0 [ 35.530009] SyS_clone+0x19/0x20 [ 35.530009] do_syscall_64+0x195/0x3a0 [ 35.530009] return_from_SYSCALL_64+0x0/0x6a [ 35.530009] INFO: Freed in put_io_context+0xe7/0x120 age=0 cpu=0 pid=1060 [ 35.530009] __slab_free+0x27b/0x3d0 [ 35.530009] kmem_cache_free+0x1fb/0x220 [ 35.530009] put_io_context+0xe7/0x120 [ 35.530009] put_io_context_active+0x238/0x380 [ 35.530009] exit_io_context+0x66/0x80 [ 35.530009] do_exit+0x158e/0x2b90 [ 35.530009] do_group_exit+0xe5/0x2b0 [ 35.530009] SyS_exit_group+0x1d/0x20 [ 35.530009] entry_SYSCALL_64_fastpath+0x1a/0xa4 [ 35.530009] INFO: Slab 0xffffea00019bcd00 objects=20 used=4 fp=0xffff880066f34ff0 flags=0x1fffe0000004080 [ 35.530009] INFO: Object 0xffff880066f34e58 @offset=3672 fp=0x0000000000000001 [ 35.530009] ================================================================== Fix it by grabbing the task lock while we poke at the io_context. Change-Id: I4261aaf076fab943a80a45b0a77e023aa4ecbbd8 Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-11 13:35:57 +11:00
Nick Desaulniers	c3f8d15467	fs: ext4: disable support for fallocate FALLOC_FL_PUNCH_HOLE Bug: 28760453 Change-Id: I019c2de559db9e4b95860ab852211b456d78c4ca Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>	2016-10-31 23:29:10 +11:00
Eric W. Biederman	61f1c9ad34	mnt: Fail collect_mounts when applied to unmounted mounts The only users of collect_mounts are in audit_tree.c In audit_trim_trees and audit_add_tree_rule the path passed into collect_mounts is generated from kern_path passed an audit_tree pathname which is guaranteed to be an absolute path. In those cases collect_mounts is obviously intended to work on mounted paths and if a race results in paths that are unmounted when collect_mounts it is reasonable to fail early. The paths passed into audit_tag_tree don't have the absolute path check. But are used to play with fsnotify and otherwise interact with the audit_trees, so again operating only on mounted paths appears reasonable. Avoid having to worry about what happens when we try and audit unmounted filesystems by restricting collect_mounts to mounts that appear in the mount tree. Change-Id: I2edfee6d6951a2179ce8f53785b65ddb1eb95629 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2016-10-31 22:51:16 +11:00
Thomas Gleixner	34ba3c370d	kthread: Prevent unpark race which puts threads on the wrong cpu The smpboot threads rely on the park/unpark mechanism which binds per cpu threads on a particular core. Though the functionality is racy: CPU0 CPU1 CPU2 unpark(T) wake_up_process(T) clear(SHOULD_PARK) T runs leave parkme() due to !SHOULD_PARK bind_to(CPU2) BUG_ON(wrong CPU) We cannot let the tasks move themself to the target CPU as one of those tasks is actually the migration thread itself, which requires that it starts running on the target cpu right away. The solution to this problem is to prevent wakeups in park mode which are not from unpark(). That way we can guarantee that the association of the task to the target cpu is working correctly. Add a new task state (TASK_PARKED) which prevents other wakeups and use this state explicitly for the unpark wakeup. Peter noticed: Also, since the task state is visible to userspace and all the parked tasks are still in the PID space, its a good hint in ps and friends that these tasks aren't really there for the moment. The migration thread has another related issue. CPU0 CPU1 Bring up CPU2 create_thread(T) park(T) wait_for_completion() parkme() complete() sched_set_stop_task() schedule(TASK_PARKED) The sched_set_stop_task() call is issued while the task is on the runqueue of CPU1 and that confuses the hell out of the stop_task class on that cpu. So we need the same synchronizaion before sched_set_stop_task(). Change-Id: I9ad6fbe65992ad5b5cb9a252470a56ec51a4ff4f Reported-by: Dave Jones <davej@redhat.com> Reported-and-tested-by: Dave Hansen <dave@sr71.net> Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Acked-by: Peter Ziljstra <peterz@infradead.org> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: dhillf@gmail.com Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-29 23:12:39 +08:00
Jaegeuk Kim	e787ff9965	f2fs: set fsync mark only for the last dnode In order to give atomic writes, we should consider power failure during sync_node_pages in fsync. So, this patch marks fsync flag only in the last dnode block. Change-Id: Ib44a91bf820f6631fe359a8ac430ede77ceda403 Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:38 +08:00
Jaegeuk Kim	69baed249c	f2fs: report unwritten status in fsync_node_pages The fsync_node_pages should return pass or failure so that user could know fsync is completed or not. Change-Id: I3d588c44ad7452e66d3d6a795f2060de75fd5d0f Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:38 +08:00
Jaegeuk Kim	2d6e13b5bb	f2fs: flush dirty pages before starting atomic writes If somebody wrote some data before atomic writes, we should flush them in order to handle atomic data in a right period. Change-Id: I35611d9016330ef837554cff263bcbb10b4cc810 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:38 +08:00
Jaegeuk Kim	2e84eaaee8	f2fs: unset atomic/volatile flag in f2fs_release_file The atomic/volatile operation should be done in pair of start and commit ioctl. For example, if a killed process remains open-ended atomic operation, we should drop its flag as well as its atomic data. Otherwise, if sqlite initiates another operation which doesn't require atomic writes, it will lose every data, since f2fs still treats with them as atomic writes; nobody will trigger its commit. Change-Id: Ic97f7d88a1158e2f21f4bd5447870ff578641fb3 Reported-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:38 +08:00
Jaegeuk Kim	1282b71690	f2fs: fix dropping inmemory pages in a wrong time When one reader closes its file while the other writer is doing atomic writes, f2fs_release_file drops atomic data resulting in an empty commit. This patch fixes this wrong commit problem by checking openess of the file. Process0 Process1 open file start atomic write write data read data close file f2fs_release_file() clear atomic data commit atomic write Change-Id: I99b90b569a56cb53bccf8758f870e0f49849c6fd Reported-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:38 +08:00
Jaegeuk Kim	4ab3c246ba	f2fs: split sync_node_pages with fsync_node_pages This patch splits the existing sync_node_pages into (f)sync_node_pages. The fsync_node_pages is used for f2fs_sync_file only. Change-Id: I207b087a54f1a0c2e994a78cd6ed475578d7044e Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:38 +08:00
Jaegeuk Kim	3a5683358f	f2fs: avoid writing 0'th page in volatile writes The first page of volatile writes usually contains a sort of header information which will be used for recovery. (e.g., journal header of sqlite) If this is written without other journal data, user needs to handle the stale journal information. Change-Id: I85f4cfe4cbef32ed43b0f52d7328b42d411dd2da Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:38 +08:00
Jaegeuk Kim	7c65b74491	f2fs: avoid needless lock for node pages when fsyncing a file When fsync is called, sync_node_pages finds a proper direct node pages to flush. But, it locks unrelated direct node pages together unnecessarily. Change-Id: I6adc83f2e6592aea707851ee6e365afcc0e36f92 Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:37 +08:00
Chao Yu	36490001bb	f2fs: fix deadlock when flush inline data Below backtrace info was reported by Yunlei He: Call Trace: [<ffffffff817a9395>] schedule+0x35/0x80 [<ffffffff817abb7d>] rwsem_down_read_failed+0xed/0x130 [<ffffffff813c12a8>] call_rwsem_down_read_failed+0x18/0x [<ffffffff817ab1d0>] down_read+0x20/0x30 [<ffffffffa02a1a12>] f2fs_evict_inode+0x242/0x3a0 [f2fs] [<ffffffff81217057>] evict+0xc7/0x1a0 [<ffffffff81217cd6>] iput+0x196/0x200 [<ffffffff812134f9>] __dentry_kill+0x179/0x1e0 [<ffffffff812136f9>] dput+0x199/0x1f0 [<ffffffff811fe77b>] __fput+0x18b/0x220 [<ffffffff811fe84e>] ____fput+0xe/0x10 [<ffffffff81097427>] task_work_run+0x77/0x90 [<ffffffff81074d62>] exit_to_usermode_loop+0x73/0xa2 [<ffffffff81003b7a>] do_syscall_64+0xfa/0x110 [<ffffffff817acf65>] entry_SYSCALL64_slow_path+0x25/0x25 Call Trace: [<ffffffff817a9395>] schedule+0x35/0x80 [<ffffffff81216dc3>] __wait_on_freeing_inode+0xa3/0xd0 [<ffffffff810bc300>] ? autoremove_wake_function+0x40/0x4 [<ffffffff8121771d>] find_inode_fast+0x7d/0xb0 [<ffffffff8121794a>] ilookup+0x6a/0xd0 [<ffffffffa02bc740>] sync_node_pages+0x210/0x650 [f2fs] [<ffffffff8122e690>] ? do_fsync+0x70/0x70 [<ffffffffa02b085e>] block_operations+0x9e/0xf0 [f2fs] [<ffffffff8137b795>] ? bio_endio+0x55/0x60 [<ffffffffa02b0942>] write_checkpoint+0x92/0xba0 [f2fs] [<ffffffff8117da57>] ? mempool_free_slab+0x17/0x20 [<ffffffff8117de8b>] ? mempool_free+0x2b/0x80 [<ffffffff8122e690>] ? do_fsync+0x70/0x70 [<ffffffffa02a53e3>] f2fs_sync_fs+0x63/0xd0 [f2fs] [<ffffffff8129630f>] ? ext4_sync_fs+0xbf/0x190 [<ffffffff8122e6b0>] sync_fs_one_sb+0x20/0x30 [<ffffffff812002e9>] iterate_supers+0xb9/0x110 [<ffffffff8122e7b5>] sys_sync+0x55/0x90 [<ffffffff81003ae9>] do_syscall_64+0x69/0x110 [<ffffffff817acf65>] entry_SYSCALL64_slow_path+0x25/0x25 With following excuting serials, we will set inline_node in inode page after inode was unlinked, result in a deadloop described as below: 1. open file 2. write file 3. unlink file 4. write file 5. close file Thread A Thread B - dput - iput_final - inode->i_state \|= I_FREEING - evict - f2fs_evict_inode - f2fs_sync_fs - write_checkpoint - block_operations - f2fs_lock_all (down_write(cp_rwsem)) - f2fs_lock_op (down_read(cp_rwsem)) - sync_node_pages - ilookup - find_inode_fast - __wait_on_freeing_inode (wait on I_FREEING clear) Here, we change to set inline_node flag only for linked inode for fixing. Change-Id: Ibf4326ecb4ba68e45e4e964092e1d2955341bc56 Reported-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Tested-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: stable@vger.kernel.org # v4.6 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:37 +08:00
Chao Yu	2f63209408	f2fs: fix to update dirty page count correctly Once we failed to merge inline data into inode page during flushing inline inode, we will skip invoking inode_dec_dirty_pages, which makes dirty page count incorrect, result in panic in ->evict_inode, Fix it. ------------[ cut here ]------------ kernel BUG at /home/yuchao/git/devf2fs/inode.c:336! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 3 PID: 10004 Comm: umount Tainted: G O 4.6.0-rc5+ #17 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 task: f0c33000 ti: c5212000 task.ti: c5212000 EIP: 0060:[<f89aacb5>] EFLAGS: 00010202 CPU: 3 EIP is at f2fs_evict_inode+0x85/0x490 [f2fs] EAX: 00000001 EBX: c4529ea0 ECX: 00000001 EDX: 00000000 ESI: c0131000 EDI: f89dd0a0 EBP: c5213e9c ESP: `c5213e78` DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 80050033 CR2: b75878c0 CR3: 1a36a700 CR4: 000406f0 Stack: c4529ea0 c4529ef4 c5213e8c c176d45c c4529ef4 00000000 c4529ea0 c4529fac f89dd0a0 c5213eb0 c1204a68 c5213ed8 c452a2b4 c6680930 c5213ec0 c1204b64 c6680d44 c6680620 c5213eec c120588d ee84b000 ee84b5c0 c5214000 ee84b5e0 Call Trace: [<c176d45c>] ? _raw_spin_unlock+0x2c/0x50 [<c1204a68>] evict+0xa8/0x170 [<c1204b64>] dispose_list+0x34/0x50 [<c120588d>] evict_inodes+0x10d/0x130 [<c11ea941>] generic_shutdown_super+0x41/0xe0 [<c1185190>] ? unregister_shrinker+0x40/0x50 [<c1185190>] ? unregister_shrinker+0x40/0x50 [<c11eac52>] kill_block_super+0x22/0x70 [<f89af23e>] kill_f2fs_super+0x1e/0x20 [f2fs] [<c11eae1d>] deactivate_locked_super+0x3d/0x70 [<c11eb383>] deactivate_super+0x43/0x60 [<c1208ec9>] cleanup_mnt+0x39/0x80 [<c1208f50>] __cleanup_mnt+0x10/0x20 [<c107d091>] task_work_run+0x71/0x90 [<c105725a>] exit_to_usermode_loop+0x72/0x9e [<c1001c7c>] do_fast_syscall_32+0x19c/0x1c0 [<c176dd48>] sysenter_past_esp+0x45/0x74 EIP: [<f89aacb5>] f2fs_evict_inode+0x85/0x490 [f2fs] SS:ESP 0068:c5213e78 ---[ end trace d30536330b7fdc58 ]--- Change-Id: I68907a13e6ac726e54f5c2bbe219bc2c8400a558 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:37 +08:00
Jaegeuk Kim	0222e52820	ext4/fscrypto: avoid RCU lookup in d_revalidate As Al pointed, d_revalidate should return RCU lookup before using d_inode. This was originally introduced by: commit `34286d6662` ("fs: rcu-walk aware d_revalidate method"). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: stable <stable@vger.kernel.org> Conflicts: fs/ext4/crypto.c	2016-10-29 23:12:37 +08:00
Jaegeuk Kim	4f963da44b	fscrypto: don't let data integrity writebacks fail with ENOMEM This patch fixes the issue introduced by the ext4 crypto fix in a same manner. For F2FS, however, we flush the pending IOs and wait for a while to acquire free memory. Fixes: c9af28fdd4492 ("ext4 crypto: don't let data integrity writebacks fail with ENOMEM") Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/crypto/crypto.c	2016-10-29 23:12:37 +08:00
Jaegeuk Kim	92d54d00b5	f2fs: use dget_parent and file_dentry in f2fs_file_open This patch synced with the below two ext4 crypto fixes together. In 4.6-rc1, f2fs newly introduced accessing f_path.dentry which crashes overlayfs. To fix, now we need to use file_dentry() to access that field. [Backport NOTE] - Over 4.2, it should use file_dentry Fixes: c0a37d487884 ("ext4: use file_dentry()") Fixes: 9dd78d8c9a7b ("ext4: use dget_parent() in ext4_file_open()") Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:36 +08:00
Jaegeuk Kim	c5c8bb5165	fscrypto: use dget_parent() in fscrypt_d_revalidate() This patch updates fscrypto along with the below ext4 crypto change. Fixes: 3d43bcfef5f0 ("ext4 crypto: use dget_parent() in ext4_d_revalidate()") Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/crypto/crypto.c	2016-10-29 23:12:36 +08:00
Shuoran Liu	5e468690df	f2fs: retrieve IO write stat from the right place In the following patch, f2fs: split journal cache from curseg cache journal cache is split from curseg cache. So IO write statistics should be retrived from journal cache but not curseg->sum_blk. Otherwise, it will get 0, and the stat is lost. Signed-off-by: Shuoran Liu <liushuoran@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:36 +08:00
Jaegeuk Kim	df869c0814	f2fs crypto: fix corrupted symlink in encrypted case In the encrypted symlink case, we should check its corrupted symname after decrypting it. Otherwise, we can report -ENOENT incorrectly, if encrypted symname starts with '\0'. Cc: stable 4.5+ <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:36 +08:00
Jaegeuk Kim	b54ac0bf29	f2fs: cover large section in sanity check of super This patch fixes the bug which does not cover a large section case when checking the sanity of superblock. If f2fs detects misalignment, it will fix the superblock during the mount time, so it doesn't need to trigger fsck.f2fs further. Reported-by: Matthias Prager <linux@matthiasprager.de> Reported-by: David Gnedt <david.gnedt@davizone.at> Cc: stable 4.5+ <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:36 +08:00
Linus Torvalds	1db29fa179	f2fs/crypto: fix xts_tweak initialization Commit 0b81d07790726 ("fs crypto: move per-file encryption from f2fs tree to fs/crypto") moved the f2fs crypto files to fs/crypto/ and renamed the symbol prefixes from "f2fs_" to "fscrypt_" (and from "F2FS_" to just "FS" for preprocessor symbols). Because of the symbol renaming, it's a bit hard to see it as a file move: use git show -M30 0b81d07790726 to lower the rename detection to just 30% similarity and make git show the files as renamed (the header file won't be shown as a rename even then - since all it contains is symbol definitions, it looks almost completely different). Even with the renames showing as renames, the diffs are not all that easy to read, since so much is just the renames. But Eric Biggers noticed that it's not just all renames: the initialization of the xts_tweak had been broken too, using the inode number rather than the page offset. That's not right - it makes the xfs_tweak the same for all pages of each inode. It _might_ make sense to make the xfs_tweak contain both the offset _and_ the inode number, but not just the inode number. Reported-by: Eric Biggers <ebiggers3@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-29 23:12:36 +08:00
Jaegeuk Kim	841b7347fa	f2fs: submit node page write bios when really required If many threads calls fsync with data writes, we don't need to flush every bios having node page writes. The f2fs_wait_on_page_writeback will flush its bios when the page is really needed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:36 +08:00
Arnd Bergmann	f9b55fda32	f2fs: add missing argument to f2fs_setxattr stub The f2fs_setxattr() prototype for CONFIG_F2FS_FS_XATTR=n has been wrong for a long time, since `8ae8f1627f` ("f2fs: support xattr security labels"), but there have never been any callers, so it did not matter. Now, the function gets called from f2fs_ioc_keyctl(), which causes a build failure: fs/f2fs/file.c: In function 'f2fs_ioc_keyctl': include/linux/stddef.h:7:14: error: passing argument 6 of 'f2fs_setxattr' makes integer from pointer without a cast [-Werror=int-conversion] #define NULL ((void )0) ^ fs/f2fs/file.c:1599:27: note: in expansion of macro 'NULL' value, F2FS_KEY_SIZE, NULL, type); ^ In file included from ../fs/f2fs/file.c:29:0: fs/f2fs/xattr.h:129:19: note: expected 'int' but argument is of type 'void ' static inline int f2fs_setxattr(struct inode inode, int index, ^ fs/f2fs/file.c:1597:9: error: too many arguments to function 'f2fs_setxattr' return f2fs_setxattr(inode, F2FS_XATTR_INDEX_KEY, ^ In file included from ../fs/f2fs/file.c:29:0: fs/f2fs/xattr.h:129:19: note: declared here static inline int f2fs_setxattr(struct inode inode, int index, Thsi changes the prototype of the empty stub function to match that of the actual implementation. This will not make the key management work when F2FS_FS_XATTR is disabled, but it gets it to build at least. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:36 +08:00
Chao Yu	135b6ab7ff	f2fs: fix to avoid unneeded unlock_new_inode During ->lookup, I_NEW state of inode was been cleared in f2fs_iget, so in error path, we don't need to clear it again. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:36 +08:00
Chao Yu	0487130e26	f2fs: clean up opened code with f2fs_update_dentry Just clean up opened code with existing function, no logic change. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:36 +08:00
Jaegeuk Kim	1e100874fc	f2fs: declare static functions Just to avoid sparse warnings. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:36 +08:00
Fan Li	6be6530f86	f2fs: modify the readahead method in ra_node_page() ra_node_page() is used to read ahead one node page. Comparing to regular read, it's faster because it doesn't wait for IO completion. But if it is called twice for reading the same block, and the IO request from the first call hasn't been completed before the second call, the second call will have to wait until the read is over. Here use the code in __do_page_cache_readahead() to solve this problem. It does nothing when someone else already puts the page in mapping. The status of page should be assured by whoever puts it there. This implement also prevents alteration of page reference count. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:36 +08:00
Jaegeuk Kim	6f86d87cf6	f2fs crypto: sync ext4_lookup and ext4_file_open This patch tries to catch up with lookup and open policies in ext4. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:36 +08:00
Jaegeuk Kim	ea38719073	f2fs: define not-set fallocate flags This fixes building bugs in aosp. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:35 +08:00
Jaegeuk Kim	47393901fb	fs crypto: move per-file encryption from f2fs tree to fs/crypto This patch adds the renamed functions moved from the f2fs crypto files. [Backporting to 3.10] - Removed d_is_negative() in fscrypt_d_revalidate(). 1. definitions for per-file encryption used by ext4 and f2fs. 2. crypto.c for encrypt/decrypt functions a. IO preparation: - fscrypt_get_ctx / fscrypt_release_ctx b. before IOs: - fscrypt_encrypt_page - fscrypt_decrypt_page - fscrypt_zeroout_range c. after IOs: - fscrypt_decrypt_bio_pages - fscrypt_pullback_bio_page - fscrypt_restore_control_page 3. policy.c supporting context management. a. For ioctls: - fscrypt_process_policy - fscrypt_get_policy b. For context permission - fscrypt_has_permitted_context - fscrypt_inherit_context 4. keyinfo.c to handle permissions - fscrypt_get_encryption_info - fscrypt_free_encryption_info 5. fname.c to support filename encryption a. general wrapper functions - fscrypt_fname_disk_to_usr - fscrypt_fname_usr_to_disk - fscrypt_setup_filename - fscrypt_free_filename b. specific filename handling functions - fscrypt_fname_alloc_buffer - fscrypt_fname_free_buffer 6. Makefile and Kconfig Cc: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Ildar Muslukhov <ildarm@google.com> Signed-off-by: Uday Savagaonkar <savagaon@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/f2fs.h fs/f2fs/inode.c fs/f2fs/super.c include/linux/fs.h Change-Id: I162f3562aed66e7e377ec38714fae96c651fbdd7	2016-10-29 23:12:35 +08:00
Yang Shi	2b9dcf81e6	f2fs: mutex can't be used by down_write_nest_lock() f2fs_lock_all() calls down_write_nest_lock() to acquire a rw_sem and check a mutex, but down_write_nest_lock() is designed for two rw_sem accoring to the comment in include/linux/rwsem.h. And, other than f2fs, it is just called in mm/mmap.c with two rwsem. So, it looks it is used wrongly by f2fs. And, it causes the below compile warning on -rt kernel too. In file included from fs/f2fs/xattr.c:25:0: fs/f2fs/f2fs.h: In function 'f2fs_lock_all': fs/f2fs/f2fs.h:962:34: warning: passing argument 2 of 'down_write_nest_lock' from incompatible pointer type [-Wincompatible-pointer-types] f2fs_down_write(&sbi->cp_rwsem, &sbi->cp_mutex); ^ fs/f2fs/f2fs.h:27:55: note: in definition of macro 'f2fs_down_write' #define f2fs_down_write(x, y) down_write_nest_lock(x, y) ^ In file included from include/linux/rwsem.h:22:0, from fs/f2fs/xattr.c:21: include/linux/rwsem_rt.h:138:20: note: expected 'struct rw_semaphore ' but argument is of type 'struct mutex ' static inline void down_write_nest_lock(struct rw_semaphore *sem, Signed-off-by: Yang Shi <yang.shi@linaro.org> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:35 +08:00
Liu Xue	55340ae5aa	f2fs: recovery missing dot dentries in root directory If f2fs was corrupted with missing dot dentries in root dirctory, it needs to recover them after fsck.f2fs set F2FS_INLINE_DOTS flag in directory inode when fsck.f2fs detects missing dot dentries. Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com> Signed-off-by: Yong Sheng <shengyong1@huawei.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:35 +08:00
Willy Tarreau	dcaffd537f	pipe: limit the per-user amount of pages allocated in pipes On no-so-small systems, it is possible for a single process to cause an OOM condition by filling large pipes with data that are never read. A typical process filling 4000 pipes with 1 MB of data will use 4 GB of memory. On small systems it may be tricky to set the pipe max size to prevent this from happening. This patch makes it possible to enforce a per-user soft limit above which new pipes will be limited to a single page, effectively limiting them to 4 kB each, as well as a hard limit above which no new pipes may be created for this user. This has the effect of protecting the system against memory abuse without hurting other users, and still allowing pipes to work correctly though with less data at once. The limit are controlled by two new sysctls : pipe-user-pages-soft, and pipe-user-pages-hard. Both may be disabled by setting them to zero. The default soft limit allows the default number of FDs per process (1024) to create pipes of the default size (64kB), thus reaching a limit of 64MB before starting to create only smaller pipes. With 256 processes limited to 1024 FDs each, this results in 102464kB + (2561024 - 1024) * 4kB = 1084 MB of memory allocated for a user. The hard limit is disabled by default to avoid breaking existing applications that make intensive use of pipes (eg: for splicing). Reported-by: socketpair@gmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Mitigates: CVE-2013-4312 (Linux 2.0+) Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Conflicts: Documentation/sysctl/fs.txt fs/pipe.c include/linux/sched.h Change-Id: Ic7c678af18129943e16715fdaa64a97a7f0854be	2016-10-29 23:12:35 +08:00
Theodore Ts'o	2da70f4e34	ext4: avoid hang when mounting non-journal filesystems with orphan list When trying to mount a file system which does not contain a journal, but which does have a orphan list containing an inode which needs to be truncated, the mount call with hang forever in ext4_orphan_cleanup() because ext4_orphan_del() will return immediately without removing the inode from the orphan list, leading to an uninterruptible loop in kernel code which will busy out one of the CPU's on the system. This can be trivially reproduced by trying to mount the file system found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs source tree. If a malicious user were to put this on a USB stick, and mount it on a Linux desktop which has automatic mounts enabled, this could be considered a potential denial of service attack. (Not a big deal in practice, but professional paranoids worry about such things, and have even been known to allocate CVE numbers for such problems.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Cc: stable@vger.kernel.org	2016-10-29 23:12:34 +08:00
Anatol Pomozov	012f862b13	ext4: make orphan functions be no-op in no-journal mode Instead of checking whether the handle is valid, we check if journal is enabled. This avoids taking the s_orphan_lock mutex in all cases when there is no journal in use, including the error paths where ext4_orphan_del() is called with a handle set to NULL. Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2016-10-29 23:12:34 +08:00
Chao Yu	b86bad0997	f2fs: fix to avoid deadlock when merging inline data When testing with fsstress, kworker and user threads were both blocked: INFO: task kworker/u16:1:16580 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u16:1 D ffff8803f2595390 0 16580 2 0x00000000 Workqueue: writeback bdi_writeback_workfn (flush-251:0) ffff8802730e5760 0000000000000046 ffff880274729fc0 0000000000012440 ffff8802730e5fd8 ffff8802730e4010 0000000000012440 0000000000012440 ffff8802730e5fd8 0000000000012440 ffff880274729fc0 ffff88026eb50000 Call Trace: [<ffffffff816fe9d9>] schedule+0x29/0x70 [<ffffffff816ff895>] rwsem_down_read_failed+0xa5/0xf9 [<ffffffff81378584>] call_rwsem_down_read_failed+0x14/0x30 [<ffffffffa0694feb>] f2fs_write_data_page+0x31b/0x420 [f2fs] [<ffffffffa0690f1a>] __f2fs_writepage+0x1a/0x50 [f2fs] [<ffffffffa06922a0>] f2fs_write_data_pages+0xe0/0x290 [f2fs] [<ffffffff811473b3>] do_writepages+0x23/0x40 [<ffffffff811cc3ee>] __writeback_single_inode+0x4e/0x250 [<ffffffff811cd4f1>] writeback_sb_inodes+0x2c1/0x470 [<ffffffff811cd73e>] __writeback_inodes_wb+0x9e/0xd0 [<ffffffff811cda0b>] wb_writeback+0x1fb/0x2d0 [<ffffffff811cdb7c>] wb_do_writeback+0x9c/0x220 [<ffffffff811ce232>] bdi_writeback_workfn+0x72/0x1c0 [<ffffffff8106b74e>] process_one_work+0x1de/0x5b0 [<ffffffff8106e78f>] worker_thread+0x11f/0x3e0 [<ffffffff810750ce>] kthread+0xde/0xf0 [<ffffffff817093f8>] ret_from_fork+0x58/0x90 fsstress thread stack: [<ffffffff81139f0e>] sleep_on_page+0xe/0x20 [<ffffffff81139ef7>] __lock_page+0x67/0x70 [<ffffffff8113b100>] find_lock_page+0x50/0x80 [<ffffffff8113b24f>] find_or_create_page+0x3f/0xb0 [<ffffffffa06983a9>] sync_node_pages+0x259/0x810 [f2fs] [<ffffffffa068d874>] write_checkpoint+0x1a4/0xce0 [f2fs] [<ffffffffa0686b0c>] f2fs_sync_fs+0x7c/0xd0 [f2fs] [<ffffffffa067c813>] f2fs_sync_file+0x143/0x5f0 [f2fs] [<ffffffff811d301b>] vfs_fsync_range+0x2b/0x40 [<ffffffff811d304c>] vfs_fsync+0x1c/0x20 [<ffffffff811d3291>] do_fsync+0x41/0x70 [<ffffffff811d32d3>] SyS_fdatasync+0x13/0x20 [<ffffffff817094a2>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff The reason of this issue is: CPU0: CPU1: - f2fs_write_data_pages - f2fs_sync_fs - write_checkpoint - block_operations - f2fs_lock_all - down_write(sbi->cp_rwsem) - lock_page(page) - f2fs_write_data_page - sync_node_pages - flush_inline_data - pagecache_get_page(page, GFP_LOCK) - f2fs_lock_op - down_read(sbi->cp_rwsem) This patch alters to use trylock_page in flush_inline_data to fix this ABBA deadlock issue. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:32 +08:00
Chao Yu	09f4658771	f2fs: introduce f2fs_flush_merged_bios for cleanup Add a new helper f2fs_flush_merged_bios to clean up redundant codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:32 +08:00
Chao Yu	cc7e1e5fe8	f2fs: introduce f2fs_update_data_blkaddr for cleanup Add a new help f2fs_update_data_blkaddr to clean up redundant codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:32 +08:00
Chao Yu	e90d738a5e	f2fs crypto: fix incorrect positioning for GCing encrypted data page For now, flow of GCing an encrypted data page: 1) try to grab meta page in meta inode's mapping with index of old block address of that data page 2) load data of ciphertext into meta page 3) allocate new block address 4) write the meta page into new block address 5) update block address pointer in direct node page. Other reader/writer will use f2fs_wait_on_encrypted_page_writeback to check and wait on GCed encrypted data cached in meta page writebacked in order to avoid inconsistence among data page cache, meta page cache and data on-disk when updating. However, we will use new block address updated in step 5) as an index to lookup meta page in inner bio buffer. That would be wrong, and we will never find the GCing meta page, since we use the old block address as index of that page in step 1). This patch fixes the issue by adjust the order of step 1) and step 3), and in step 1) grab page with index generated in step 3). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/gc.c	2016-10-29 23:12:32 +08:00
Chao Yu	b22711753f	f2fs: fix incorrect upper bound when iterating inode mapping tree 1. Inode mapping tree can index page in range of [0, ULONG_MAX], however, in some places, f2fs only search or iterate page in ragne of [0, LONG_MAX], result in miss hitting in page cache. 2. filemap_fdatawait_range accepts range parameters in unit of bytes, so the max range it covers should be [0, LLONG_MAX], if we use [0, LONG_MAX] as range for waiting on writeback, big number of pages will not be covered. This patch corrects above two issues. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:32 +08:00
Yunlei He	76fa243e45	f2fs: avoid hungtask problem caused by losing wake_up The D state of wait_on_all_pages_writeback should be waken by function f2fs_write_end_io when all writeback pages have been succesfully written to device. It's possible that wake_up comes between get_pages and io_schedule. Maybe in this case it will lost wake_up and still in D state even if all pages have been write back to device, and finally, the whole system will be into the hungtask state. if (!get_pages(sbi, F2FS_WRITEBACK)) break; <--------- wake_up io_schedule(); Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Biao He <hebiao6@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:32 +08:00
Chao Yu	d37aaf04b7	f2fs: trace old block address for CoWed page This patch enables to trace old block address of CoWed page for better debugging. f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:32 +08:00
Chao Yu	3ccd1c10f5	f2fs: try to flush inode after merging inline data When flushing node pages, if current node page is an inline inode page, we will try to merge inline data from data page into inline inode page, then skip flushing current node page, it will decrease the number of nodes to be flushed in batch in this round, which may lead to worse performance. This patch gives a chance to flush just merged inline inode pages for performance. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:32 +08:00
Chao Yu	7ad7e4c310	f2fs: show more info about superblock recovery This patch changes to show more info in message log about the recovery of the corrupted superblock during ->mount, e.g. the index of corrupted superblock and the result of recovery. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:32 +08:00
Chao Yu	5c16430b87	f2fs: fix the wrong stat count of calling gc With a partition which was formated as multi segments in one section, we stated incorrectly for count of gc operation. e.g., for a partition with segs_per_sec = 4 cat /sys/kernel/debug/f2fs/status GC calls: 208 (BG: 7) - data segments : 104 (52) - node segments : 104 (24) GC called count should be (104 (data segs) + 104 (node segs)) / 4 = 52, rather than 208. Fix it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:32 +08:00
Jaegeuk Kim	60f5accac7	f2fs: remain last victim segment number ascending order This patch avoids to remain inefficient victim segment number selected by a victim. For example, if all the dirty segments has same valid blocks, we can get the victim segments descending order due to keeping wrong last segment number. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:32 +08:00
Shawn Lin	e8d08bb60c	f2fs: reuse read_inline_data for f2fs_convert_inline_page f2fs_convert_inline_page introduce what read_inline_data already does for copying out the inline data from inode_page. We can use read_inline_data instead to simplify the code. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:32 +08:00
Chao Yu	dd44751411	f2fs: fix to delete old dirent in converted inline directory in ->rename When doing test with fstests/generic/068 in inline_dentry enabled f2fs, following oops dmesg will be reported: ------------[ cut here ]------------ WARNING: CPU: 5 PID: 11841 at fs/inode.c:273 drop_nlink+0x49/0x50() Modules linked in: f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state CPU: 5 PID: 11841 Comm: fsstress Tainted: G O 4.5.0-rc1 #45 Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013 0000000000000111 ffff88009cdf7ae8 ffffffff813e5944 0000000000002e41 0000000000000000 0000000000000111 0000000000000000 ffff88009cdf7b28 ffffffff8106a587 ffff88009cdf7b58 ffff8804078fe180 ffff880374a64e00 Call Trace: [<ffffffff813e5944>] dump_stack+0x48/0x64 [<ffffffff8106a587>] warn_slowpath_common+0x97/0xe0 [<ffffffff8106a5ea>] warn_slowpath_null+0x1a/0x20 [<ffffffff81231039>] drop_nlink+0x49/0x50 [<ffffffffa07b95b4>] f2fs_rename2+0xe04/0x10c0 [f2fs] [<ffffffff81231ff1>] ? lock_two_nondirectories+0x81/0x90 [<ffffffff813f454d>] ? lockref_get+0x1d/0x30 [<ffffffff81220f70>] vfs_rename+0x2e0/0x640 [<ffffffff8121f9db>] ? lookup_dcache+0x3b/0xd0 [<ffffffff810b8e41>] ? update_fast_ctr+0x21/0x40 [<ffffffff8134ff12>] ? security_path_rename+0xa2/0xd0 [<ffffffff81224af6>] SYSC_renameat2+0x4b6/0x540 [<ffffffff810ba8ed>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff810022ba>] ? exit_to_usermode_loop+0x7a/0xd0 [<ffffffff817e0ade>] ? int_ret_from_sys_call+0x52/0x9f [<ffffffff810bdc90>] ? trace_hardirqs_on_caller+0x100/0x1c0 [<ffffffff81224b8e>] SyS_renameat2+0xe/0x10 [<ffffffff8121f08e>] SyS_rename+0x1e/0x20 [<ffffffff817e0957>] entry_SYSCALL_64_fastpath+0x12/0x6f ---[ end trace 2b31e17995404e42 ]--- This is because: in the same inline directory, when we renaming one file from source name to target name which is not existed, once space of inline dentry is not enough, inline conversion will be triggered, after that all data in inline dentry will be moved to normal dentry page. After attaching the new entry in coverted dentry page, still we try to remove old entry in original inline dentry, since old entry has been moved, so it obviously doesn't make any effect, result in remaining old entry in converted dentry page. Now, we have two valid dentries pointed to the same inode which has nlink value of 1, deleting them both, above warning appears. This issue can be reproduced easily as below steps: 1. mount f2fs with inline_dentry option 2. mkdir dir 3. touch 180 files named [001-180] in dir 4. rename dir/180 dir/181 5. rm dir/180 dir/181 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:31 +08:00
Chao Yu	7af1656411	f2fs: detect error of update_dent_inode in ->rename Should check and show correct return value of update_dent_inode in ->rename. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:31 +08:00
Shawn Lin	a03b195403	f2fs: move sanity checking of cp into get_valid_checkpoint >From the function name of get_valid_checkpoint, it seems to return the valid cp or NULL for caller to check. If no valid one is found, f2fs_fill_super will print the err log. But if get_valid_checkpoint get one valid(the return value indicate that it's valid, however actually it is invalid after sanity checking), then print another similar err log. That seems strange. Let's keep sanity checking inside the procedure of geting valid cp. Another improvement we gained from this move is that even the large volume is supported, we check the cp in advanced to skip the following procedure if failing the sanity checking. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:31 +08:00
Shawn Lin	1d4eb788ac	f2fs: slightly reorganize read_raw_super_block read_raw_super_block was introduced to help find the first valid superblock. Commit da554e48caab ("f2fs: recovering broken superblock during mount") changed the behaviour to read both of them and check whether need the recovery flag or not. So the comment before this function isn't consistent with what it actually does. Also, the origin code use two tags to round the err cases, which isn't so readable. So this patch amend the comment and slightly reorganize it. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:31 +08:00
Chao Yu	33336cb167	f2fs: reorder nat cache lock in cache_nat_entry When lookuping nat entry in cache_nat_entry, if we fail to hit nat cache, we try to load nat entries a) from journal of current segment cache or b) from NAT pages for updating, during the process, write lock of nat_tree_lock will be held to avoid inconsistent condition in between nid cache and nat cache caused by racing among nat entry shrinker, checkpointer, nat entry updater. But this way may cause low efficient when updating nat cache, because it serializes accessing in journal cache or reading NAT pages. Here, we reorder lock and update flow as below to enhance accessing concurrency: - get_node_info - down_read(nat_tree_lock) - lookup nat cache --- hit -> unlock & return - lookup journal cache --- hit -> unlock & goto update - up_read(nat_tree_lock) update: - down_write(nat_tree_lock) - cache_nat_entry - lookup nat cache --- nohit -> update - up_write(nat_tree_lock) Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:31 +08:00
Chao Yu	626dcbd8f5	f2fs: split journal cache from curseg cache In curseg cache, f2fs caches two different parts: - datas of current summay block, i.e. summary entries, footer info. - journal info, i.e. sparse nat/sit entries or io stat info. With this approach, 1) it may cause higher lock contention when we access or update both of the parts of cache since we use the same mutex lock curseg_mutex to protect the cache. 2) current summary block with last journal info will be writebacked into device as a normal summary block when flushing, however, we treat journal info as valid one only in current summary, so most normal summary blocks contain junk journal data, it wastes remaining space of summary block. So, in order to fix above issues, we split curseg cache into two parts: a) current summary block, protected by original mutex lock curseg_mutex b) journal cache, protected by newly introduced r/w semaphore journal_rwsem When loading curseg cache during ->mount, we store summary info and journal info into different caches; When doing checkpoint, we combine datas of two cache into current summary block for persisting. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:31 +08:00
Chao Yu	80f42a5919	f2fs: enhance IO path with block plug Try to use block plug in more place as below to let process cache bios as much as possbile, in order to reduce lock overhead of queue in IO scheduler. 1) sync_meta_pages 2) ra_meta_pages 3) f2fs_balance_fs_bg Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:31 +08:00
Chao Yu	0f9485738f	f2fs: introduce f2fs_journal struct to wrap journal info Introduce a new structure f2fs_journal to wrap journal info in struct f2fs_summary_block for readability. struct f2fs_journal { union { __le16 n_nats; __le16 n_sits; }; union { struct nat_journal nat_j; struct sit_journal sit_j; struct f2fs_extra_info info; }; } __packed; struct f2fs_summary_block { struct f2fs_summary entries[ENTRIES_IN_SUM]; struct f2fs_journal journal; struct summary_footer footer; } __packed; Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:31 +08:00
Chao Yu	06f3d93e20	f2fs crypto: avoid unneeded memory allocation when {en/de}crypting symlink This patch adopts f2fs with codes of ext4, it removes unneeded memory allocation in creating/accessing path of symlink. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/crypto_fname.c	2016-10-29 23:12:31 +08:00
Chao Yu	3b92fcd38b	f2fs crypto: handle unexpected lack of encryption keys This patch syncs f2fs with commit abdd438b26b4 ("ext4 crypto: handle unexpected lack of encryption keys") from ext4. Fix up attempts by users to try to write to a file when they don't have access to the encryption key. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:31 +08:00
Chao Yu	a18a1be377	f2fs crypto: make sure the encryption info is initialized on opendir(2) This patch syncs f2fs with commit 6bc445e0ff44 ("ext4 crypto: make sure the encryption info is initialized on opendir(2)") from ext4. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:31 +08:00
Chao Yu	3d95525a4a	f2fs: support revoking atomic written pages f2fs support atomic write with following semantics: 1. open db file 2. ioctl start atomic write 3. (write db file) * n 4. ioctl commit atomic write 5. close db file With this flow we can avoid file becoming corrupted when abnormal power cut, because we hold data of transaction in referenced pages linked in inmem_pages list of inode, but without setting them dirty, so these data won't be persisted unless we commit them in step 4. But we should still hold journal db file in memory by using volatile write, because our semantics of 'atomic write support' is incomplete, in step 4, we could fail to submit all dirty data of transaction, once partial dirty data was committed in storage, then after a checkpoint & abnormal power-cut, db file will be corrupted forever. So this patch tries to improve atomic write flow by adding a revoking flow, once inner error occurs in committing, this gives another chance to try to revoke these partial submitted data of current transaction, it makes committing operation more like aotmical one. If we're not lucky, once revoking operation was failed, EAGAIN will be reported to user for suggesting doing the recovery with held journal file, or retrying current transaction again. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:31 +08:00
Chao Yu	6399d80f86	f2fs: split drop_inmem_pages from commit_inmem_pages Split drop_inmem_pages from commit_inmem_pages for code readability, and prepare for the following modification. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:31 +08:00
Jaegeuk Kim	de89056ac2	f2fs: avoid garbage lenghs in dentries This patch fixes to eliminate garbage name lengths in dentries in order to provide correct answers of readdir. For example, if a valid dentry consists of: bitmap : 1 1 1 1 len : 32 0 x 0, readdir can start with second bit_pos having len = 0. Or, it can start with third bit_pos having garbage. In both of cases, we should avoid to try filling dentries. So, this patch not only removes any garbage length, but also avoid entering zero length case in readdir. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:30 +08:00
Jaegeuk Kim	9be179b0be	f2fs: use correct errno This patch is to fix misused error number. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:30 +08:00
Jaegeuk Kim	f87ff6041f	f2fs crypto: add missing locking for keyring_key access This patch adopts: ext4 crypto: add missing locking for keyring_key access Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/crypto_key.c	2016-10-29 23:12:30 +08:00
Jaegeuk Kim	84d0f07b55	f2fs crypto: check for too-short encrypted file names This patch adopts: ext4 crypto: check for too-short encrypted file names An encrypted file name should never be shorter than an 16 bytes, the AES block size. The 3.10 crypto layer will oops and crash the kernel if ciphertext shorter than the block size is passed to it. Fortunately, in modern kernels the crypto layer will not crash the kernel in this scenario, but nevertheless, it represents a corrupted directory, and we should detect it and mark the file system as corrupted so that e2fsck can fix this. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:30 +08:00
Jaegeuk Kim	1513d8ceda	f2fs crypto: f2fs_page_crypto() doesn't need a encryption context This patch adopts: ext4 crypto: ext4_page_crypto() doesn't need a encryption context Since ext4_page_crypto() doesn't need an encryption context (at least not any more), this allows us to simplify a number function signature and also allows us to avoid needing to allocate a context in ext4_block_write_begin(). It also means we no longer need a separate ext4_decrypt_one() function. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:30 +08:00
Jaegeuk Kim	d88d6ef8de	f2fs crypto: fix spelling typo in comment This patch adopts: ext4 crypto: fix spelling typo in comment Signed-off-by: Laurent Navet <laurent.navet@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:30 +08:00
Jaegeuk Kim	a940126577	f2fs crypto: replace some BUG_ON()'s with error checks This patch adopts: ext4 crypto: replace some BUG_ON()'s with error checks Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/crypto_key.c	2016-10-29 23:12:30 +08:00
Jaegeuk Kim	309f065641	f2fs: increase i_size to avoid missing data When finsert is doing with dirting pages, we should increase i_size right away. Otherwise, the moved page is able to be dropped by the following filemap_write_and_wait_range before updating i_size. Especially, it can be done by if ((page->index >= end_index + 1) \|\| !offset) goto out; in f2fs_write_data_page. This should resolve the below xfstests/091 failure reported by Dave. $ diff -u tests/generic/091.out /home/dave/src/xfstests-dev/results//f2fs/generic/091.out.bad --- tests/generic/091.out 2014-01-20 16:57:33.000000000 +1100 +++ /home/dave/src/xfstests-dev/results//f2fs/generic/091.out.bad 2016-02-08 15:21:02.701375087 +1100 @@ -1,7 +1,18 @@ QA output created by 091 fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W +mapped writes DISABLED +skipping insert range behind EOF +skipping insert range behind EOF +truncating to largest ever: 0x11e00 +dowrite: write: Invalid argument +LOG DUMP (7 total operations): +1( 1 mod 256): SKIPPED (no operation) +2( 2 mod 256): SKIPPED (no operation) +3( 3 mod 256): FALLOC 0x2e0f2 thru 0x3134a (0x3258 bytes) PAST_EOF +4( 4 mod 256): SKIPPED (no operation) +5( 5 mod 256): SKIPPED (no operation) +6( 6 mod 256): TRUNCATE UP from 0x0 to 0x11e00 +7( 7 mod 256): WRITE 0x73400 thru 0x79fff (0x6c00 bytes) HOLE +Log of operations saved to "/mnt/test/junk.fsxops"; replay with --replay-ops +Correct content saved for comparison +(maybe hexdump "/mnt/test/junk" vs "/mnt/test/junk.fsxgood") Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:30 +08:00
Jaegeuk Kim	df014cd170	f2fs: preallocate blocks for buffered aio writes This patch preallocates data blocks for buffered aio writes. With this patch, we can avoid redundant locking and unlocking of node pages given consecutive aio request. [For 3.10] - Add preallocationg for generic_splice_write(sendfile) for xfstests/249, 285 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/data.c	2016-10-29 23:12:30 +08:00
Jaegeuk Kim	a206ff8ff8	f2fs: move dio preallocation into f2fs_file_write_iter This patch moves preallocation code for direct IOs into f2fs_file_write_iter. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/data.c fs/f2fs/file.c	2016-10-29 23:12:30 +08:00
Yunlei He	6356b2f4f9	f2fs: fix missing skip pages info fix missing skip pages info in f2fs_writepages trace event. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:30 +08:00
Chao Yu	9a1fd07149	f2fs: introduce f2fs_submit_merged_bio_cond f2fs use single bio buffer per type data (META/NODE/DATA) for caching writes locating in continuous block address as many as possible, after submitting, these writes may be still cached in bio buffer, so we have to flush cached writes in bio buffer by calling f2fs_submit_merged_bio. Unfortunately, in the scenario of high concurrency, bio buffer could be flushed by someone else before we submit it as below reasons: a) there is no space in bio buffer. b) add a request of different type (SYNC, ASYNC). c) add a discontinuous block address. For this condition, f2fs_submit_merged_bio will be devastating, because it could break the following merging of writes in bio buffer, split one big bio into two smaller one. This patch introduces f2fs_submit_merged_bio_cond which can do a conditional submitting with bio buffer, before submitting it will judge whether: - page in DATA type bio buffer is matching with specified page; - page in DATA type bio buffer is belong to specified inode; - page in NODE type bio buffer is belong to specified inode; If there is no eligible page in bio buffer, we will skip submitting step, result in gaining more chance to merge consecutive block IOs in bio cache. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:30 +08:00
Jaegeuk Kim	af7ffe2704	f2fs: fix conflict on page->private usage This patch fixes confilct on page->private value between f2fs_trace_pid and atomic page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:30 +08:00
Jaegeuk Kim	7f78a775da	f2fs: flush bios to handle cp_error in put_super Sometimes, if cp_error is set, there remains under-writeback pages, resulting in kernel hang in put_super. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:29 +08:00
Jaegeuk Kim	568ca9d039	f2fs: wait on page's writeback in writepages path Likewise f2fs_write_cache_pages, let's do for node and meta pages too. Especially, for node blocks, we should do this before marking its fsync and dentry flags. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:29 +08:00
Chao Yu	c943a5d1aa	f2fs: speed up handling holes in fiemap This patch makes f2fs_map_blocks supporting returning next potential page offset which skips hole region in indirect tree of inode, and use it to speed up fiemap in handling big hole case. Test method: xfs_io -f /mnt/f2fs/file -c "pwrite 1099511627776 4096" time xfs_io -f /mnt/f2fs/file -c "fiemap -v" Before: time xfs_io -f /mnt/f2fs/file -c "fiemap -v" /mnt/f2fs/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2147483647]: hole 2147483648 1: [2147483648..2147483655]: 81920..81927 8 0x1 real 3m3.518s user 0m0.000s sys 3m3.456s After: time xfs_io -f /mnt/f2fs/file -c "fiemap -v" /mnt/f2fs/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2147483647]: hole 2147483648 1: [2147483648..2147483655]: 81920..81927 8 0x1 real 0m0.008s user 0m0.000s sys 0m0.008s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:29 +08:00
Chao Yu	99300aead6	f2fs: introduce get_next_page_offset to speed up SEEK_DATA When seeking data in ->llseek, if we encounter a big hole which covers several dnode pages, we will try to seek data from index of page which is the first page of next dnode page, at most we could skip searching (ADDRS_PER_BLOCK - 1) pages. However it's still not efficient, because if our indirect/double-indirect pointer are NULL, there are no dnode page locate in the tree indirect/ double-indirect pointer point to, it's not necessary to search the whole region. This patch introduces get_next_page_offset to calculate next page offset based on current searching level and max searching level returned from get_dnode_of_data, with this, we could skip searching the entire area indirect or double-indirect node block is not exist. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:29 +08:00
Chao Yu	e8e6b247e9	f2fs: remove unneeded pointer conversion There are redundant pointer conversion in following call stack: - at position a, inode was been converted to f2fs_file_info. - at position b, f2fs_file_info was been converted to inode again. - truncate_blocks(inode,..) - fi = F2FS_I(inode) ---a - ADDRS_PER_PAGE(node_page, fi) - addrs_per_inode(fi) - inode = &fi->vfs_inode ---b - f2fs_has_inline_xattr(inode) - fi = F2FS_I(inode) - is_inode_flag_set(fi,..) In order to avoid unneeded conversion, alter ADDRS_PER_PAGE and addrs_per_inode to acept parameter with type of inode pointer. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:29 +08:00
Chao Yu	ea2c43f60e	f2fs: simplify __allocate_data_blocks This patch uses existing function f2fs_map_block to simplify implementation of __allocate_data_blocks. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:29 +08:00
Chao Yu	d65b55a172	f2fs: simplify f2fs_map_blocks In f2fs_map_blocks, we use duplicated codes to handle first block mapping and the following blocks mapping, it's unnecessary. This patch simplifies f2fs_map_blocks to avoid using copied codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:29 +08:00
Shuoran Liu	9876399dc3	f2fs: introduce lifetime write IO statistics This patch introduces lifetime IO write statistics exposed to the sysfs interface. The write IO amount is obtained from block layer, accumulated in the file system and stored in the hot node summary of checkpoint. Signed-off-by: Shuoran Liu <liushuoran@huawei.com> Signed-off-by: Pengyang Hou <houpengyang@huawei.com> [Jaegeuk Kim: add sysfs documentation] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:29 +08:00
Jaegeuk Kim	0007854c9f	f2fs: give scheduling point in shrinking path It needs to give a chance to be rescheduled while shrinking slab entries. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:29 +08:00
Hou Pengyang	faf1258aa7	f2fs: improve shrink performance of extent nodes On the worst case, we need to scan the whole radix tree and every rb-tree to free the victimed extent_nodes when shrinking. Pengyang initially introduced a victim_list to record the victimed extent_nodes, and free these extent_nodes by just scanning a list. Later, Chao Yu enhances the original patch to improve memory footprint by removing victim list. The policy of lru list shrinking becomes: 1) lock lru list's lock 2) trylock extent tree's lock 3) remove extent node from lru list 4) unlock lru list's lock 5) do shrink 6) repeat 1) to 5) Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:29 +08:00
Jaegeuk Kim	64a2858911	f2fs: don't set cached_en if it will be freed If en has empty list pointer, it will be freed sooner, so we don't need to set cached_en with it. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:29 +08:00
Jaegeuk Kim	2f6563b2ef	f2fs: move extent_node list operations being coupled with rbtree operation This patch moves extent_node list operations to be handled together with its rbtree operations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:28 +08:00
Hou Pengyang	285b5811ff	f2fs: reconstruct the code to free an extent_node There are three steps to free an extent node: 1) list_del_init, 2)__detach_extent_node, 3) kmem_cache_free In path f2fs_destroy_extent_tree, 1->2->3 to free a node, But in path f2fs_update_extent_tree_range, it is 2->1->3. This patch makes all the order to be: 1->2->3 It makes sense, since in the next patch, we import a victim list in the path shrink_extent_tree, we could check if the extent_node is in the victim list by checking the list_empty(). So it is necessary to put 1) first. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:28 +08:00
Jaegeuk Kim	bfe1d80660	f2fs: use wq_has_sleeper for cp_wait wait_queue We need to use wq_has_sleeper including smp_mb to consider cp_wait concurrency. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:28 +08:00
Fan Li	81e6758a09	f2fs: avoid unnecessary search while finding victim in gc variable nsearched in get_victim_by_default() indicates the number of dirty segments we already checked. There are 2 problems about the way it updates: 1. When p.ofs_unit is greater than 1, the victim we find consists of multiple segments, possibly more than 1 dirty segment. But nsearched always increases by 1. 2. If segments have been found but not been chosen, nsearched won't increase. So even we have checked all dirty segments, nsearched may still less than p.max_search. All these problems could cause unnecessary search after all dirty segments have already been checked. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:28 +08:00
Yunlei He	5c863d9060	f2fs: delete unnecessary wait for page writeback no need to wait inline file page writeback for no one use it, so this patch delete unnecessary wait. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:28 +08:00
Jaegeuk Kim	fa43b1083f	f2fs: use wait_for_stable_page to avoid contention In write_begin, if storage supports stable_page, we don't need to wait for writeback to update its contents. This patch introduces to use wait_for_stable_page instead of wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:28 +08:00
Chao Yu	c8d6e41c9a	f2fs: enhance foreground GC If we configure section consist of multiple segments, foreground GC will do the garbage collection with following approach: for each segment in victim section blk_start_plug for each valid block in segment write out by OPU method submit bio cache <--- blk_finish_plug <--- There are two issue: 1) for most of the time, 'submit bio cache' will break the merging in current bio buffer from writes of next segments, making a smaller bio submitting. 2) block plug only cover IO submitting in one segment, which reduce opportunity of merging IOs in plug with multiple segments. So refactor the code as below structure to strive for biggest opportunity of merging IOs: blk_start_plug for each segment in victim section for each valid block in segment write out by OPU method submit bio cache blk_finish_plug Test method: 1. mkfs.f2fs -s 8 /dev/sdX 2. touch 32 files 3. write 2M data into each file 4. punch 1.5M data from offset 0 for each file 5. trigger foreground gc through ioctl Before patch, there are totoally 40 bios submitted. f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880 f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880 f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880 f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880 f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768 ----repeat for 8 times After patch, there are totally 35 bios submitted. f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880 ----repeat 34 times f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:28 +08:00
Jaegeuk Kim	1924b1d13d	f2fs: don't need to call set_page_dirty for io error If end_io gets an error, we don't need to set the page as dirty, since we already set f2fs_stop_checkpoint which will not flush any data. This will resolve the following warning. ====================================================== [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ] 4.4.0+ #9 Tainted: G O ------------------------------------------------------ xfs_io/26773 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: (&(&sbi->inode_lock[i])->rlock){+.+...}, at: [<ffffffffc025483f>] update_dirty_page+0x6f/0xd0 [f2fs] and this task is already holding: (&(&q->__queue_lock)->rlock){-.-.-.}, at: [<ffffffff81396ea2>] blk_queue_bio+0x422/0x490 which would create a new lock dependency: (&(&q->__queue_lock)->rlock){-.-.-.} -> (&(&sbi->inode_lock[i])->rlock){+.+...} Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/data.c	2016-10-29 23:12:28 +08:00
Jaegeuk Kim	e038208489	f2fs: avoid needless sync_inode_page when reading inline_data In write_begin, if there is an inline_data, f2fs loads it into 0'th data page. Since it's the read path, we don't need to sync its inode page. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:28 +08:00
Jaegeuk Kim	87abb094a6	f2fs: don't need to sync node page at every time In write_end, we don't need to sync inode page at every time. Instead, we can expect f2fs_write_inode will update later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:28 +08:00
Jaegeuk Kim	c060dfefbb	f2fs: avoid multiple node page writes due to inline_data The sceanrio is: 1. create fully node blocks 2. flush node blocks 3. write inline_data for all the node blocks again 4. flush node blocks redundantly So, this patch tries to flush inline_data when flushing node blocks. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:28 +08:00
Jaegeuk Kim	6130193402	f2fs: do f2fs_balance_fs when block is allocated We should consider data block allocation to trigger f2fs_balance_fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:28 +08:00
Jaegeuk Kim	54d7b9f302	f2fs: fix to overcome inline_data floods The scenario is: 1. create lots of node blocks 2. sync 3. write lots of inline_data -> got panic due to no free space In that case, we should flush node blocks when writing inline_data in #3, and trigger gc as well. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:28 +08:00
Jaegeuk Kim	1b914a30c4	f2fs: use writepages->lock for WB_SYNC_ALL If there are many writepages calls by multiple threads in background, we don't need to serialize to merge all the bios, since it's background. In such the case, it'd better to run writepages concurrently. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:27 +08:00
Jaegeuk Kim	bfa33c6b5a	f2fs: remove needless condition check This patch removes needless condition variable. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:27 +08:00
Chao Yu	58d92819d5	f2fs: correct search area in get_new_segment get_new_segment starts from current segment position, tries to search a free segment among its right neighbors locate in same section. But previously our search area was set as [current segment, max segment], which means we have to search to more bits in free_segmap bitmap for some worse cases. So here we correct the search area to [current segment, last segment in section] to avoid unnecessary searching. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:27 +08:00
Chao Yu	619df17d64	f2fs: export dirty_nats_ratio in sysfs This patch exports a new sysfs entry 'dirty_nat_ratio' to control threshold of dirty nat entries, if current ratio exceeds configured threshold, checkpoint will be triggered in f2fs_balance_fs_bg for flushing dirty nats. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: Documentation/ABI/testing/sysfs-fs-f2fs	2016-10-29 23:12:27 +08:00
Chao Yu	496bb6b28a	f2fs: flush dirty nat entries when exceeding threshold When testing f2fs with xfstest, generic/251 is stuck for long time, the case uses below serials to obtain fresh released space in device, in order to prepare for following fstrim test. 1. rm -rf /mnt/dir 2. mkdir /mnt/dir/ 3. cp -axT `pwd`/ /mnt/dir/ 4. goto 1 During preparing step, all nat entries will be cached in nat cache, most of them are dirty entries with invalid blkaddr, which means nodes related to these entries have been truncated, and they could be reused after the dirty entries been checkpointed. However, there was no checkpoint been triggered, so nid allocators (e.g. mkdir, creat) will run into long journey of iterating all NAT pages, looking for free nids in alloc_nid->build_free_nids. Here, in f2fs_balance_fs_bg we give another chance to do checkpoint to flush nat entries for reusing them in free nid cache when dirty entry count exceeds 10% of max count. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:27 +08:00
Chao Yu	5d38e01ab0	f2fs: relocate is_merged_page Operations in is_merged_page is related to inner bio cache, move it to data.c. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/segment.c	2016-10-29 23:12:27 +08:00
Eric Sandeen	52b81d3536	ext4: fix unjournaled inode bitmap modification commit `119c0d4460` changed ext4_new_inode() such that the inode bitmap was being modified outside a transaction, which could lead to corruption, and was discovered when journal_checksum found a bad checksum in the journal during log replay. Nix ran into this when using the journal_async_commit mount option, which enables journal checksumming. The ensuing journal replay failures due to the bad checksums led to filesystem corruption reported as the now infamous "Apparent serious progressive ext4 data corruption bug" [ Changed by tytso to only call ext4_journal_get_write_access() only when we're fairly certain that we're going to allocate the inode. ] I've tested this by mounting with journal_checksum and running fsstress then dropping power; I've also tested by hacking DM to create snapshots w/o first quiescing, which allows me to test journal replay repeatedly w/o actually power-cycling the box. Without the patch I hit a journal checksum error every time. With this fix it survives many iterations. Change-Id: Id679b89e870c62f83476f836650ac9068908a84e Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2016-10-29 23:12:26 +08:00
Jaegeuk Kim	db748062ad	f2fs: should unset atomic flag after successful commit If there is an error during commit, we should keep the flag in order to abort it. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:25 +08:00
Jaegeuk Kim	7a2dc25c86	f2fs: fix wrong memory condition check This patch fixes wrong decision for avaliable_free_memory. The return valus is already set as false, so we should consider true condition below only. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/node.c	2016-10-29 23:12:25 +08:00
Jaegeuk Kim	85abb2f17e	f2fs: monitor the number of background checkpoint This patch adds to show the number of background checkpoint. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:25 +08:00
Jaegeuk Kim	389d2eece6	f2fs: detect idle time depending on user behavior This patch adds last time that user requested filesystem operations. This information is used to detect whether system is idle or not later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: Documentation/ABI/testing/sysfs-fs-f2fs	2016-10-29 23:12:25 +08:00
Jaegeuk Kim	b074a0de9c	f2fs: introduce time and interval facility This patch adds time and interval arrays to store some timing variables. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:25 +08:00
Chao Yu	f1cbf37ac2	f2fs: skip releasing nodes in chindless extent tree If there are no nodes in extent tree, let's skip releasing step to avoid any overhead of grabbing/releasing extent tree lock. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:25 +08:00
Chao Yu	157a5e1f6a	f2fs: use atomic type for node count in extent tree 1. rename field in struct extent_tree from count to node_cnt for readability. 2. alter to use atomic type for node_cnt. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:25 +08:00
Chao Yu	4f11a62b68	f2fs: recognize encrypted data in f2fs_fiemap This patch fixes to teach f2fs_fiemap to recognize encrypted data. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:25 +08:00
Jaegeuk Kim	baac2dde6b	f2fs: clean up f2fs_balance_fs This patch adds one parameter to clean up all the callers of f2fs_balance_fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/namei.c	2016-10-29 23:12:25 +08:00
Jaegeuk Kim	b5199c6522	f2fs: remove redundant calls This patch removes redundant calls. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:25 +08:00
Jaegeuk Kim	d5dd3ee188	f2fs: avoid unnecessary f2fs_balance_fs calls Only when node page is newly dirtied, it needs to check whether we need to do f2fs_gc. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:25 +08:00
Jaegeuk Kim	0788a58411	f2fs: check the page status filled from disk After reading a page, we need to check whether there is any error. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:24 +08:00
Chao Yu	c25b54f508	f2fs: introduce __get_node_page to reuse common code There are duplicated code in between get_node_page and get_node_page_ra, introduce __get_node_page to includes common parts of these two, and export get_node_page and get_node_page_ra by reusing __get_node_page. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/node.c	2016-10-29 23:12:24 +08:00
Chao Yu	01fbe1c8b3	f2fs: check node id earily when readaheading node page Add node id check in ra_node_page and get_node_page_ra like get_node_page. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:24 +08:00
Fan Li	3cb1aba93a	f2fs: read isize while holding i_mutex in fiemap make sure the isize we read doesn't change during the process. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:24 +08:00
Jaegeuk Kim	83ef8a9fcd	Revert "f2fs: check the node block address of newly allocated nid" Original issue is fixed by: f2fs: cover more area with nat_tree_lock This reverts commit 24928634f81b1592e83b37dcd89ed45c28f12feb.	2016-10-29 23:12:24 +08:00
Jaegeuk Kim	e694adb9e1	f2fs: cover more area with nat_tree_lock There was a subtle bug on nat cache management which incurs wrong nid allocation or wrong block addresses when try_to_free_nats is triggered heavily. This patch enlarges the previous coverage of nat_tree_lock to avoid data race. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:24 +08:00
Chao Yu	ed22a357c2	f2fs: introduce max_file_blocks in sbi Introduce max_file_blocks in sbi to store max block index of file in f2fs, it could be used to avoid unneeded calculation of max block index in runtime. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: fix overflow of sbi->max_file_blocks] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:24 +08:00
Chao Yu	1b4b58672a	f2fs crypto: check CONFIG_F2FS_FS_XATTR for encrypted symlink Add missed CONFIG_F2FS_FS_XATTR for encrypted symlink inode in order to avoid unneeded registry of ->{get,set,remove}xattr. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:24 +08:00
Jaegeuk Kim	b167c6ac4c	f2fs: introduce zombie list for fast shrinking extent trees This patch removes refcount, and instead, adds zombie_list to shrink directly without radix tree traverse. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:24 +08:00
Jaegeuk Kim	2dfff0e691	f2fs: monitor zombie_tree count This patch adds an entry to show the number of zombie extent_tree. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:24 +08:00
Jaegeuk Kim	ad7ad27045	f2fs: use IPU for fdatasync This patch fixes missing IPU condition when fdatasync is called. With this patch, fdatasync is able to avoid additional node writes for recovery. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:24 +08:00
Jaegeuk Kim	a4aa034e8f	f2fs: write pending bios when cp_error is set When testing ioc_shutdown, put_super is able to be hanged by waiting for writebacking pages as follows. INFO: task umount:2723 blocked for more than 120 seconds. Tainted: G O 4.4.0-rc3+ #8 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. umount D ffff88000859f9d8 0 2723 2110 0x00000000 ffff88000859f9d8 0000000000000000 0000000000000000 ffffffff81e11540 ffff880078c225c0 ffff8800085a0000 ffff88007fc17440 7fffffffffffffff ffffffff818239f0 ffff88000859fb48 ffff88000859f9f0 ffffffff8182310c Call Trace: [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff8182310c>] schedule+0x3c/0x90 [<ffffffff81827fb9>] schedule_timeout+0x2d9/0x430 [<ffffffff810e0f8f>] ? mark_held_locks+0x6f/0xa0 [<ffffffff8111614d>] ? ktime_get+0x7d/0x140 [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff8106a655>] ? kvm_clock_get_cycles+0x25/0x30 [<ffffffff8111617c>] ? ktime_get+0xac/0x140 [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff81822564>] io_schedule_timeout+0xa4/0x110 [<ffffffff81823a25>] bit_wait_io+0x35/0x50 [<ffffffff818235bd>] __wait_on_bit+0x5d/0x90 [<ffffffff811b9e8b>] wait_on_page_bit+0xcb/0xf0 [<ffffffff810d5f90>] ? autoremove_wake_function+0x40/0x40 [<ffffffff811cf84c>] truncate_inode_pages_range+0x4bc/0x840 [<ffffffff811cfc3d>] truncate_inode_pages_final+0x4d/0x60 [<ffffffffc023ced5>] f2fs_evict_inode+0x75/0x400 [f2fs] [<ffffffff812639bc>] evict+0xbc/0x190 [<ffffffff81263d19>] iput+0x229/0x2c0 [<ffffffffc0241885>] f2fs_put_super+0x105/0x1a0 [f2fs] [<ffffffff8124756a>] generic_shutdown_super+0x6a/0xf0 [<ffffffff812478f7>] kill_block_super+0x27/0x70 [<ffffffffc0241290>] kill_f2fs_super+0x20/0x30 [f2fs] [<ffffffff81247b03>] deactivate_locked_super+0x43/0x70 [<ffffffff81247f4c>] deactivate_super+0x5c/0x60 [<ffffffff81268d2f>] cleanup_mnt+0x3f/0x90 [<ffffffff81268dc2>] __cleanup_mnt+0x12/0x20 [<ffffffff810ac463>] task_work_run+0x73/0xa0 [<ffffffff810032ac>] exit_to_usermode_loop+0xcc/0xd0 [<ffffffff81003e7c>] syscall_return_slowpath+0xcc/0xe0 [<ffffffff81829ea2>] int_ret_from_sys_call+0x25/0x9f Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:24 +08:00
Jaegeuk Kim	e1496a4648	f2fs: remove f2fs_bug_on in terms of max_depth There is no report on this bug_on case, but if malicious attacker changed this field intentionally, we can just reset it as a MAX value. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:24 +08:00
Jaegeuk Kim	30a9582a1d	f2fs: fix f2fs_ioc_abort_volatile_write There are two rules to handle aborting volatile or atomic writes. 1. drop atomic writes - we don't need to keep any stale db data. 2. write journal data - we should keep the journal data with fsync for db recovery. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:23 +08:00
Chao Yu	899b32588e	f2fs: fix to skip recovering dot dentries in a readonly fs If filesystem is readonly, leave user message info instead of recovering inline dot inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:23 +08:00
Jaegeuk Kim	3b07fd693d	f2fs: load largest extent all the time Otherwise, we can get mismatched largest extent information. One example is: 1. mount f2fs w/ extent_cache 2. make a small extent 3. umount 4. mount f2fs w/o extent_cache 5. update the largest extent 6. umount 7. mount f2fs w/ extent_cache 8. get the old extent made by #2 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:23 +08:00
Jaegeuk Kim	a15f311013	f2fs: use i_size_read to get i_size We need to use i_size_read() to get inode->i_size. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/data.c	2016-10-29 23:12:23 +08:00
Jaegeuk Kim	0978d2a646	f2fs: early check broken symlink length in the encrypted case If link is broken, its len is zero, and we don't need to move forward. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:23 +08:00
Chao Yu	71568f2f05	f2fs: clean up f2fs_ioc_write_checkpoint Use f2fs_sync_fs to clean up codes in f2fs_ioc_write_checkpoint. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: remove unused err variable] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:23 +08:00
Yunlei He	e40bd22303	f2fs: add a max block check for get_data_block_bmap This patch adds a max block check for get_data_block_bmap. Trinity test program will send a block number as parameter into ioctl_fibmap, which will be used in get_node_path(), when the block number large than f2fs max blocks, it will trigger kernel bug. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com> [Jaegeuk Kim: fix missing condition, pointed by Chao Yu] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:23 +08:00
Fan Li	e465ea8aa2	f2fs: fix bugs and simplify codes of f2fs_fiemap fix bugs: 1. len could be updated incorrectly when start+len is beyond isize. 2. If there is a hole consisting of more than two blocks, it could fail to add FIEMAP_EXTENT_LAST flag for the last extent. 3. If there is an extent beyond isize, when we search extents in a range that ends at isize, it will also return the extent beyond isize, which is outside the range. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:23 +08:00
Chao Yu	562f92afb9	f2fs: let user being aware of IO error Sometimes we keep dumb when IO error occur in lower layer device, so user will not receive any error return value for some operation, but actually, the operation did not succeed. This sould be avoided, so this patch reports such kind of error to user. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/data.c	2016-10-29 23:12:23 +08:00
Chao Yu	ce1ff1416d	f2fs: add missing f2fs_balance_fs in __recover_dot_dentries __recover_do_dentries will try to grab free space in storage, so fix to add missing f2fs_balance_fs here. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:23 +08:00
Jaegeuk Kim	73efe696d4	f2fs: declare static function The __f2fs_commit_super is static. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-29 23:12:23 +08:00

... 3 4 5 6 7 ...

27882 commits