android_kernel_google_msm

mirror of https://github.com/followmsi/android_kernel_google_msm.git synced 2024-11-06 23:17:41 +00:00

Author	SHA1	Message	Date
Jan Kara	b1a8c88774	BACKPORT: posix_acl: Clear SGID bit when setting file permissions [Partially applied during f2fs inclusion, changes now aligned to upstream] (cherry pick from commit 073931017b49d9458aa351605b43a7e34598caef) When file permissions are modified via chmod(2) and the user is not in the owning group or capable of CAP_FSETID, the setgid bit is cleared in inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file permissions as well as the new ACL, but doesn't clear the setgid bit in a similar way; this allows to bypass the check in chmod(2). Fix that. NB: conflicts resolution included extending the change to all visible users of the near deprecated function posix_acl_equiv_mode replaced with posix_acl_update_mode. We did not resolve the ACL leak in this CL, require additional upstream fixes. References: CVE-2016-7097 Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Bug: 32458736 [haggertk]: Backport to 3.4/msm8974 * convert use of capable_wrt_inode_uidgid to capable Change-Id: I19591ad452cc825ac282b3cfd2daaa72aa9a1ac1	2017-06-26 20:26:17 +03:00
Eryu Guan	e0dd30eb33	ext4: validate s_first_meta_bg at mount time Ralf Spenneberg reported that he hit a kernel crash when mounting a modified ext4 image. And it turns out that kernel crashed when calculating fs overhead (ext4_calculate_overhead()), this is because the image has very large s_first_meta_bg (debug code shows it's 842150400), and ext4 overruns the memory in count_overhead() when setting bitmap buffer, which is PAGE_SIZE. ext4_calculate_overhead(): buf = get_zeroed_page(GFP_NOFS); <=== PAGE_SIZE buffer blks = count_overhead(sb, i, buf); count_overhead(): for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400 ext4_set_bit(EXT4_B2C(sbi, s++), buf); <=== buffer overrun count++; } This can be reproduced easily for me by this script: #!/bin/bash rm -f fs.img mkdir -p /mnt/ext4 fallocate -l 16M fs.img mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img debugfs -w -R "ssv first_meta_bg 842150400" fs.img mount -o loop fs.img /mnt/ext4 Fix it by validating s_first_meta_bg first at mount time, and refusing to mount if its value exceeds the largest possible meta_bg number. Reported-by: Ralf Spenneberg <ralf@os-t.de> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> (cherry picked from commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe) (minor backport adapted from cf851ad35fd1e9c7b8ed00741eca613bc1a9c8c8) Change-Id: If183ad4a873705c9a0312087577705298b3586fe	2017-03-03 13:40:24 -07:00
Nick Desaulniers	c3f8d15467	fs: ext4: disable support for fallocate FALLOC_FL_PUNCH_HOLE Bug: 28760453 Change-Id: I019c2de559db9e4b95860ab852211b456d78c4ca Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>	2016-10-31 23:29:10 +11:00
Theodore Ts'o	2da70f4e34	ext4: avoid hang when mounting non-journal filesystems with orphan list When trying to mount a file system which does not contain a journal, but which does have a orphan list containing an inode which needs to be truncated, the mount call with hang forever in ext4_orphan_cleanup() because ext4_orphan_del() will return immediately without removing the inode from the orphan list, leading to an uninterruptible loop in kernel code which will busy out one of the CPU's on the system. This can be trivially reproduced by trying to mount the file system found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs source tree. If a malicious user were to put this on a USB stick, and mount it on a Linux desktop which has automatic mounts enabled, this could be considered a potential denial of service attack. (Not a big deal in practice, but professional paranoids worry about such things, and have even been known to allocate CVE numbers for such problems.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Cc: stable@vger.kernel.org	2016-10-29 23:12:34 +08:00
Anatol Pomozov	012f862b13	ext4: make orphan functions be no-op in no-journal mode Instead of checking whether the handle is valid, we check if journal is enabled. This avoids taking the s_orphan_lock mutex in all cases when there is no journal in use, including the error paths where ext4_orphan_del() is called with a handle set to NULL. Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2016-10-29 23:12:34 +08:00
Eric Sandeen	52b81d3536	ext4: fix unjournaled inode bitmap modification commit `119c0d4460` changed ext4_new_inode() such that the inode bitmap was being modified outside a transaction, which could lead to corruption, and was discovered when journal_checksum found a bad checksum in the journal during log replay. Nix ran into this when using the journal_async_commit mount option, which enables journal checksumming. The ensuing journal replay failures due to the bad checksums led to filesystem corruption reported as the now infamous "Apparent serious progressive ext4 data corruption bug" [ Changed by tytso to only call ext4_journal_get_write_access() only when we're fairly certain that we're going to allocate the inode. ] I've tested this by mounting with journal_checksum and running fsstress then dropping power; I've also tested by hacking DM to create snapshots w/o first quiescing, which allows me to test journal replay repeatedly w/o actually power-cycling the box. Without the patch I hit a journal checksum error every time. With this fix it survives many iterations. Change-Id: Id679b89e870c62f83476f836650ac9068908a84e Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2016-10-29 23:12:26 +08:00
johnny1_lin	3db674a036	ext4: protect group inode free counting with group lock Patch from Ken that can solve fsstress failing issue commit b9fa7bb8ff207eeb27d2e0ed45b8c3acf1a7af8c Author: Tao Ma <boyu.mt@taobao.com> Date: Mon May 28 18:20:59 2012 -0400 ext4: protect group inode free counting with group lock Now when we set the group inode free count, we don't have a proper group lock so that multiple threads may decrease the inode free count at the same time. And e2fsck will complain something like: Free inodes count wrong for group #1 (1, counted=0). Fix? no Free inodes count wrong for group #2 (3, counted=0). Fix? no Directories count wrong for group #2 (780, counted=779). Fix? no Free inodes count wrong for group #3 (2272, counted=2273). Fix? no So this patch try to protect it with the ext4_lock_group. btw, it is found by xfstests test case 269 and the volume is mkfsed with the parameter "-O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr" and I have run it 100 times and the error in e2fsck doesn't show up again. Change-Id: Iba773843728759e1d64d4ff57765288eb5977665 Reviewed-on: http://mcrd1-5.corpnet.asus/code-review/master/67871 Reviewed-by: Lin Johnny1 <Johnny1_Lin@asus.com> Tested-by: Lin Johnny1 <Johnny1_Lin@asus.com> Reviewed-by: Sam hblee <Sam_hblee@asus.com>	2013-04-18 16:07:55 -07:00
Theodore Ts'o	d226614eb3	ext4: fix potential deadlock in ext4_nonda_switch() In ext4_nonda_switch(), if the file system is getting full we used to call writeback_inodes_sb_if_idle(). The problem is that we can be holding i_mutex already, and this causes a potential deadlock when writeback_inodes_sb_if_idle() when it tries to take s_umount. (See lockdep output below). As it turns out we don't need need to hold s_umount; the fact that we are in the middle of the write(2) system call will keep the superblock pinned. Unfortunately writeback_inodes_sb() checks to make sure s_umount is taken, and the VFS uses a different mechanism for making sure the file system doesn't get unmounted out from under us. The simplest way of dealing with this is to just simply grab s_umount using a trylock, and skip kicking the writeback flusher thread in the very unlikely case that we can't take a read lock on s_umount without blocking. Also, we now check the cirteria for kicking the writeback thread before we decide to whether to fall back to non-delayed writeback, so if there are any outstanding delayed allocation writes, we try to get them resolved as soon as possible. [ INFO: possible circular locking dependency detected ] 3.6.0-rc1-00042-gce894ca #367 Not tainted ------------------------------------------------------- dd/8298 is trying to acquire lock: (&type->s_umount_key#18){++++..}, at: [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46 but task is already holding lock: (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3 which lock already depends on the new lock. 2 locks held by dd/8298: #0: (sb_writers#2){.+.+.+}, at: [<c01ddcc5>] generic_file_aio_write+0x56/0xd3 #1: (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3 stack backtrace: Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367 Call Trace: [<c015b79c>] ? console_unlock+0x345/0x372 [<c06d62a1>] print_circular_bug+0x190/0x19d [<c019906c>] __lock_acquire+0x86d/0xb6c [<c01999db>] ? mark_held_locks+0x5c/0x7b [<c0199724>] lock_acquire+0x66/0xb9 [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46 [<c06db935>] down_read+0x28/0x58 [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46 [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46 [<c026f3b2>] ext4_nonda_switch+0xe1/0xf4 [<c0271ece>] ext4_da_write_begin+0x27/0x193 [<c01dcdb0>] generic_file_buffered_write+0xc8/0x1bb [<c01ddc47>] __generic_file_aio_write+0x1dd/0x205 [<c01ddce7>] generic_file_aio_write+0x78/0xd3 [<c026d336>] ext4_file_write+0x480/0x4a6 [<c0198c1d>] ? __lock_acquire+0x41e/0xb6c [<c0180944>] ? sched_clock_cpu+0x11a/0x13e [<c01967e9>] ? trace_hardirqs_off+0xb/0xd [<c018099f>] ? local_clock+0x37/0x4e [<c0209f2c>] do_sync_write+0x67/0x9d [<c0209ec5>] ? wait_on_retry_sync_kiocb+0x44/0x44 [<c020a7b9>] vfs_write+0x7b/0xe6 [<c020a9a6>] sys_write+0x3b/0x64 [<c06dd4bd>] syscall_call+0x7/0xb Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2013-03-15 17:09:42 -07:00
Linus Torvalds	95f7147274	Ext4 bug fixes for 3.4 These are two low-risk bug fixes for ext4, fixing a compile warning and a potential deadlock. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABCAAGBQJPlgZ8AAoJENNvdpvBGATwewkP/ioo2U05O4tzmt05+HICw1ZK vh1x6oaO3bUMa21pKBzS60rDc+EDu61E+bjVrsasOmom8DZyOP92SiwaDnIsKn6p JBSNwzIOPmuPflEY3tnOsnOZ1umZcB16uhki1Rk1HE0nRPdKiyKJKZnbSzmUGWUW gJwHbHddxZKTmDrEy4CxfbwwKKVm2SQUO5crLohFst4JsXc1h6muEfkcAZvCfZ68 1PQIkTkJUXArQuTuxzP89r7L8tqHJv4iOz+PT0FlluGWvgJUWIOVvjdJfPuQTmLi UNzvtoQxuxjdZuCK/D16kNTkOEPzOhMlNW1djAntdCQohHIJG0Hd5bFju9bybSLz 838sTCEFxRS7rdBEXiksWsPCVDz/QVnPft0RG9jqXd6dRPFr/XJ1rAeDTjW2vmWw ZO28p99aolA5At02AlSf9IgMIME0gKejnvpRo703UW456BlFIXPK3e/nbtE7Eb5A HcZhvIwncWE4cbq2/AboielPSnyx6Z3SJS0hBIQ2wG40xcL/jxYL7K2/trkUr2KH H3/4RsrSlLDXqHRJ4cVW75zKMgyNvc+60HDlAxE62LqKFR7K93hdlHpnkySy/1St FaIiipH8Tmt+u6tqn6rlR82vRxd/dkLgQMpCWm4Et4THXlvisZkbxaDXrEGx79qg v8eEdmHeJuLcQesm9TrS =Ygid -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bug fixes from Ted Ts'o: "These are two low-risk bug fixes for ext4, fixing a compile warning and a potential deadlock." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: super.c: unused variable warning without CONFIG_QUOTA jbd2: use GFP_NOFS for blkdev_issue_flush	2012-04-23 19:52:00 -07:00
Eldad Zack	db7e5c668e	super.c: unused variable warning without CONFIG_QUOTA sb info is only checked with quota support. fs/ext4/super.c: In function ‘parse_options’: fs/ext4/super.c:1600:23: warning: unused variable ‘sbi’ [-Wunused-variable] Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2012-04-23 21:44:41 -04:00
Linus Torvalds	592fe89806	Ext4 regression fixes for 3.4 This fixes a scalability problem reported by Andi Kleen and Tim Chen; they were quite secretive about the precise nature of their workload, but they later admitted that it only showed up when they were using a large sparse file, so the amount of data I/O that was needed was close to zero. I'm not sure how realistic this is and it's only a regression if you consider changes made since 2.6.39 to be a "regression" vis-a-vis the policy regarding post-merge window bug fixes, but Linus agreed it was worth fixing, so I'm including it in this pull request. This also fixes the journalled quota mount options, which I accidentally broke while I was cleaning up the mount option handling. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABCAAGBQJPjbs2AAoJENNvdpvBGATw/4UQAOMsxRlzbMnOAmohIBmesJiB nTPX41dNw0QCXstMFjvSyRbUJI0NZPGg6ZDhbtoQ/c42/izZOVNd7gh7QQYHRCt2 Oqh6WS159P04ixcAe8NgCm5B5AV/C5Er/vSOUZENBaBo430vvrZasifsyUgQnx+b PxXlYsfPVzaQVeSxCu/68OeiBRLcLwmKZ6MicOaWt9GCNoCsWzgU+/LNskYuscPI 841yQL6/BE7redU2E9qoEn/xjxx57hJOj2iiIAuqGPmNmRQqq3VtvTqNZHldNBLp Hz5mB3zSZsPg0uwvp+OxrnpP37NQCCn1L64UFIXxqGF47mcCYw7schAoGBtqwGQS neGUfkzG4beKk7kojyDawvnrrvVn4iCMaIkR1ZjzjPOk+QBPagckrS2nmuObbYzJ l4lmHq1v8nOh0clxqJPioNp5/Y13sbpEOY4tAa6sLpzKLKXF330RNuwwrKKHB7zo ZhCvSwVEjmxacgPCPhFJnR3fxtjoXWR8WvJs7H+gZB/QaC8hjEYw6xvvrkw8mAiS RNe3cYdYAz6kOJdtJrJaMzp/1CYdHydf+0WvNYCQ/d1poPr7uU5wQY61hdm2gFxh owsbVAOiEFjZWJHqrRRdcg2irTpINJTS3iRe4g/ltcvYzzRSeOcZNWvkKFspq3i8 EUMHRBxLzPMRa+gU6Unm =jb3f -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 regression fixes from Ted Ts'o: "This fixes a scalability problem reported by Andi Kleen and Tim Chen; they were quite secretive about the precise nature of their workload, but they later admitted that it only showed up when they were using a large sparse file, so the amount of data I/O that was needed was close to zero. I'm not sure how realistic this is and it's only a regression if you consider changes made since 2.6.39 to be a "regression" vis-a-vis the policy regarding post-merge window bug fixes, but Linus agreed it was worth fixing, so I'm including it in this pull request. This also fixes the journalled quota mount options, which I accidentally broke while I was cleaning up the mount option handling." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix handling of journalled quota options ext4: address scalability issue by removing extent cache statistics	2012-04-17 13:30:34 -07:00
Theodore Ts'o	57f73c2c89	ext4: fix handling of journalled quota options Commit `26092bf5` broke handling of journalled quota mount options by trying to parse argument of every mount option as a number. Fix this by dealing with the quota options before we call match_int(). Thanks to Jan Kara for discovering this regression. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2012-04-16 18:55:26 -04:00
Theodore Ts'o	9cd70b347e	ext4: address scalability issue by removing extent cache statistics Andi Kleen and Tim Chen have reported that under certain circumstances the extent cache statistics are causing scalability problems due to cache line bounces. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-04-16 12:16:20 -04:00
Al Viro	af1584f570	ext4: fix endianness breakage in ext4_split_extent_at() ->ee_len is __le16, so assigning cpu_to_le32() to it is going to do Bad Things(tm) on big-endian hosts... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-04-13 10:12:08 -04:00
Linus Torvalds	6268b325c3	Revert "ext4: don't release page refs in ext4_end_bio()" This reverts commit `b43d17f319`. Dave Jones reports that it causes lockups on his laptop, and his debug output showed a lot of processes hung waiting for page_writeback (or more commonly - processes hung waiting for a lock that was held during that writeback wait). The page_writeback hint made Ted suggest that Dave look at this commit, and Dave verified that reverting it makes his problems go away. Ted says: "That commit fixes a race which is seen when you write into fallocated (and hence uninitialized) disk blocks under very heavy memory pressure. Furthermore, although theoretically it could trigger under normal direct I/O writes, it only seems to trigger if you are issuing a huge number of AIO writes, such that a just-written page can get evicted from memory, and then read back into memory, before the workqueue has a chance to update the extent tree. This race has been around for a little over a year, and no one noticed until two months ago; it only happens under fairly exotic conditions, and in fact even after trying very hard to create a simple repro under lab conditions, we could only reproduce the problem and confirm the fix on production servers running MySQL on very fast PCIe-attached flash devices. Given that Dave was able to hit this problem pretty quickly, if we confirm that this commit is at fault, the only reasonable thing to do is to revert it IMO." Reported-and-tested-by: Dave Jones <davej@redhat.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-29 17:00:56 -07:00
Linus Torvalds	71db34fc43	Merge branch 'for-3.4' of git://linux-nfs.org/~bfields/linux Pull nfsd changes from Bruce Fields: Highlights: - Benny Halevy and Tigran Mkrtchyan implemented some more 4.1 features, moving us closer to a complete 4.1 implementation. - Bernd Schubert fixed a long-standing problem with readdir cookies on ext2/3/4. - Jeff Layton performed a long-overdue overhaul of the server reboot recovery code which will allow us to deprecate the current code (a rather unusual user of the vfs), and give us some needed flexibility for further improvements. - Like the client, we now support numeric uid's and gid's in the auth_sys case, allowing easier upgrades from NFSv2/v3 to v4.x. Plus miscellaneous bugfixes and cleanup. Thanks to everyone! There are also some delegation fixes waiting on vfs review that I suppose will have to wait for 3.5. With that done I think we'll finally turn off the "EXPERIMENTAL" dependency for v4 (though that's mostly symbolic as it's been on by default in distro's for a while). And the list of 4.1 todo's should be achievable for 3.5 as well: http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues though we may still want a bit more experience with it before turning it on by default. * 'for-3.4' of git://linux-nfs.org/~bfields/linux: (55 commits) nfsd: only register cld pipe notifier when CONFIG_NFSD_V4 is enabled nfsd4: use auth_unix unconditionally on backchannel nfsd: fix NULL pointer dereference in cld_pipe_downcall nfsd4: memory corruption in numeric_name_to_id() sunrpc: skip portmap calls on sessions backchannel nfsd4: allow numeric idmapping nfsd: don't allow legacy client tracker init for anything but init_net nfsd: add notifier to handle mount/unmount of rpc_pipefs sb nfsd: add the infrastructure to handle the cld upcall nfsd: add a header describing upcall to nfsdcld nfsd: add a per-net-namespace struct for nfsd sunrpc: create nfsd dir in rpc_pipefs nfsd: add nfsd4_client_tracking_ops struct and a way to set it nfsd: convert nfs4_client->cl_cb_flags to a generic flags field NFSD: Fix nfs4_verifier memory alignment NFSD: Fix warnings when NFSD_DEBUG is not defined nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes) nfsd: rename 'int access' to 'int may_flags' in nfsd_open() ext4: return 32/64-bit dir name hash according to usage type fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash ...	2012-03-29 14:53:25 -07:00
Linus Torvalds	69e1aaddd6	Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes The changes to export dirty_writeback_interval are from Artem's s_dirt cleanup patch series. The same is true of the change to remove the s_dirt helper functions which never got used by anyone in-tree. I've run these changes by Al Viro, and am carrying them so that Artem can more easily fix up the rest of the file systems during the next merge window. (Originally we had hopped to remove the use of s_dirt from ext4 during this merge window, but his patches had some bugs, so I ultimately ended dropping them from the ext4 tree.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABCAAGBQJPb39rAAoJENNvdpvBGATwVz8P/3V1NqSsk20VJOLbmEE45GxL GDzQJ6OsFG0UiQk6ISSrSdwxfav/KTCGySsU9UtAoOdPcBwnnsf8S7wc6OggwwuC hBFGwwFzk6YSQaZ58sUxWRGeOJuP/FPem6Id6buC4DQ1KIcznP/hEEgEnh/ir4Ec vrsfexY93TR8BE2Mi23v2epDVLU0B6bY/w9nDqbTXif3xN/gh/ypoHHouuM6Bs2n TyWHOwD15NwfnvRHd8PfDDqQM/D29x3QI0FMrWj9McpwIz4d4cBfhN4LQ/G+yLDY izv5DM10GbinwHPrsOTGVAW3KIdSS9rP3jCJGVuOrJZ9ufGXosvHuIYVhI7J3SBK JhBu6QEsN1IsvlVYpz9q8mqVKaDXQLsz2eaTw+i4yfmyOk1kOX7nIEOxYFF78G+V Of/W1SpIpJQaXvLHRcDj9fDj0fZTciUZA8v7/HOFS+co2dzIl0iZbcfBFp0/56RY sWdQoeRlx1ciVDPR+w2TQO5w3VWQw1gT5aqux0NiPj0XFoiUHScxgNGAYbqENMQw v9chvyDMlorqj0rF/Vey5SssgEDi7MTdYuYTi4YyMqr7pcvOJaO85pf+wH9g2eKW XhW33PhPGuwCJDP5Pg8Y0Z2Hp/Q3DCqhLqhGfTyAs/NG9+hR4wgp3VWb8CUqhA1t C/yzNeOYqScAefCzQx2V =+9zk -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates for 3.4 from Ted Ts'o: "Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes The changes to export dirty_writeback_interval are from Artem's s_dirt cleanup patch series. The same is true of the change to remove the s_dirt helper functions which never got used by anyone in-tree. I've run these changes by Al Viro, and am carrying them so that Artem can more easily fix up the rest of the file systems during the next merge window. (Originally we had hopped to remove the use of s_dirt from ext4 during this merge window, but his patches had some bugs, so I ultimately ended dropping them from the ext4 tree.)" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (66 commits) vfs: remove unused superblock helpers mm: export dirty_writeback_interval ext4: remove useless s_dirt assignment ext4: write superblock only once on unmount ext4: do not mark superblock as dirty unnecessarily ext4: correct ext4_punch_hole return codes ext4: remove restrictive checks for EOFBLOCKS_FL ext4: always set then trimmed blocks count into len ext4: fix trimmed block count accunting ext4: fix start and len arguments handling in ext4_trim_fs() ext4: update s_free_{inodes,blocks}_count during online resize ext4: change some printk() calls to use ext4_msg() instead ext4: avoid output message interleaving in ext4_error_<foo>() ext4: remove trailing newlines from ext4_msg() and ext4_error() messages ext4: add no_printk argument validation, fix fallout ext4: remove redundant "EXT4-fs: " from uses of ext4_msg ext4: give more helpful error message in ext4_ext_rm_leaf() ext4: remove unused code from ext4_ext_map_blocks() ext4: rewrite punch hole to use ext4_ext_remove_space() jbd2: cleanup journal tail after transaction commit ...	2012-03-28 10:02:55 -07:00
Artem Bityutskiy	182f514f88	ext4: remove useless s_dirt assignment Clean-up ext4 a tiny bit by removing useless s_dirt assignment in 'ext4_fill_super()' because a bit later we anyway call 'ext4_setup_super()' which writes the superblock to the media unconditionally. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-21 22:30:06 -04:00
Artem Bityutskiy	a8e25a8324	ext4: write superblock only once on unmount In some rather rare cases it is possible that ext4 may the superblock to the media twice. This patch makes sure this does not happen. This should speed up unmounting in those cases. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-21 22:29:15 -04:00
Artem Bityutskiy	1b8b9750f0	ext4: do not mark superblock as dirty unnecessarily Commit `a0375156ca` cleaned up superblock dirtying handling, but missed one place. This patch does what was intended: if we have the journal, then we update the superblock through the journal rather than doing this directly. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-21 22:28:29 -04:00
Allison Henderson	7335519274	ext4: correct ext4_punch_hole return codes ext4_punch_hole returns -ENOTSUPP but it should be using -EOPNOTSUPP Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-21 22:23:31 -04:00
Lukas Czerner	afcff5d80a	ext4: remove restrictive checks for EOFBLOCKS_FL We are going to remove the EOFBLOCKS_FL flag in the future, so this is the first part of the removal. We can not remove it entirely just now, since the e2fsck is still checking for it and it might cause headache to some people. Instead, remove the restrictive checks now and the rest later, when the new e2fsck code is out and common enough. This is also needed because punch hole already breaks the EOFBLOCKS_FL semantics, so it might cause the some troubles. So simply remove it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-21 21:47:55 -04:00
Lukas Czerner	a7967f055a	ext4: always set then trimmed blocks count into len Currently if the range to trim is too small, for example on 1K fs the request to trim the first block, then the 'range->len' is not set reporting wrong number of discarded block to the caller. Fix this by always setting the 'range->len' before we return. Note that when there is a failure (-EINVAL) caller can not depend on 'range->len' being set properly. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-21 21:26:22 -04:00
Lukas Czerner	21e7fd22a5	ext4: fix trimmed block count accunting Currently when there is not enough free blocks in the block group to discard (grp->bb_free < minlen) the 'trimmed' is bumped up anyway with the number of discarded blocks from the previous iteration. Fix this by bumping up 'trimmed' only if the ext4_trim_all_free() was actually run. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-21 21:24:22 -04:00
Lukas Czerner	913eed83ed	ext4: fix start and len arguments handling in ext4_trim_fs() The overflow can happen when we are calling get_group_no_and_offset() which stores the group number in the ext4_grpblk_t type which is actually int. However when the blocknr is big enough the group number might be bigger than ext4_grpblk_t resulting in overflow. This will most likely happen with FITRIM default argument len = ULLONG_MAX. Fix this by using "end" variable instead of "start+len" as it is easier to get right and specifically check that the end is not beyond the end of the file system, so we are sure that the result of get_group_no_and_offset() will not overflow. Otherwise truncate it to the size of the file system. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-21 21:22:22 -04:00
Al Viro	07c0c5d8b8	ext4: initialization of ext4_li_mtx needs to be done earlier Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-03-20 22:05:02 -04:00
Al Viro	48fde701af	switch open-coded instances of d_make_root() to new helper Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-03-20 21:29:35 -04:00
Darrick J. Wong	636d7e2e3b	ext4: update s_free_{inodes,blocks}_count during online resize When we're doing an online resize of an ext4 filesystem, we need to update the free inode and block counts in the superblock so that fsck doesn't complain. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-20 15:46:11 -04:00
Theodore Ts'o	92b9781658	ext4: change some printk() calls to use ext4_msg() instead Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-19 23:41:49 -04:00
Joe Perches	d9ee81da93	ext4: avoid output message interleaving in ext4_error_<foo>() Using KERN_CONT means that messages from multiple threads may be interleaved. Avoid this by using a single printk call in ext4_error_inode and ext4_error_file. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-19 23:15:43 -04:00
Theodore Ts'o	1084f252e3	ext4: remove trailing newlines from ext4_msg() and ext4_error() messages The functions ext4_msg() and ext4_error() already tack on a trailing newline, so remove the unnecessary extra newline. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-19 23:13:43 -04:00
Joe Perches	ace36ad431	ext4: add no_printk argument validation, fix fallout Add argument validation to debug functions. Use ##__VA_ARGS__. Fix format and argument mismatches. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-19 23:11:43 -04:00
Joe Perches	7f6a11e73d	ext4: remove redundant "EXT4-fs: " from uses of ext4_msg ext4_msg adds "EXT4-fs: " to the messsage output. Remove the redundant bits from uses. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-19 23:09:43 -04:00
Lukas Czerner	dc1841d6cf	ext4: give more helpful error message in ext4_ext_rm_leaf() The error message produced by the ext4_ext_rm_leaf() when we are removing blocks which accidentally ends up inside the existing extent, is not very helpful, because we would like to also know which extent did we collide with. This commit changes the error message to get us also the information about the extent we are colliding with. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-19 23:07:43 -04:00
Lukas Czerner	7877191c28	ext4: remove unused code from ext4_ext_map_blocks() Since the commit 'Rewrite punch hole to use ext4_ext_remove_space()' reworked the punch hole implementation to use ext4_ext_remove_space() instead of ext4_ext_map_blocks(), we can remove the code which is no longer needed from the ext4_ext_map_blocks(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-19 23:05:43 -04:00
Lukas Czerner	5f95d21fb6	ext4: rewrite punch hole to use ext4_ext_remove_space() This commit rewrites ext4 punch hole implementation to use ext4_ext_remove_space() instead of its home gown way of doing this via ext4_ext_map_blocks(). There are several reasons for changing this. Firstly it is quite non obvious that punching hole needs to ext4_ext_map_blocks() to punch a hole, especially given that this function should map blocks, not unmap it. It also required a lot of new code in ext4_ext_map_blocks(). Secondly the design of it is not very effective. The reason is that we are trying to punch out blocks in ext4_ext_punch_hole() in opposite direction than in ext4_ext_rm_leaf() which causes the ext4_ext_rm_leaf() to iterate through the whole tree from the end to the start to find the requested extent for every extent we are going to punch out. And finally the current implementation does not use the existing code, but bring a lot of new code, which is IMO unnecessary since there already is some infrastructure we can use. Specifically ext4_ext_remove_space(). This commit changes ext4_ext_remove_space() to accept 'end' parameter so we can not only truncate to the end of file, but also remove the space in the middle of the file (punch a hole). Moreover, because the last block to punch out, might be in the middle of the extent, we have to split the extent at 'end + 1' so ext4_ext_rm_leaf() can easily either remove the whole fist part of split extent, or change its size. ext4_ext_remove_space() is then used to actually remove the space (extents) from within the hole, instead of ext4_ext_map_blocks(). Note that this also fix the issue with punch hole, where we would forget to remove empty index blocks from the extent tree, resulting in double free block error and file system corruption. This is simply because we now use different code path, where this problem does not exist. This has been tested with fsx running for several days and xfstests, plus xfstest #251 with '-o discard' run on the loop image (which converts discard requestes into punch hole to the backing file). All of it on 1K and 4K file system block size. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-19 23:03:19 -04:00
Fan Yong	d1f5273e9a	ext4: return 32/64-bit dir name hash according to usage type Traditionally ext2/3/4 has returned a 32-bit hash value from llseek() to appease NFSv2, which can only handle a 32-bit cookie for seekdir() and telldir(). However, this causes problems if there are 32-bit hash collisions, since the NFSv2 server can get stuck resending the same entries from the directory repeatedly. Allow ext4 to return a full 64-bit hash (both major and minor) for telldir to decrease the chance of hash collisions. This still needs integration on the NFS side. Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> (blame me if something is not correct) Signed-off-by: Fan Yong <yong.fan@whamcloud.com> Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-18 22:44:40 -04:00
Theodore Ts'o	31d4f3a2f3	ext4: check for zero length extent Explicitly test for an extent whose length is zero, and flag that as a corrupted extent. This avoids a kernel BUG_ON assertion failure. Tested: Without this patch, the file system image found in tests/f_ext_zero_len/image.gz in the latest e2fsprogs sources causes a kernel panic. With this patch, an ext4 file system error is noted instead, and the file system is marked as being corrupted. https://bugzilla.kernel.org/show_bug.cgi?id=42859 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2012-03-11 23:30:16 -04:00
Curt Wohlgemuth	4188188bdc	ext4: add comments to definition of ext4_io_end_t This should make it more clear what this structure is used for, and how some of the (mutually exclusive) fields are used to keep page cache references. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-05 10:40:22 -05:00
Curt Wohlgemuth	b43d17f319	ext4: don't release page refs in ext4_end_bio() We can clear PageWriteback on each page when the IO completes, but we can't release the references on the page until we convert any uninitialized extents. Without this patch, the use of the dioread_nolock mount option can break buffered writes, because extents may not be converted by the time a subsequent buffered read comes in; if the page is not in the page cache, a read will return zeros if the extent is still uninitialized. I tested this with a (temporary) patch that adds a call to msleep(1000) at the start of ext4_end_io_work(), to delay processing of each DIO-unwritten work queue item. With this msleep(), a simple workload of fallocate write fadvise read will fail without this patch, succeeds with it. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-05 10:40:15 -05:00
Jeff Moyer	491caa4363	ext4: fix race between sync and completed io work The following command line will leave the aio-stress process unkillable on an ext4 file system (in my case, mounted on /mnt/test): aio-stress -t 20 -s 10 -O -S -o 2 -I 1000 /mnt/test/aiostress.3561.4 /mnt/test/aiostress.3561.4.20 /mnt/test/aiostress.3561.4.19 /mnt/test/aiostress.3561.4.18 /mnt/test/aiostress.3561.4.17 /mnt/test/aiostress.3561.4.16 /mnt/test/aiostress.3561.4.15 /mnt/test/aiostress.3561.4.14 /mnt/test/aiostress.3561.4.13 /mnt/test/aiostress.3561.4.12 /mnt/test/aiostress.3561.4.11 /mnt/test/aiostress.3561.4.10 /mnt/test/aiostress.3561.4.9 /mnt/test/aiostress.3561.4.8 /mnt/test/aiostress.3561.4.7 /mnt/test/aiostress.3561.4.6 /mnt/test/aiostress.3561.4.5 /mnt/test/aiostress.3561.4.4 /mnt/test/aiostress.3561.4.3 /mnt/test/aiostress.3561.4.2 This is using the aio-stress program from the xfstests test suite. That particular command line tells aio-stress to do random writes to 20 files from 20 threads (one thread per file). The files are NOT preallocated, so you will get writes to random offsets within the file, thus creating holes and extending i_size. It also opens the file with O_DIRECT and O_SYNC. On to the problem. When an I/O requires unwritten extent conversion, it is queued onto the completed_io_list for the ext4 inode. Two code paths will pull work items from this list. The first is the ext4_end_io_work routine, and the second is ext4_flush_completed_IO, which is called via the fsync path (and O_SYNC handling, as well). There are two issues I've found in these code paths. First, if the fsync path beats the work routine to a particular I/O, the work routine will free the io_end structure! It does not take into account the fact that the io_end may still be in use by the fsync path. I've fixed this issue by adding yet another IO_END flag, indicating that the io_end is being processed by the fsync path. The second problem is that the work routine will make an assignment to io->flag outside of the lock. I have witnessed this result in a hang at umount. Moving the flag setting inside the lock resolved that problem. The problem was introduced by commit `b82e384c7b` ("ext4: optimize locking for end_io extent conversion"), which first appeared in 3.2. As such, the fix should be backported to that release (probably along with the unwritten extent conversion race fix). Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> CC: stable@kernel.org	2012-03-05 10:29:52 -05:00
Jeff Moyer	93ef8541d5	ext4: clean up the flags passed to __blockdev_direct_IO For extent-based files, you can perform DIO to holes, as mentioned in the comments in ext4_ext_direct_IO. However, that function passes DIO_SKIP_HOLES to __blockdev_direct_IO, which is really confusing to the uninitiated reader. The key, here, is that the get_block function passed in, ext4_get_block_write, completely ignores the create flag that is passed to it (the create flag is passed in from the direct I/O code, which uses the DIO_SKIP_HOLES flag to determine whether or not it should be cleared). This is a long-winded way of saying that the DIO_SKIP_HOLES flag is ultimately ignored. So let's remove it. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-05 10:19:52 -05:00
Theodore Ts'o	f70486055e	ext4: try to deprecate noacl and noxattr_user mount options No other file system allows ACL's and extended attributes to be enabled or disabled via a mount option. So let's try to deprecate these options from ext4. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-04 22:06:20 -05:00
Theodore Ts'o	c7198b9c1e	ext4: ignore mount options supported by ext2/3 (but have since been removed) Users who tried to use the ext4 file system driver is being used for the ext2 or ext3 file systems (via the CONFIG_EXT4_USE_FOR_EXT23 option) could have failed mounts if their /etc/fstab contains options recognized by ext2 or ext3 but which have since been removed in ext4. So teach ext4 to recognize them and give a warning that the mount option was removed. Report: https://bbs.archlinux.org/profile.php?id=33804 Signed-off-by: Tom Gundersen <teg@jklm.no> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Baechler <thomas@archlinux.org> Cc: Tobias Powalowski <tobias.powalowski@googlemail.com> Cc: Dave Reisner <d@falconindy.com>	2012-03-04 22:00:53 -05:00
Theodore Ts'o	66acdcf4ea	ext4: add debugging /proc file showing file system options Now that /proc/mounts is consistently showing only those mount options which need to be specified in /etc/fstab or on the mount command line, it is useful to have file which shows exactly which file system options are enabled. This can be useful when debugging a user problem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-04 20:21:38 -05:00
Theodore Ts'o	5a916be1b3	ext4: make ext4_show_options() be table-driven Consistently show mount options which are the non-default, so that /proc/mounts accurately shows the mount options that would be necessary to mount the file system in its current mode of operation. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-04 19:27:31 -05:00
Theodore Ts'o	2adf6da837	ext4: move ext4_show_options() after parse_options() This commit is strictly a code movement so in preparation of changing ext4_show_options to be table driven. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-03 23:20:50 -05:00
Theodore Ts'o	26092bf524	ext4: use a table-driven handler for mount options By using a table-drive approach, we shave about 100 lines of code from ext4, and make the code a bit more regular and factored out. This will also make it possible in a future patch to use this table for displaying the mount options that were specified in /proc/mounts. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-03 23:20:47 -05:00
Theodore Ts'o	72578c33c4	ext4: unify handling of mount options which have been removed Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-03 18:04:40 -05:00
Theodore Ts'o	39ef17f1b0	ext4: simplify handling of the errors=* mount options Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-03 17:56:23 -05:00

1 2 3 4 5 ...

1555 commits