android_kernel_google_msm/fs
Jerry Hoemann 917a35f64e fsnotify: next_i is freed during fsnotify_unmount_inodes.
commit 6424babfd6 upstream.

During file system stress testing on 3.10 and 3.12 based kernels, the
umount command occasionally hung in fsnotify_unmount_inodes in the
section of code:

                spin_lock(&inode->i_lock);
                if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
                        spin_unlock(&inode->i_lock);
                        continue;
                }

As this section of code holds the global inode_sb_list_lock, eventually
the system hangs trying to acquire the lock.

Multiple crash dumps showed:

The inode->i_state == 0x60 and i_count == 0 and i_sb_list would point
back at itself.  As this is not the value of list upon entry to the
function, the kernel never exits the loop.
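
The self-pointing i_sb_list can be reproduced outside the kernel. The
following is a minimal user-space sketch, not kernel code: struct
list_head and the helpers are simplified stand-ins for the ones in
include/linux/list.h, and list_init() and steps_to_head() are
hypothetical names introduced only for this illustration.  It shows that
a walk which starts on a node removed with list_del_init never reaches
the list head again.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical helper: make an empty list (head points at itself). */
static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the node, then point it back at itself. */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n;
	n->prev = n;
}

/*
 * Hypothetical helper: follow ->next from pos until head is reached.
 * Returns the number of steps taken, or -1 if head was not reached
 * within limit steps (the unbounded kernel loop would spin forever).
 */
static int steps_to_head(const struct list_head *pos,
			 const struct list_head *head, int limit)
{
	int steps = 0;

	assert(limit >= 0);
	while (pos != head) {
		if (steps++ >= limit)
			return -1;
		pos = pos->next;
	}
	return steps;
}
```

With a list head -> a -> b, calling list_del_init(&b) and then walking
from &b never terminates on its own, which matches the hang seen in the
crash dumps.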

To help narrow down the problem, the call to list_del_init in
inode_sb_list_del was changed to list_del.  This poisons the pointers in
the i_sb_list and causes the kernel to panic if it traverses a freed
inode.
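
For illustration, the difference that makes list_del useful as a
debugging aid can be sketched in user space.  This is a simplified model,
not the kernel implementation: list_init() and is_poisoned() are
hypothetical helpers, and the poison constants are assumed to match the
values in include/linux/poison.h on kernels of this era (without
POISON_POINTER_DELTA).  The sketch only compares the pointers; in the
kernel, dereferencing them is what triggers the panic.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Assumed poison values; treat them as illustrative, not authoritative. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

/* Hypothetical helper: make an empty list (head points at itself). */
static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the node and leave recognizable bad pointers behind. */
static void list_del(struct list_head *n)
{
	assert(n->next && n->prev);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = LIST_POISON1;
	n->prev = LIST_POISON2;
}

/* Hypothetical helper: has this node been removed with list_del()? */
static int is_poisoned(const struct list_head *n)
{
	return n->next == LIST_POISON1 || n->prev == LIST_POISON2;
}
```

A stale pointer to a removed node now carries poison values instead of
looping back on itself, so following it fails loudly rather than hanging.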

Subsequent stress testing panicked in fsnotify_unmount_inodes at the
bottom of the list_for_each_entry_safe loop showing next_i had become
free.

We believe the root cause of the problem is that next_i is being freed
during the window of time that the list_for_each_entry_safe loop
temporarily releases inode_sb_list_lock to call fsnotify and
fsnotify_inode_delete.

The code in fsnotify_unmount_inodes attempts to prevent the freeing of
inode and next_i by calling __iget.  However, the code doesn't do the
__iget call on next_i

	if i_count == 0 or
	if i_state & (I_FREEING | I_WILL_FREE)

The patch addresses this issue by advancing next_i in the above two cases
until we either find a next_i which we can __iget or we reach the end of
the list.  This makes the handling of next_i more closely match the
handling of the variable "inode."
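
The advancing logic described above can be sketched as a small
user-space model.  This is an illustration under stated assumptions, not
the patch itself: struct fake_inode, fake_iget() and pin_next() are
hypothetical names, the flag bit values are made up for the sketch, a
NULL next pointer stands in for reaching the list head, and all locking
(i_lock, inode_sb_list_lock) is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; the real ones live in include/linux/fs.h. */
#define I_FREEING   (1u << 0)
#define I_WILL_FREE (1u << 1)

/* Hypothetical, heavily reduced model of struct inode. */
struct fake_inode {
	unsigned int i_state;
	int i_count;
	struct fake_inode *next;	/* NULL marks the end of the list */
};

/* Stand-in for __iget(): take a reference on the inode. */
static void fake_iget(struct fake_inode *inode)
{
	assert(inode != NULL);
	inode->i_count++;
}

/*
 * Starting at next_i, skip every inode that cannot safely be pinned
 * (being freed, or with a zero reference count).  Pin and return the
 * first inode that can be held, or return NULL at the end of the list.
 */
static struct fake_inode *pin_next(struct fake_inode *next_i)
{
	while (next_i != NULL) {
		if (!(next_i->i_state & (I_FREEING | I_WILL_FREE)) &&
		    next_i->i_count > 0) {
			fake_iget(next_i);
			return next_i;
		}
		next_i = next_i->next;
	}
	return NULL;
}
```

The inode returned by pin_next() holds an extra reference, so it cannot
be freed while the list lock is dropped; the caller is expected to drop
that reference once the loop has moved on.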

The time to reproduce the hang is highly variable (from hours to days). We
ran the stress test on a 3.10 kernel with the proposed patch for a week
without failure.

During list_for_each_entry_safe, next_i is becoming free causing
the loop to never terminate.  Advance next_i in those cases where
__iget is not done.

Signed-off-by: Jerry Hoemann <jerry.hoemann@hp.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ken Helias <kenhelias@firemail.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-04-14 17:34:03 +08:00
9p move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
adfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
affs move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
afs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
autofs4 move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
befs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
bfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
btrfs Btrfs: fix fs corruption on transaction abort if device supports discard 2015-04-14 17:33:45 +08:00
cachefiles fs: cachefiles: add support for large files in filesystem caching 2014-06-07 16:02:04 -07:00
ceph move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
cifs move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
coda move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
configfs configfs: fix race between dentry put and lookup 2013-11-29 10:50:37 -08:00
cramfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
debugfs move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
devpts devpts: plug the memory leak in kill_sb 2013-12-04 10:50:14 -08:00
dlm dlm fixes for 3.4 2012-04-23 18:22:42 -07:00
ecryptfs eCryptfs: Remove buggy and unnecessary write in file name decode routine 2015-04-14 17:33:43 +08:00
efs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
exofs ore: Fix wrong math in allocation of per device BIO 2014-02-13 11:51:11 -08:00
exportfs move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
ext2 ext2: Fix fs corruption in ext2_get_xip_mem() 2014-09-25 11:49:19 +08:00
ext3 ext3: Don't check quota format when there are no quota files 2015-02-02 17:05:00 +08:00
ext4 move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
fat fat: fix possible overflow for fat_clusters 2013-06-07 12:49:12 -07:00
freevxfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
fscache fs/fscache/stats.c: fix memory leak 2013-05-07 19:51:55 -07:00
fuse fuse: hotfix truncate_pagecache() issue 2014-03-11 16:10:04 -07:00
gfs2 GFS2: Fix incorrect invalidation for DIO/buffered I/O 2014-01-08 09:42:12 -08:00
hfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
hfsplus hfsplus: fix potential overflow in hfsplus_file_truncate() 2013-04-25 21:19:54 -07:00
hostfs Merge branch 'for-linus-3.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml 2012-03-27 18:29:53 -07:00
hpfs hpfs: deadlock and race in directory lseek() 2014-02-13 11:51:18 -08:00
hppfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
hugetlbfs hugetlbfs: fix mmap failure in unaligned size request 2013-05-19 10:54:48 -07:00
isofs isofs: Fix unchecked printing of ER records 2015-04-14 17:33:47 +08:00
jbd jbd: Fix lock ordering bug in journal_unmap_buffer() 2012-12-03 11:47:10 -08:00
jbd2 ext4/jbd2: don't wait (forever) for stale tid caused by wraparound 2014-03-11 16:10:05 -07:00
jffs2 jffs2: remove from wait queue after schedule() 2014-04-26 17:13:20 -07:00
jfs jfs: fix readdir regression 2015-04-14 17:34:02 +08:00
lockd lockd: Try to reconnect if statd has moved 2015-02-02 17:04:42 +08:00
logfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
minix Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
ncpfs move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
nfs move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
nfs_common
nfsd nfsd: Fix slot wake up race in the nfsv4.1 callback code 2015-04-14 17:33:37 +08:00
nilfs2 nilfs2: fix deadlock of segment constructor over I_SYNC flag 2015-04-14 17:34:00 +08:00
nls
notify fsnotify: next_i is freed during fsnotify_unmount_inodes. 2015-04-14 17:34:03 +08:00
ntfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
ocfs2 move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
omfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
openpromfs switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
proc pagemap: do not leak physical addresses to non-privileged userspace 2015-04-14 17:34:02 +08:00
pstore pstore: Avoid deadlock in panic and emergency-restart path 2013-03-04 06:06:43 +08:00
qnx4 qnx4: new helper - try_extent() 2012-03-20 21:29:52 -04:00
qnx6 fs: initial qnx6fs addition 2012-03-20 21:29:38 -04:00
quota quota: Fix race between dqput() and dquot_scan_active() 2014-03-11 16:10:02 -07:00
ramfs fs: ramfs: file-nommu: add SetPageUptodate() 2012-07-16 09:04:45 -07:00
reiserfs reiserfs: fix race in readdir 2014-05-06 07:51:44 -07:00
romfs MTD merge for 3.4 2012-03-30 17:31:56 -07:00
squashfs Add an extra mount time sanity check, plus some code cleanups and bug fixes. 2012-03-28 18:05:54 -07:00
sysfs sysfs: fix use after free in case of concurrent read/write and readdir 2013-05-07 19:51:54 -07:00
sysv switch open-coded instances of d_make_root() to new helper 2012-03-20 21:29:35 -04:00
ubifs UBIFS: fix free log space calculation 2015-02-02 17:04:36 +08:00
udf udf: Check component length before reading it 2015-04-14 17:33:48 +08:00
ufs Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
xfs xfs: underflow bug in xfs_attrlist_by_handle() 2013-12-20 07:34:19 -08:00
aio.c aio: fix possible invalid memory access when DEBUG is enabled 2013-05-01 09:41:03 -07:00
anon_inodes.c anon_inodes: move allocation of anon_inode into ->mount() 2012-03-20 21:29:45 -04:00
attr.c vfs: increment iversion when a file is truncated 2012-06-10 00:36:12 +09:00
bad_inode.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
binfmt_aout.c VM: add "vm_mmap()" helper function 2012-04-20 17:29:13 -07:00
binfmt_elf.c x86, mm/ASLR: Fix stack randomization on 64-bit systems 2015-04-14 17:33:58 +08:00
binfmt_elf_fdpic.c VM: add "vm_mmap()" helper function 2012-04-20 17:29:13 -07:00
binfmt_em86.c exec: use -ELOOP for max recursion depth 2013-03-28 12:12:28 -07:00
binfmt_flat.c VM: add "vm_mmap()" helper function 2012-04-20 17:29:13 -07:00
binfmt_misc.c exec: use -ELOOP for max recursion depth 2013-03-28 12:12:28 -07:00
binfmt_script.c exec: use -ELOOP for max recursion depth 2013-03-28 12:12:28 -07:00
binfmt_som.c VM: add "vm_mmap()" helper function 2012-04-20 17:29:13 -07:00
bio-integrity.c fs: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:21 +08:00
bio.c SCSI: sg: Fix user memory corruption when SG_IO is interrupted by a signal 2013-09-07 21:58:16 -07:00
block_dev.c writeback: Fix periodic writeback after fs mount 2013-07-28 16:26:08 -07:00
buffer.c vfs: fix data corruption when blocksize < pagesize for mmaped data 2015-02-02 17:04:52 +08:00
char_dev.c
compat.c Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys 2013-03-14 11:29:51 -07:00
compat_binfmt_elf.c
compat_ioctl.c fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check 2012-10-31 10:02:55 -07:00
dcache.c deal with deadlock in d_walk() 2015-04-14 17:33:58 +08:00
dcookies.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
direct-io.c fs: Fix possible use-after-free with AIO 2013-03-04 06:06:41 +08:00
drop_caches.c
eventfd.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
eventpoll.c epoll: prevent missed events on EPOLL_CTL_MOD 2013-01-17 08:50:54 -08:00
exec.c exec/ptrace: fix get_dumpable() incorrect tests 2013-11-29 10:50:34 -08:00
fcntl.c Wrap accesses to the fd_sets in struct fdtable 2012-02-19 10:30:52 -08:00
fhandle.c
fifo.c fifo: Do not restart open() if it already found a partner 2012-07-19 08:58:56 -07:00
file.c fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem 2014-02-22 10:32:45 -08:00
file_table.c vfs: drop_file_write_access() made static 2012-03-20 21:29:32 -04:00
filesystems.c
fs-writeback.c writeback: fix a subtle race condition in I_DIRTY clearing 2015-04-14 17:33:41 +08:00
fs_struct.c The following text was taken from the original review request: 2012-03-24 10:24:31 -07:00
generic_acl.c
inode.c vfs: Revert spurious fix to spinning prevention in prune_icache_sb 2013-04-16 21:27:26 -07:00
internal.h
ioctl.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
ioprio.c block: Fix computation of merged request priority 2015-02-02 17:05:17 +08:00
Kconfig Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-03-21 13:36:41 -07:00
Kconfig.binfmt
libfs.c move d_rcu from overlapping d_child to overlapping d_alias 2015-04-14 17:33:58 +08:00
locks.c locks: allow __break_lease to sleep even when break_time is 0 2014-05-13 14:11:31 +02:00
Makefile fs: initial qnx6fs addition 2012-03-20 21:29:38 -04:00
mbcache.c
mount.h
mpage.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
namei.c don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu() 2014-12-01 18:02:31 +08:00
namespace.c mnt: Prevent pivot_root from creating a loop in the mount tree 2015-02-02 17:04:50 +08:00
no-block.c
open.c vfs: canonicalize create mode in build_open_flags() 2012-09-14 10:00:05 -07:00
pipe.c vfs: fix pipe counter breakage 2013-03-14 11:29:51 -07:00
pnode.c get rid of propagate_umount() mistakenly treating slaves as busy. 2014-12-01 18:02:21 +08:00
pnode.h
posix_acl.c posix_acl: handle NULL ACL in posix_acl_equiv_mode 2014-06-07 16:02:02 -07:00
proc_namespace.c
read_write.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
read_write.h
readdir.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
select.c posix_types.h: Cleanup stale __NFDBITS and related definitions 2012-08-09 08:31:39 -07:00
seq_file.c The following text was taken from the original review request: 2012-03-24 10:24:31 -07:00
signalfd.c epoll: ep_unregister_pollwait() can use the freed pwq->whead 2012-02-24 11:42:50 -08:00
splice.c tcp: fix MSG_SENDPAGE_NOTLAST logic 2013-01-11 09:07:14 -08:00
stack.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
stat.c VFS: make vfs_fstat() use f[get|put]_light() 2014-06-07 16:02:04 -07:00
statfs.c vfs: allow O_PATH file descriptors for fstatfs() 2013-10-22 09:02:25 +01:00
super.c fs: Fix theoretical division by 0 in super_cache_scan(). 2015-02-02 17:04:48 +08:00
sync.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00
timerfd.c
utimes.c
xattr.c fs/xattr.c:setxattr(): improve handling of allocation failures 2012-04-05 15:25:50 -07:00
xattr_acl.c fs: reduce the use of module.h wherever possible 2012-02-28 19:31:58 -05:00