android_kernel_samsung_msm8976/fs/btrfs
Filipe Manana 590a2f0b8c Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
commit 0c0fe3b0fa45082cd752553fdb3a4b42503a118e upstream.

While doing some tests I ran into an hang on an extent buffer's rwlock
that produced the following trace:

[39389.800012] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [fdm-stress:32166]
[39389.800016] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [fdm-stress:32165]
[39389.800016] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
[39389.800016] irq event stamp: 0
[39389.800016] hardirqs last  enabled at (0): [<          (null)>]           (null)
[39389.800016] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800016] softirqs last  enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800016] softirqs last disabled at (0): [<          (null)>]           (null)
[39389.800016] CPU: 14 PID: 32165 Comm: fdm-stress Not tainted 4.4.0-rc6-btrfs-next-18+ #1
[39389.800016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[39389.800016] task: ffff880175b1ca40 ti: ffff8800a185c000 task.ti: ffff8800a185c000
[39389.800016] RIP: 0010:[<ffffffff810902af>]  [<ffffffff810902af>] queued_spin_lock_slowpath+0x57/0x158
[39389.800016] RSP: 0018:ffff8800a185fb80  EFLAGS: 00000202
[39389.800016] RAX: 0000000000000101 RBX: ffff8801710c4e9c RCX: 0000000000000101
[39389.800016] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000001
[39389.800016] RBP: ffff8800a185fb98 R08: 0000000000000001 R09: 0000000000000000
[39389.800016] R10: ffff8800a185fb68 R11: 6db6db6db6db6db7 R12: ffff8801710c4e98
[39389.800016] R13: ffff880175b1ca40 R14: ffff8800a185fc10 R15: ffff880175b1ca40
[39389.800016] FS:  00007f6d37fff700(0000) GS:ffff8802be9c0000(0000) knlGS:0000000000000000
[39389.800016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39389.800016] CR2: 00007f6d300019b8 CR3: 0000000037c93000 CR4: 00000000001406e0
[39389.800016] Stack:
[39389.800016]  ffff8801710c4e98 ffff8801710c4e98 ffff880175b1ca40 ffff8800a185fbb0
[39389.800016]  ffffffff81091e11 ffff8801710c4e98 ffff8800a185fbc8 ffffffff81091895
[39389.800016]  ffff8801710c4e98 ffff8800a185fbe8 ffffffff81486c5c ffffffffa067288c
[39389.800016] Call Trace:
[39389.800016]  [<ffffffff81091e11>] queued_read_lock_slowpath+0x46/0x60
[39389.800016]  [<ffffffff81091895>] do_raw_read_lock+0x3e/0x41
[39389.800016]  [<ffffffff81486c5c>] _raw_read_lock+0x3d/0x44
[39389.800016]  [<ffffffffa067288c>] ? btrfs_tree_read_lock+0x54/0x125 [btrfs]
[39389.800016]  [<ffffffffa067288c>] btrfs_tree_read_lock+0x54/0x125 [btrfs]
[39389.800016]  [<ffffffffa0622ced>] ? btrfs_find_item+0xa7/0xd2 [btrfs]
[39389.800016]  [<ffffffffa069363f>] btrfs_ref_to_path+0xd6/0x174 [btrfs]
[39389.800016]  [<ffffffffa0693730>] inode_to_path+0x53/0xa2 [btrfs]
[39389.800016]  [<ffffffffa0693e2e>] paths_from_inode+0x117/0x2ec [btrfs]
[39389.800016]  [<ffffffffa0670cff>] btrfs_ioctl+0xd5b/0x2793 [btrfs]
[39389.800016]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800016]  [<ffffffff81276727>] ? __this_cpu_preempt_check+0x13/0x15
[39389.800016]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800016]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
[39389.800016]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
[39389.800016]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
[39389.800016]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[39389.800016]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[39389.800016] Code: b9 01 01 00 00 f7 c6 00 ff ff ff 75 32 83 fe 01 89 ca 89 f0 0f 45 d7 f0 0f b1 13 39 f0 74 04 89 c6 eb e2 ff ca 0f 84 fa 00 00 00 <8b> 03 84 c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 e6 00 00 00 e8
[39389.800012] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
[39389.800012] irq event stamp: 0
[39389.800012] hardirqs last  enabled at (0): [<          (null)>]           (null)
[39389.800012] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800012] softirqs last  enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
[39389.800012] softirqs last disabled at (0): [<          (null)>]           (null)
[39389.800012] CPU: 15 PID: 32166 Comm: fdm-stress Tainted: G             L  4.4.0-rc6-btrfs-next-18+ #1
[39389.800012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[39389.800012] task: ffff880179294380 ti: ffff880034a60000 task.ti: ffff880034a60000
[39389.800012] RIP: 0010:[<ffffffff81091e8d>]  [<ffffffff81091e8d>] queued_write_lock_slowpath+0x62/0x72
[39389.800012] RSP: 0018:ffff880034a639f0  EFLAGS: 00000206
[39389.800012] RAX: 0000000000000101 RBX: ffff8801710c4e98 RCX: 0000000000000000
[39389.800012] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffff8801710c4e9c
[39389.800012] RBP: ffff880034a639f8 R08: 0000000000000001 R09: 0000000000000000
[39389.800012] R10: ffff880034a639b0 R11: 0000000000001000 R12: ffff8801710c4e98
[39389.800012] R13: 0000000000000001 R14: ffff880172cbc000 R15: ffff8801710c4e00
[39389.800012] FS:  00007f6d377fe700(0000) GS:ffff8802be9e0000(0000) knlGS:0000000000000000
[39389.800012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39389.800012] CR2: 00007f6d3d3c1000 CR3: 0000000037c93000 CR4: 00000000001406e0
[39389.800012] Stack:
[39389.800012]  ffff8801710c4e98 ffff880034a63a10 ffffffff81091963 ffff8801710c4e98
[39389.800012]  ffff880034a63a30 ffffffff81486f1b ffffffffa0672cb3 ffff8801710c4e00
[39389.800012]  ffff880034a63a78 ffffffffa0672cb3 ffff8801710c4e00 ffff880034a63a58
[39389.800012] Call Trace:
[39389.800012]  [<ffffffff81091963>] do_raw_write_lock+0x72/0x8c
[39389.800012]  [<ffffffff81486f1b>] _raw_write_lock+0x3a/0x41
[39389.800012]  [<ffffffffa0672cb3>] ? btrfs_tree_lock+0x119/0x251 [btrfs]
[39389.800012]  [<ffffffffa0672cb3>] btrfs_tree_lock+0x119/0x251 [btrfs]
[39389.800012]  [<ffffffffa061aeba>] ? rcu_read_unlock+0x5b/0x5d [btrfs]
[39389.800012]  [<ffffffffa061ce13>] ? btrfs_root_node+0xda/0xe6 [btrfs]
[39389.800012]  [<ffffffffa061ce83>] btrfs_lock_root_node+0x22/0x42 [btrfs]
[39389.800012]  [<ffffffffa062046b>] btrfs_search_slot+0x1b8/0x758 [btrfs]
[39389.800012]  [<ffffffff810fc6b0>] ? time_hardirqs_on+0x15/0x28
[39389.800012]  [<ffffffffa06365db>] btrfs_lookup_inode+0x31/0x95 [btrfs]
[39389.800012]  [<ffffffff8108d62f>] ? trace_hardirqs_on+0xd/0xf
[39389.800012]  [<ffffffff8148482b>] ? mutex_lock_nested+0x397/0x3bc
[39389.800012]  [<ffffffffa068821b>] __btrfs_update_delayed_inode+0x59/0x1c0 [btrfs]
[39389.800012]  [<ffffffffa068858e>] __btrfs_commit_inode_delayed_items+0x194/0x5aa [btrfs]
[39389.800012]  [<ffffffff81486ab7>] ? _raw_spin_unlock+0x31/0x44
[39389.800012]  [<ffffffffa0688a48>] __btrfs_run_delayed_items+0xa4/0x15c [btrfs]
[39389.800012]  [<ffffffffa0688d62>] btrfs_run_delayed_items+0x11/0x13 [btrfs]
[39389.800012]  [<ffffffffa064048e>] btrfs_commit_transaction+0x234/0x96e [btrfs]
[39389.800012]  [<ffffffffa0618d10>] btrfs_sync_fs+0x145/0x1ad [btrfs]
[39389.800012]  [<ffffffffa0671176>] btrfs_ioctl+0x11d2/0x2793 [btrfs]
[39389.800012]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800012]  [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
[39389.800012]  [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
[39389.800012]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
[39389.800012]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
[39389.800012]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
[39389.800012]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
[39389.800012]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
[39389.800012]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[39389.800012] Code: f0 0f b1 13 85 c0 75 ef eb 2a f3 90 8a 03 84 c0 75 f8 f0 0f b0 13 84 c0 75 f0 ba ff 00 00 00 eb 0a f0 0f b1 13 ff c8 74 0b f3 90 <8b> 03 83 f8 01 75 f7 eb ed c6 43 04 00 5b 5d c3 0f 1f 44 00 00

This happens because in the code path executed by the inode_paths ioctl we
end up nesting two calls to read lock a leaf's rwlock when after the first
call to read_lock() and before the second call to read_lock(), another
task (running the delayed items as part of a transaction commit) has
already called write_lock() against the leaf's rwlock. This situation is
illustrated by the following diagram:

         Task A                       Task B

  btrfs_ref_to_path()               btrfs_commit_transaction()
    read_lock(&eb->lock);

                                      btrfs_run_delayed_items()
                                        __btrfs_commit_inode_delayed_items()
                                          __btrfs_update_delayed_inode()
                                            btrfs_lookup_inode()

                                              write_lock(&eb->lock);
                                                --> task waits for lock

    read_lock(&eb->lock);
    --> makes this task hang
        forever (and task B too
	of course)

So fix this by avoiding doing the nested read lock, which is easily
avoidable. This issue does not happen if task B calls write_lock() after
task A does the second call to read_lock(), however there does not seem
to exist anything in the documentation that mentions what is the expected
behaviour for recursive locking of rwlocks (leaving the idea that doing
so is not a good usage of rwlocks).

Also, as a side effect necessary for this fix, make sure we do not
needlessly read lock extent buffers when the input path has skip_locking
set (used when called from send).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 11:57:46 -08:00
..
acl.c Btrfs: fix incorrect inode acl reset 2013-12-20 07:45:12 -08:00
async-thread.c Btrfs: call the ordered free operation without any locks held 2012-07-25 16:15:07 -04:00
async-thread.h
backref.c Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl 2016-02-25 11:57:46 -08:00
backref.h Btrfs: fix scrub_print_warning to handle skinny metadata extents 2014-06-30 20:09:46 -07:00
btrfs_inode.h btrfs: fix minor typo in comment 2013-05-06 15:54:49 -04:00
check-integrity.c Btrfs: use a btrfs bioset instead of abusing bio internals 2013-05-17 21:52:52 -04:00
check-integrity.h
compat.h
compression.c Btrfs: fix data corruption when reading/updating compressed extents 2014-03-23 21:38:20 -07:00
compression.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
ctree.c Btrfs: make xattr replace operations atomic 2015-07-03 19:48:09 -07:00
ctree.h Btrfs: make xattr replace operations atomic 2015-07-03 19:48:09 -07:00
delayed-inode.c Btrfs: don't delay inode ref updates during log replay 2015-01-16 06:59:02 -08:00
delayed-inode.h Btrfs: improve the delayed inode throttling 2013-03-07 07:52:40 -05:00
delayed-ref.c Btrfs: separate sequence numbers for delayed ref tracking and tree mod log 2013-05-06 15:55:17 -04:00
delayed-ref.h Btrfs: handle running extent ops with skinny metadata 2013-05-17 21:40:15 -04:00
dev-replace.c Btrfs: don't allow device replace on RAID5/RAID6 2013-05-17 21:40:16 -04:00
dev-replace.h Btrfs: add new sources for device replace code 2012-12-12 17:15:41 -05:00
dir-item.c Btrfs: make xattr replace operations atomic 2015-07-03 19:48:09 -07:00
disk-io.c Btrfs: fix fs corruption on transaction abort if device supports discard 2015-01-08 09:58:17 -08:00
disk-io.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
export.c fs: encode_fh: return FILEID_INVALID if invalid fid_type 2013-02-26 02:46:10 -05:00
export.h
extent-tree.c Btrfs: fix log tree corruption when fs mounted with -o discard 2015-05-06 21:56:21 +02:00
extent_io.c btrfs: incorrect handling for fiemap_fill_next_extent return 2015-06-22 16:55:55 -07:00
extent_io.h Btrfs: use a btrfs bioset instead of abusing bio internals 2013-05-17 21:52:52 -04:00
extent_map.c Btrfs: do not move em to modified list when unpinning 2015-01-08 09:58:17 -08:00
extent_map.h Btrfs: fix bad extent logging 2013-05-06 15:54:34 -04:00
file-item.c Btrfs: fix kfree on list_head in btrfs_lookup_csums_range error cleanup 2014-11-14 08:48:01 -08:00
file.c Btrfs: fix data loss in the fast fsync path 2015-03-18 13:22:29 +01:00
free-space-cache.c Btrfs: output warning instead of error when loading free space cache failed 2014-06-30 20:09:45 -07:00
free-space-cache.h Btrfs: don't use global block reservation for inode cache truncation 2013-05-17 21:40:22 -04:00
hash.h btrfs: extended inode refs 2012-10-09 09:14:45 -04:00
inode-item.c btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
inode-map.c Btrfs: use kmem_cache_free when freeing entry in inode cache 2015-08-03 09:29:46 -07:00
inode-map.h
inode.c Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow 2016-01-22 19:47:52 -08:00
ioctl.c Btrfs: fix inode eviction infinite loop after cloning into it 2015-05-06 21:56:21 +02:00
Kconfig btrfs: move leak debug code to functions 2013-05-06 15:55:16 -04:00
locking.c btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
locking.h Btrfs: remove btrfs_try_spin_lock 2013-03-14 14:57:10 -04:00
lzo.c
Makefile Btrfs: RAID5 and RAID6 2013-02-01 14:24:23 -05:00
math.h Btrfs: cleanup duplicated division functions 2012-12-11 13:31:30 -05:00
ordered-data.c Btrfs: improve the performance of the csums lookup 2013-05-06 15:54:35 -04:00
ordered-data.h Btrfs: improve the performance of the csums lookup 2013-05-06 15:54:35 -04:00
orphan.c
print-tree.c Btrfs: Include the device in most error printk()s 2013-05-06 15:54:23 -04:00
print-tree.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
qgroup.c Btrfs: automatic rescan after "quota enable" command 2013-05-06 15:55:20 -04:00
raid56.c Btrfs: use a btrfs bioset instead of abusing bio internals 2013-05-17 21:52:52 -04:00
raid56.h Btrfs: RAID5 and RAID6 2013-02-01 14:24:23 -05:00
rcu-string.h Btrfs: use rcu to protect device->name 2012-06-14 21:29:16 -04:00
reada.c Btrfs: fix reada debug code compilation 2013-05-06 15:54:55 -04:00
relocation.c Btrfs: fix build_backref_tree issue with multiple shared blocks 2014-10-30 09:35:09 -07:00
root-tree.c Btrfs: delete unused parameter to btrfs_read_root_item() 2013-05-06 15:55:14 -04:00
scrub.c Btrfs: fix scrub_print_warning to handle skinny metadata extents 2014-06-30 20:09:46 -07:00
send.c Btrfs: send, don't error in the presence of subvols/snapshots 2014-06-30 20:09:46 -07:00
send.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
struct-funcs.c Btrfs: rewrite BTRFS_SETGET_FUNCS 2012-07-23 16:28:06 -04:00
super.c btrfs: cleanup orphans while looking up default subvolume 2015-06-22 16:55:55 -07:00
sysfs.c btrfs: fixup/remove module.h usage as required 2013-03-01 15:01:01 -05:00
transaction.c Btrfs: fix race in WAIT_SYNC ioctl 2014-10-30 09:35:09 -07:00
transaction.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
tree-defrag.c btrfs: remove cache only arguments from defrag path 2013-02-20 12:59:36 -05:00
tree-log.c Btrfs:__add_inode_ref: out of bounds memory read when looking for extended ref. 2015-03-18 13:22:29 +01:00
tree-log.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
ulist.c Btrfs: fix crash regarding to ulist_add_merge 2013-08-11 18:35:24 -07:00
ulist.h Btrfs: add a rb_tree to improve performance of ulist search 2013-05-06 15:54:44 -04:00
version.h
volumes.c fs: btrfs: volumes.c: Fix for possible null pointer dereference 2014-06-30 20:09:46 -07:00
volumes.h Btrfs: use a btrfs bioset instead of abusing bio internals 2013-05-17 21:52:52 -04:00
xattr.c Btrfs: make xattr replace operations atomic 2015-07-03 19:48:09 -07:00
xattr.h
zlib.c btrfs: fix message printing 2012-10-09 09:19:57 -04:00