Commit Graph

446542 Commits

Author SHA1 Message Date
Christoph Paasch 9690e10794 net: Prevent invalid access to skb->prev in __qdisc_drop_all
[ Upstream commit 9410d386d0a829ace9558336263086c2fbbe8aed ]

__qdisc_drop_all() accesses skb->prev to get to the tail of the
segment-list.

With commit 68d2f84a1368 ("net: gro: properly remove skb from list")
the skb-list handling has been changed to set skb->next to NULL and set
the list-poison on skb->prev.

With that change, __qdisc_drop_all() will panic when it tries to
dereference skb->prev.

Since commit 992cba7e276d ("net: Add and use skb_list_del_init().")
__list_del_entry is used, leaving skb->prev unchanged (thus,
pointing to the list-head if it's the first skb of the list).
This will make __qdisc_drop_all modify the next-pointer of the list-head
and result in a panic later on:

[   34.501053] general protection fault: 0000 [#1] SMP KASAN PTI
[   34.501968] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.20.0-rc2.mptcp #108
[   34.502887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[   34.504074] RIP: 0010:dev_gro_receive+0x343/0x1f90
[   34.504751] Code: e0 48 c1 e8 03 42 80 3c 30 00 0f 85 4a 1c 00 00 4d 8b 24 24 4c 39 65 d0 0f 84 0a 04 00 00 49 8d 7c 24 38 48 89 f8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 04
[   34.507060] RSP: 0018:ffff8883af507930 EFLAGS: 00010202
[   34.507761] RAX: 0000000000000007 RBX: ffff8883970b2c80 RCX: 1ffff11072e165a6
[   34.508640] RDX: 1ffff11075867008 RSI: ffff8883ac338040 RDI: 0000000000000038
[   34.509493] RBP: ffff8883af5079d0 R08: ffff8883970b2d40 R09: 0000000000000062
[   34.510346] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000000
[   34.511215] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8883ac338008
[   34.512082] FS:  0000000000000000(0000) GS:ffff8883af500000(0000) knlGS:0000000000000000
[   34.513036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.513741] CR2: 000055ccc3e9d020 CR3: 00000003abf32000 CR4: 00000000000006e0
[   34.514593] Call Trace:
[   34.514893]  <IRQ>
[   34.515157]  napi_gro_receive+0x93/0x150
[   34.515632]  receive_buf+0x893/0x3700
[   34.516094]  ? __netif_receive_skb+0x1f/0x1a0
[   34.516629]  ? virtnet_probe+0x1b40/0x1b40
[   34.517153]  ? __stable_node_chain+0x4d0/0x850
[   34.517684]  ? kfree+0x9a/0x180
[   34.518067]  ? __kasan_slab_free+0x171/0x190
[   34.518582]  ? detach_buf+0x1df/0x650
[   34.519061]  ? lapic_next_event+0x5a/0x90
[   34.519539]  ? virtqueue_get_buf_ctx+0x280/0x7f0
[   34.520093]  virtnet_poll+0x2df/0xd60
[   34.520533]  ? receive_buf+0x3700/0x3700
[   34.521027]  ? qdisc_watchdog_schedule_ns+0xd5/0x140
[   34.521631]  ? htb_dequeue+0x1817/0x25f0
[   34.522107]  ? sch_direct_xmit+0x142/0xf30
[   34.522595]  ? virtqueue_napi_schedule+0x26/0x30
[   34.523155]  net_rx_action+0x2f6/0xc50
[   34.523601]  ? napi_complete_done+0x2f0/0x2f0
[   34.524126]  ? kasan_check_read+0x11/0x20
[   34.524608]  ? _raw_spin_lock+0x7d/0xd0
[   34.525070]  ? _raw_spin_lock_bh+0xd0/0xd0
[   34.525563]  ? kvm_guest_apic_eoi_write+0x6b/0x80
[   34.526130]  ? apic_ack_irq+0x9e/0xe0
[   34.526567]  __do_softirq+0x188/0x4b5
[   34.527015]  irq_exit+0x151/0x180
[   34.527417]  do_IRQ+0xdb/0x150
[   34.527783]  common_interrupt+0xf/0xf
[   34.528223]  </IRQ>

This patch makes sure that skb->prev is set to NULL when entering
netem_enqueue.

Cc: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: 68d2f84a1368 ("net: gro: properly remove skb from list")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 21:53:24 +02:00
Nicolas Dichtel fd0c4d6666 tun: forbid iface creation with rtnl ops
[ Upstream commit 35b827b6d06199841a83839e8bb69c0cd13a28be ]

It's not supported right now (the goal of the initial patch was to support
'ip link del' only).

Before the patch:
$ ip link add foo type tun
[  239.632660] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[snip]
[  239.636410] RIP: 0010:register_netdevice+0x8e/0x3a0

This panic occurs because dev->netdev_ops is not set by tun_setup(). But to
have something usable, it will require more than just setting
netdev_ops.

Fixes: f019a7a594 ("tun: Implement ip link del tunXXX")
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 21:53:23 +02:00
Serhey Popovych 5c470b51cd tun: Consistently configure generic netdev params via rtnetlink
[ Upstream commit df52eab23d703142c766ac00bdb8db19d71238d0 ]

Configuring generic network device parameters on tun will fail in
presence of IFLA_INFO_KIND attribute in IFLA_LINKINFO nested attribute
since tun_validate() always return failure.

This can be visualized with following ip-link(8) command sequences:

  # ip link set dev tun0 group 100
  # ip link set dev tun0 group 100 type tun
  RTNETLINK answers: Invalid argument

with contrast to dummy and veth drivers:

  # ip link set dev dummy0 group 100
  # ip link set dev dummy0 type dummy

  # ip link set dev veth0 group 100
  # ip link set dev veth0 group 100 type veth

Fix by returning zero in tun_validate() when @data is NULL that is
always in case since rtnl_link_ops->maxtype is zero in tun driver.

Fixes: f019a7a594 ("tun: Implement ip link del tunXXX")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 21:53:23 +02:00
Eric Dumazet 658fa93b69 rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices
[ Upstream commit 688838934c231bb08f46db687e57f6d8bf82709c ]

kmsan was able to trigger a kernel-infoleak using a gre device [1]

nlmsg_populate_fdb_fill() has a hard coded assumption
that dev->addr_len is ETH_ALEN, as normally guaranteed
for ARPHRD_ETHER devices.

A similar issue was fixed recently in commit da71577545a5
("rtnetlink: Disallow FDB configuration for non-Ethernet device")

[1]
BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:143 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x4c0/0x2700 lib/iov_iter.c:576
CPU: 0 PID: 6697 Comm: syz-executor310 Not tainted 4.20.0-rc3+ #95
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x32d/0x480 lib/dump_stack.c:113
 kmsan_report+0x12c/0x290 mm/kmsan/kmsan.c:683
 kmsan_internal_check_memory+0x32a/0xa50 mm/kmsan/kmsan.c:743
 kmsan_copy_to_user+0x78/0xd0 mm/kmsan/kmsan_hooks.c:634
 copyout lib/iov_iter.c:143 [inline]
 _copy_to_iter+0x4c0/0x2700 lib/iov_iter.c:576
 copy_to_iter include/linux/uio.h:143 [inline]
 skb_copy_datagram_iter+0x4e2/0x1070 net/core/datagram.c:431
 skb_copy_datagram_msg include/linux/skbuff.h:3316 [inline]
 netlink_recvmsg+0x6f9/0x19d0 net/netlink/af_netlink.c:1975
 sock_recvmsg_nosec net/socket.c:794 [inline]
 sock_recvmsg+0x1d1/0x230 net/socket.c:801
 ___sys_recvmsg+0x444/0xae0 net/socket.c:2278
 __sys_recvmsg net/socket.c:2327 [inline]
 __do_sys_recvmsg net/socket.c:2337 [inline]
 __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
 __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
 do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x441119
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffc7f008a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000441119
RDX: 0000000000000040 RSI: 00000000200005c0 RDI: 0000000000000003
RBP: 00000000006cc018 R08: 0000000000000100 R09: 0000000000000100
R10: 0000000000000100 R11: 0000000000000207 R12: 0000000000402080
R13: 0000000000402110 R14: 0000000000000000 R15: 0000000000000000

Uninit was stored to memory at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:246 [inline]
 kmsan_save_stack mm/kmsan/kmsan.c:261 [inline]
 kmsan_internal_chain_origin+0x13d/0x240 mm/kmsan/kmsan.c:469
 kmsan_memcpy_memmove_metadata+0x1a9/0xf70 mm/kmsan/kmsan.c:344
 kmsan_memcpy_metadata+0xb/0x10 mm/kmsan/kmsan.c:362
 __msan_memcpy+0x61/0x70 mm/kmsan/kmsan_instr.c:162
 __nla_put lib/nlattr.c:744 [inline]
 nla_put+0x20a/0x2d0 lib/nlattr.c:802
 nlmsg_populate_fdb_fill+0x444/0x810 net/core/rtnetlink.c:3466
 nlmsg_populate_fdb net/core/rtnetlink.c:3775 [inline]
 ndo_dflt_fdb_dump+0x73a/0x960 net/core/rtnetlink.c:3807
 rtnl_fdb_dump+0x1318/0x1cb0 net/core/rtnetlink.c:3979
 netlink_dump+0xc79/0x1c90 net/netlink/af_netlink.c:2244
 __netlink_dump_start+0x10c4/0x11d0 net/netlink/af_netlink.c:2352
 netlink_dump_start include/linux/netlink.h:216 [inline]
 rtnetlink_rcv_msg+0x141b/0x1540 net/core/rtnetlink.c:4910
 netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2477
 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4965
 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
 netlink_unicast+0x1699/0x1740 net/netlink/af_netlink.c:1336
 netlink_sendmsg+0x13c7/0x1440 net/netlink/af_netlink.c:1917
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg net/socket.c:631 [inline]
 ___sys_sendmsg+0xe3b/0x1240 net/socket.c:2116
 __sys_sendmsg net/socket.c:2154 [inline]
 __do_sys_sendmsg net/socket.c:2163 [inline]
 __se_sys_sendmsg+0x305/0x460 net/socket.c:2161
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
 do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:246 [inline]
 kmsan_internal_poison_shadow+0x6d/0x130 mm/kmsan/kmsan.c:170
 kmsan_kmalloc+0xa1/0x100 mm/kmsan/kmsan_hooks.c:186
 __kmalloc+0x14c/0x4d0 mm/slub.c:3825
 kmalloc include/linux/slab.h:551 [inline]
 __hw_addr_create_ex net/core/dev_addr_lists.c:34 [inline]
 __hw_addr_add_ex net/core/dev_addr_lists.c:80 [inline]
 __dev_mc_add+0x357/0x8a0 net/core/dev_addr_lists.c:670
 dev_mc_add+0x6d/0x80 net/core/dev_addr_lists.c:687
 ip_mc_filter_add net/ipv4/igmp.c:1128 [inline]
 igmp_group_added+0x4d4/0xb80 net/ipv4/igmp.c:1311
 __ip_mc_inc_group+0xea9/0xf70 net/ipv4/igmp.c:1444
 ip_mc_inc_group net/ipv4/igmp.c:1453 [inline]
 ip_mc_up+0x1c3/0x400 net/ipv4/igmp.c:1775
 inetdev_event+0x1d03/0x1d80 net/ipv4/devinet.c:1522
 notifier_call_chain kernel/notifier.c:93 [inline]
 __raw_notifier_call_chain kernel/notifier.c:394 [inline]
 raw_notifier_call_chain+0x13d/0x240 kernel/notifier.c:401
 __dev_notify_flags+0x3da/0x860 net/core/dev.c:1733
 dev_change_flags+0x1ac/0x230 net/core/dev.c:7569
 do_setlink+0x165f/0x5ea0 net/core/rtnetlink.c:2492
 rtnl_newlink+0x2ad7/0x35a0 net/core/rtnetlink.c:3111
 rtnetlink_rcv_msg+0x1148/0x1540 net/core/rtnetlink.c:4947
 netlink_rcv_skb+0x394/0x640 net/netlink/af_netlink.c:2477
 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4965
 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
 netlink_unicast+0x1699/0x1740 net/netlink/af_netlink.c:1336
 netlink_sendmsg+0x13c7/0x1440 net/netlink/af_netlink.c:1917
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg net/socket.c:631 [inline]
 ___sys_sendmsg+0xe3b/0x1240 net/socket.c:2116
 __sys_sendmsg net/socket.c:2154 [inline]
 __do_sys_sendmsg net/socket.c:2163 [inline]
 __se_sys_sendmsg+0x305/0x460 net/socket.c:2161
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2161
 do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7

Bytes 36-37 of 105 are uninitialized
Memory access of size 105 starts at ffff88819686c000
Data copied to user address 0000000020000380

Fixes: d83b060360 ("net: add fdb generic dump routine")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 21:53:23 +02:00
Stefano Brivio e6eb2e2e94 ipv6: Check available headroom in ip6_xmit() even without options
[ Upstream commit 66033f47ca60294a95fc85ec3a3cc909dab7b765 ]

Even if we send an IPv6 packet without options, MAX_HEADER might not be
enough to account for the additional headroom required by alignment of
hardware headers.

On a configuration without HYPERV_NET, WLAN, AX25, and with IPV6_TUNNEL,
sending short SCTP packets over IPv4 over L2TP over IPv6, we start with
100 bytes of allocated headroom in sctp_packet_transmit(), end up with 54
bytes after l2tp_xmit_skb(), and 14 bytes in ip6_finish_output2().

Those would be enough to append our 14 bytes header, but we're going to
align that to 16 bytes, and write 2 bytes out of the allocated slab in
neigh_hh_output().

KASan says:

[  264.967848] ==================================================================
[  264.967861] BUG: KASAN: slab-out-of-bounds in ip6_finish_output2+0x1aec/0x1c70
[  264.967866] Write of size 16 at addr 000000006af1c7fe by task netperf/6201
[  264.967870]
[  264.967876] CPU: 0 PID: 6201 Comm: netperf Not tainted 4.20.0-rc4+ #1
[  264.967881] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
[  264.967887] Call Trace:
[  264.967896] ([<00000000001347d6>] show_stack+0x56/0xa0)
[  264.967903]  [<00000000017e379c>] dump_stack+0x23c/0x290
[  264.967912]  [<00000000007bc594>] print_address_description+0xf4/0x290
[  264.967919]  [<00000000007bc8fc>] kasan_report+0x13c/0x240
[  264.967927]  [<000000000162f5e4>] ip6_finish_output2+0x1aec/0x1c70
[  264.967935]  [<000000000163f890>] ip6_finish_output+0x430/0x7f0
[  264.967943]  [<000000000163fe44>] ip6_output+0x1f4/0x580
[  264.967953]  [<000000000163882a>] ip6_xmit+0xfea/0x1ce8
[  264.967963]  [<00000000017396e2>] inet6_csk_xmit+0x282/0x3f8
[  264.968033]  [<000003ff805fb0ba>] l2tp_xmit_skb+0xe02/0x13e0 [l2tp_core]
[  264.968037]  [<000003ff80631192>] l2tp_eth_dev_xmit+0xda/0x150 [l2tp_eth]
[  264.968041]  [<0000000001220020>] dev_hard_start_xmit+0x268/0x928
[  264.968069]  [<0000000001330e8e>] sch_direct_xmit+0x7ae/0x1350
[  264.968071]  [<000000000122359c>] __dev_queue_xmit+0x2b7c/0x3478
[  264.968075]  [<00000000013d2862>] ip_finish_output2+0xce2/0x11a0
[  264.968078]  [<00000000013d9b14>] ip_finish_output+0x56c/0x8c8
[  264.968081]  [<00000000013ddd1e>] ip_output+0x226/0x4c0
[  264.968083]  [<00000000013dbd6c>] __ip_queue_xmit+0x894/0x1938
[  264.968100]  [<000003ff80bc3a5c>] sctp_packet_transmit+0x29d4/0x3648 [sctp]
[  264.968116]  [<000003ff80b7bf68>] sctp_outq_flush_ctrl.constprop.5+0x8d0/0xe50 [sctp]
[  264.968131]  [<000003ff80b7c716>] sctp_outq_flush+0x22e/0x7d8 [sctp]
[  264.968146]  [<000003ff80b35c68>] sctp_cmd_interpreter.isra.16+0x530/0x6800 [sctp]
[  264.968161]  [<000003ff80b3410a>] sctp_do_sm+0x222/0x648 [sctp]
[  264.968177]  [<000003ff80bbddac>] sctp_primitive_ASSOCIATE+0xbc/0xf8 [sctp]
[  264.968192]  [<000003ff80b93328>] __sctp_connect+0x830/0xc20 [sctp]
[  264.968208]  [<000003ff80bb11ce>] sctp_inet_connect+0x2e6/0x378 [sctp]
[  264.968212]  [<0000000001197942>] __sys_connect+0x21a/0x450
[  264.968215]  [<000000000119aff8>] sys_socketcall+0x3d0/0xb08
[  264.968218]  [<000000000184ea7a>] system_call+0x2a2/0x2c0

[...]

Just like ip_finish_output2() does for IPv4, check that we have enough
headroom in ip6_xmit(), and reallocate it if we don't.

This issue is older than git history.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 21:53:22 +02:00
Stefano Brivio aac8a8ed57 neighbour: Avoid writing before skb->head in neigh_hh_output()
[ Upstream commit e6ac64d4c4d095085d7dd71cbd05704ac99829b2 ]

While skb_push() makes the kernel panic if the skb headroom is less than
the unaligned hardware header size, it will proceed normally in case we
copy more than that because of alignment, and we'll silently corrupt
adjacent slabs.

In the case fixed by the previous patch,
"ipv6: Check available headroom in ip6_xmit() even without options", we
end up in neigh_hh_output() with 14 bytes headroom, 14 bytes hardware
header and write 16 bytes, starting 2 bytes before the allocated buffer.

Always check we're not writing before skb->head and, if the headroom is
not enough, warn and drop the packet.

v2:
 - instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet
   (Eric Dumazet)
 - if we avoid the panic, though, we need to explicitly check the headroom
   before the memcpy(), otherwise we'll have corrupted slabs on a running
   kernel, after we warn
 - use __skb_push() instead of skb_push(), as the headroom check is
   already implemented here explicitly (Eric Dumazet)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 21:53:22 +02:00
Theodore Ts'o 094d1156ef ext4: use prandom_u32() instead of get_random_bytes()
Many of the uses of get_random_bytes() do not actually need
cryptographically secure random numbers.  Replace those uses with a
call to prandom_u32(), which is faster and which doesn't consume
entropy from the /dev/random driver.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2019-07-27 21:53:22 +02:00
Thomas Gleixner 5a375bae39 posix-timers: Sanitize overrun handling
commit 78c9c4dfbf8c04883941445a195276bb4bb92c76 upstream.

The posix timer overrun handling is broken because the forwarding functions
can return a huge number of overruns which does not fit in an int. As a
consequence timer_getoverrun(2) and siginfo::si_overrun can turn into
random number generators.

The k_clock::timer_forward() callbacks return a 64 bit value now. Make
k_itimer::ti_overrun[_last] 64bit as well, so the kernel internal
accounting is correct. 3Remove the temporary (int) casts.

Add a helper function which clamps the overrun value returned to user space
via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value
between 0 and INT_MAX. INT_MAX is an indicator for user space that the
overrun value has been clamped.

Reported-by: Team OWL337 <icytxw@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de
[bwh: Backported to 3.16: adjust filenames, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:21 +02:00
Alexey Dobriyan a546302c00 posix_cpu_timer: Exit early when process has been reaped
commit 2c13ce8f6b2f6fd9ba2f9261b1939fc0f62d1307 upstream.

Variable "now" seems to be genuinely used unintialized
if branch

	if (CPUCLOCK_PERTHREAD(timer->it_clock)) {

is not taken and branch

	if (unlikely(sighand == NULL)) {

is taken. In this case the process has been reaped and the timer is marked as
disarmed anyway. So none of the postprocessing of the sample is
required. Return right away.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Link: http://lkml.kernel.org/r/20160707223911.GA26483@p183.telecom.by
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:21 +02:00
Frederic Weisbecker bb914eb2f0 posix-timers: Convert abuses of BUG_ON to WARN_ON
The posix cpu timers code makes a heavy use of BUG_ON()
but none of these concern fatal issues that require
to stop the machine. So let's just warn the user when
some internal state slips out of our hands.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:21 +02:00
Frederic Weisbecker 71fa4dee41 posix-timers: Remove remaining uses of tasklist_lock
The remaining uses of tasklist_lock were mostly about synchronizing
against sighand modifications, getting coherent and safe group samples
and also thread/process wide timers list handling.

All of this is already safely synchronizable with the target's
sighand lock. Let's use it on these places instead.

Also update the comments about locking.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:20 +02:00
Frederic Weisbecker 780a6a6e21 posix-timers: Use sighand lock instead of tasklist_lock on timer deletion
Timer deletion doesn't need the tasklist lock.
We need to protect against:

* concurrent access to the lists p->cputime_expires and
  p->sighand->cputime_expires

* task reaping that may also delete the timer list entry

* timer firing

We already hold the timer lock which protects us against concurrent
timer firing.

The rest only need the targets sighand to be locked.
So hold it and drop the use of tasklist_lock there.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:20 +02:00
Frederic Weisbecker e9f47465d0 posix-timers: Use sighand lock instead of tasklist_lock for task clock sample
There is no need for the tasklist_lock just to take a process
wide clock sample.

All we need is to get a coherent sample that doesn't race with
exit() and exec():

* exit() may be concurrently reaping a task and flushing its time

* sighand is unstable under exit() and exec(), and the latter also
  result in group leader that can change

To protect against these, locking the target's sighand is enough.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:20 +02:00
Frederic Weisbecker 82ec39fc47 posix-timers: Consolidate posix_cpu_clock_get()
Consolidate the clock sampling common code used for both local
and remote targets.

Note that this introduces a tiny user ABI change: if a
PID is passed to clock_gettime() along the clockid,
we used to forbid a process wide clock sample when that
PID doesn't belong to a group leader. Now after this patch
we allow process wide clock samples if that PID belongs to
the current task, even if the current task is not the
group leader.

But local process wide clock samples are allowed if PID == 0
(current task) even if the current task is not the group leader.
So in the end this should be no big deal as this actually harmonize
the behaviour when the remote sample is actually a local one.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:19 +02:00
Frederic Weisbecker 656f9d43fd posix-timers: Remove useless clock sample on timers cleanup
a0b2062b0904ef07944c4a6e4d0f88ee44f1e9f2
("posix_timers: fix racy timer delta caching on task exit") forgot
to remove the arguments used for timer caching.

Fix this leftover.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:19 +02:00
Frederic Weisbecker 5ed9110403 posix-timers: Remove dead task special case
Now that we've removed all the optimizations that could
result in NULL timer's targets, we can remove all the
associated special case handling.

Also add some warnings on NULL targets to spot any possible
leftover.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:18 +02:00
Frederic Weisbecker 336acd1503 posix-timers: Cleanup reaped target handling
When a timer's target is seen to be buried, for example on calls
to timer_gettime(), the posix cpu timers code behaves a bit
like a garbage collector and releases early the reference to the
task.

Then again, this optimization complicates the code for no much
value: it's up to the user to release the timer and its associated
ressources by calling timer_delete() after it buries the target
tasks.

Remove this to simplify the code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:18 +02:00
Frederic Weisbecker 10d3aa607e posix-timers: Remove dead process posix cpu timers caching
Now that we removed dead thread posix cpu timers caching,
lets remove the dead process wide version. This caching
is similar to the per thread version but it should be even
more rare:

* If the process id dead, we are not reading its timers
status from a thread belonging to its group since they
are all dead. So this caching only concern remote process
timers reads. Now posix cpu timers using itimers or timer_settime()
can't do remote process timers anyway so it's not even clear if there
is actually a user for this caching.

* Unlike per thread timers caching, this only applies to
zombies targets. Buried targets' process wide timers return
0 values. But then again, timer_gettime() can't read remote
process timers, so if the process is dead, there can't be
any reader left anyway.

Then again this caching seem to complicate the code for
corner cases that are probably not worth it. So lets get
rid of it.

Also remove the sample snapshot on dying process timer
that is now useless, as suggested by Kosaki.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:18 +02:00
Frederic Weisbecker f759669a4a posix-timers: Remove dead thread posix cpu timers caching
When a task is exiting or has exited, its posix cpu timers
don't tick anymore and won't elapse further. It's too late
for them to expire.

So any further call to timer_gettime() on these timers will
return the same remaining expiry time.

The current code optimize this by caching the remaining delta
and storing it where we use to save the absolute expiration time.
This way, the future calls to timer_gettime() won't need to
compute the difference between the absolute expiration time and
the current time anymore.

Now this optimization doesn't seem to bring much value. Computing
the timer remaining delta is not very costly. Fetching the timer
value OTOH can be costly in two ways:

* CPUCLOCK_SCHED read requires to lock the target's rq. But some
optimizations are on the way to make task_sched_runtime() not holding
the rq lock of a non-running target.

* CPUCLOCK_VIRT/CPUCLOCK_PROF read simply consist in fetching
current->utime/current->stime except when the system uses full
dynticks cputime accounting. The latter requires a per task lock
in order to correctly compute user and system time. But once the
target is dead, this lock shouldn't be contended anyway.

All in one this caching doesn't seem to be justified.
Given that it complicates the code significantly for
few wins, let's remove it on single thread timers.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:17 +02:00
Frederic Weisbecker 7cf46e5532 posix-timers: Fix full dynticks CPUs kick on timer rescheduling
A posix CPU timer can be rearmed while it is firing or after it is
notified with a signal. This can happen for example with timers that
were set with a non zero interval in timer_settime().

This rearming can happen in two places:

1) On timer firing time, which happens on the target's tick. If the timer
can't trigger a signal because it is ignored, it reschedules itself
to honour the timer interval.

2) On signal handling from the timer's notification target. This one
can be a different task than the timer's target itself. Once the
signal is notified, the notification target rearms the timer, again
to honour the timer interval.

When a timer is rearmed, we need to notify the full dynticks CPUs
such that they restart their tick in case they are running tasks that
may have a share in elapsing this timer.

Now the 1st case above handles full dynticks CPUs with a call to
posix_cpu_timer_kick_nohz() from the posix cpu timer firing code. But
the second case ignores the fact that some CPUs may run non-idle tasks
with their tick off. As a result, when a timer is resheduled after its signal
notification, the full dynticks CPUs may completely ignore it and not
tick on the timer as expected

This patch fixes this bug by handling both cases in one. All we need
is to move the kick to the rearming common code in posix_cpu_timer_schedule().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Olivier Langlois <olivier@olivierlanglois.net>
2019-07-27 21:53:17 +02:00
Frederic Weisbecker 667fa8f7db posix-timers: Spare workqueue if there is no full dynticks CPU to kick
After a posix cpu timer is set, a workqueue is scheduled in order to
kick the full dynticks CPUs and let them restart their tick if
necessary in case the task they are running is concerned by the
new timer.

This kick is implemented by way of IPIs, which require interrupts
to be enabled, hence the need for a workqueue to raise them because
the posix cpu timer set path has interrupts disabled.

Now if there is no full dynticks CPU on the system, the workqueue is
still scheduled but it simply won't send any IPI and return immediately.

So lets spare that worqueue when it is not needed.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2019-07-27 21:53:17 +02:00
Frederic Weisbecker 2f9679df3b posix_timers: fix racy timer delta caching on task exit
When a task exits, we perform a caching of the remaining cputime delta
before expiring of its timers.

This is done from the following places:

* When the task is reaped. We iterate through its list of
  posix cpu timers and store the remaining timer delta to
  the timer struct instead of the absolute value.
  (See posix_cpu_timers_exit() / posix_cpu_timers_exit_group() )

* When we call posix_cpu_timer_get() or posix_cpu_timer_schedule().
  If the timer's task is considered dying when watched from these
  places, the same conversion from absolute to relative expiry time
  is performed. Then the given task's reference is released.
  (See clear_dead_task() ).

The relevance of this caching is questionable but this is another
and deeper debate.

The big issue here is that these two sources of caching don't mix
up very well together.

More specifically, the caching can easily be done twice, resulting
in a wrong delta as it gets spuriously substracted a second time by
the elapsed clock. This can happen in the following scenario:

1) The task exits and gets reaped: we call posix_cpu_timers_exit()
   and the absolute timer expiry values are converted to a relative
   delta.

2) timer_gettime() -> posix_cpu_timer_get() is called and relies on
   clear_dead_task() because  tsk->exit_state == EXIT_DEAD.
   The delta gets substracted again by the elapsed clock and we return
   a wrong result.

To fix this, just remove the caching done on task reaping time.  It
doesn't bring much value on its own.  The caching done from
posix_cpu_timer_get/schedule is enough.

And it would also be hard to get it really right: we could make it put and
clear the target task in the timer struct so that readers know if they are
dealing with a relative cached of absolute value.  But it would be racy.
The only safe way to do it would be to lock the itimer->it_lock so that we
know nobody reads the cputime expiry value while we modify it and its
target task reference.  Doing so would involve some funny workarounds to
avoid circular lock against the sighand lock.  There is just no reason to
maintain this.

The user visible effect of this patch can be observed by running the
following code: it creates a subthread that launches a posix cputimer
which expires after 10 seconds. But then the subthread only busy loops for 2
seconds and exits. The parent reaps the subthread and read the timer value.
Its expected value should the be the initial timer's expiration value
minus the cputime elapsed in the subthread. Roughly 10 - 2 = 8 seconds:

	#include <sys/time.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <time.h>
	#include <pthread.h>

	static timer_t id;
	static struct itimerspec val = { .it_value.tv_sec = 10, }, new;

	static void *thread(void *unused)
	{
		int err;
		struct timeval start, end, diff;

		timer_create(CLOCK_THREAD_CPUTIME_ID, NULL, &id);
		if (err < 0) {
			perror("Can't create timer\n");
			return NULL;
		}

		/* Arm 10 sec timer */
		err = timer_settime(id, 0, &val, NULL);
		if (err < 0) {
			perror("Can't set timer\n");
			return NULL;
		}

		/* Exit after 2 seconds of execution */
		gettimeofday(&start, NULL);
	        do {
			gettimeofday(&end, NULL);
			timersub(&end, &start, &diff);
		} while (diff.tv_sec < 2);

		return NULL;
	}

	int main(int argc, char **argv)
	{
		pthread_t pthread;
		int err;

		err = pthread_create(&pthread, NULL, thread, NULL);
		if (err) {
			perror("Can't create thread\n");
			return -1;
		}
		pthread_join(pthread, NULL);
		/* Just wait a little bit to make sure the child got reaped */
		sleep(1);
		err = timer_gettime(id, &new);
		if (err)
			perror("Can't get timer value\n");
		printf("%d %ld\n", new.it_value.tv_sec, new.it_value.tv_nsec);

		return 0;
	}

Before the patch:

       $ ./posix_cpu_timers
       6 2278074

After the patch:

      $ ./posix_cpu_timers
      8 1158766

Before the patch, the elapsed time got two more seconds spuriously accounted.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:16 +02:00
Frederic Weisbecker 952fb3f8b3 posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
In order to re-arm a timer after it fired, we take a sample of the current
process or thread cputime.

If the task is dying though, we don't arm anything but we cache the
remaining timer expiration delta for further reads.

Something similar is performed in posix_cpu_timer_get() but here we forget
to take the process wide cputime sample before caching it.

As a result we are storing random stack content, leading every further
reads of that timer to return junk values.

Fix this by taking the appropriate sample in the case of process wide
timers.

This probably doesn't matter much in practice because, at this stage, the
thread is the last one in the group and we reached exit_notify().  This
implies that we called exit_itimers() and there should be no more timers
to handle for that task.

So this is likely dead code anyway but let's fix the current logic
and the warning that came along:

    kernel/posix-cpu-timers.c: In function 'posix_cpu_timer_schedule':
    kernel/posix-cpu-timers.c:1127: warning: 'now' may be used uninitialized in this function

Then we can start to think further about cleaning up that code.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Chen Gang <gang.chen@asianux.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:16 +02:00
Frederic Weisbecker 4f7b34f303 posix_cpu_timers: consolidate expired timers check
Consolidate the common code amongst per thread and per process timers list
on tick time.

List traversal, expiry check and subsequent updates can be shared in a
common helper.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:16 +02:00
Frederic Weisbecker dda98a55b5 posix_cpu_timers: consolidate timer list cleanups
Cleaning up the posix cpu timers on task exit shares some common code
among timer list types, most notably the list traversal and expiry time
update.

Unify this in a common helper.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:15 +02:00
Frederic Weisbecker 8824678163 posix_cpu_timer: consolidate expiry time type
The posix cpu timer expiry time is stored in a union of two types: a 64
bits field if we rely on scheduler precise accounting, or a cputime_t if
we rely on jiffies.

This results in quite some duplicate code and special cases to handle the
two types.

Just unify this into a single 64 bits field.  cputime_t can always fit
into it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2019-07-27 21:53:15 +02:00
Eric Dumazet c4cd7fe732 net: make skb_partial_csum_set() more robust against overflows
commit 52b5d6f5dcf0e5201392f7d417148ccb537dbf6f upstream.

syzbot managed to crash in skb_checksum_help() [1] :

        BUG_ON(offset + sizeof(__sum16) > skb_headlen(skb));

Root cause is the following check in skb_partial_csum_set()

	if (unlikely(start > skb_headlen(skb)) ||
	    unlikely((int)start + off > skb_headlen(skb) - 2))
		return false;

If skb_headlen(skb) is 1, then (skb_headlen(skb) - 2) becomes 0xffffffff
and the check fails to detect that ((int)start + off) is off the limit,
since the compare is unsigned.

When we fix that, then the first condition (start > skb_headlen(skb))
becomes obsolete.

Then we should also check that (skb_headroom(skb) + start) wont
overflow 16bit field.

[1]
kernel BUG at net/core/dev.c:2880!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 7330 Comm: syz-executor4 Not tainted 4.19.0-rc6+ #253
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:skb_checksum_help+0x9e3/0xbb0 net/core/dev.c:2880
Code: 85 00 ff ff ff 48 c1 e8 03 42 80 3c 28 00 0f 84 09 fb ff ff 48 8b bd 00 ff ff ff e8 97 a8 b9 fb e9 f8 fa ff ff e8 2d 09 76 fb <0f> 0b 48 8b bd 28 ff ff ff e8 1f a8 b9 fb e9 b1 f6 ff ff 48 89 cf
RSP: 0018:ffff8801d83a6f60 EFLAGS: 00010293
RAX: ffff8801b9834380 RBX: ffff8801b9f8d8c0 RCX: ffffffff8608c6d7
RDX: 0000000000000000 RSI: ffffffff8608cc63 RDI: 0000000000000006
RBP: ffff8801d83a7068 R08: ffff8801b9834380 R09: 0000000000000000
R10: ffff8801d83a76d8 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000010001 R14: 000000000000ffff R15: 00000000000000a8
FS:  00007f1a66db5700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7d77f091b0 CR3: 00000001ba252000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 skb_csum_hwoffload_help+0x8f/0xe0 net/core/dev.c:3269
 validate_xmit_skb+0xa2a/0xf30 net/core/dev.c:3312
 __dev_queue_xmit+0xc2f/0x3950 net/core/dev.c:3797
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3838
 packet_snd net/packet/af_packet.c:2928 [inline]
 packet_sendmsg+0x422d/0x64c0 net/packet/af_packet.c:2953

Fixes: 5ff8dda303 ("net: Ensure partial checksum offset is inside the skb head")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:15 +02:00
Wei Wang f10b1dc4b2 ipv6: take rcu lock in rawv6_send_hdrinc()
commit a688caa34beb2fd2a92f1b6d33e40cde433ba160 upstream.

In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we
directly assign the dst to skb and set passed in dst to NULL to avoid
double free.
However, in error case, we free skb and then do stats update with the
dst pointer passed in. This causes use-after-free on the dst.
Fix it by taking rcu read lock right before dst could get released to
make sure dst does not get freed until the stats update is done.
Note: we don't have this issue in ipv4 cause dst is not used for stats
update in v4.

Syzkaller reported following crash:
BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088

CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
 print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
 rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
 __sys_sendmsg+0x11d/0x280 net/socket.c:2152
 __do_sys_sendmsg net/socket.c:2161 [inline]
 __se_sys_sendmsg net/socket.c:2159 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457099
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099
RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004
RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000

Allocated by task 32088:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
 kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554
 dst_alloc+0xbb/0x1d0 net/core/dst.c:105
 ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353
 ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186
 ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895
 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093
 fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122
 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121
 ip6_route_output include/net/ip6_route.h:88 [inline]
 ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951
 ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
 rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:631
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
 __sys_sendmsg+0x11d/0x280 net/socket.c:2152
 __do_sys_sendmsg net/socket.c:2161 [inline]
 __se_sys_sendmsg net/socket.c:2159 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 5356:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kmem_cache_free+0x83/0x290 mm/slab.c:3756
 dst_destroy+0x267/0x3c0 net/core/dst.c:141
 dst_destroy_rcu+0x16/0x19 net/core/dst.c:154
 __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
 rcu_do_batch kernel/rcu/tree.c:2576 [inline]
 invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline]
 __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline]
 rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864
 __do_softirq+0x30b/0xad8 kernel/softirq.c:292

Fixes: 1789a640f5 ("raw: avoid two atomics in xmit")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16:
 - We don't set tstamp here
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:14 +02:00
Eric W. Biederman bee3e8dc10 ipv6: Compute net once in raw6_send_hdrinc
commit adb28c9d3371c845c7a28bfd4fb163aca0d0dc37 upstream.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:14 +02:00
Peter Zijlstra fa34295619 perf/core: Fix perf_pmu_unregister() locking
commit a9f9772114c8b07ae75bcb3654bd017461248095 upstream.

When we unregister a PMU, we fail to serialize the @pmu_idr properly.
Fix that by doing the entire thing under pmu_lock.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 2e80a82a49 ("perf: Dynamic pmu types")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16:
 - Also remove "out" label in free_pmu_context()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:14 +02:00
Jiri Olsa cf7f578544 perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic
commit 0933840acf7b65d6d30a5b6089d882afea57aca3 upstream.

CAI Qian reported a crash in the PMU uncore device removal code,
enabled by the CONFIG_DEBUG_TEST_DRIVER_REMOVE=y option:

  https://marc.info/?l=linux-kernel&m=147688837328451

The reason for the crash is that perf_pmu_unregister() tries to remove
a PMU device which is not added at this point. We add PMU devices
only after pmu_bus is registered, which happens in the
perf_event_sysfs_init() call and sets the 'pmu_bus_running' flag.

The fix is to get the 'pmu_bus_running' flag state at the point
the PMU is taken out of the PMU list and remove the device
later only if it's set.

Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161020111011.GA13361@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: no address filter attributes to clean up]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:13 +02:00
Joel Fernandes (Google) 4642f0d0c9 mm: shmem.c: Correctly annotate new inodes for lockdep
commit b45d71fb89ab8adfe727b9d0ee188ed58582a647 upstream.

Directories and inodes don't necessarily need to be in the same lockdep
class.  For ex, hugetlbfs splits them out too to prevent false positives
in lockdep.  Annotate correctly after new inode creation.  If its a
directory inode, it will be put into a different class.

This should fix a lockdep splat reported by syzbot:

> ======================================================
> WARNING: possible circular locking dependency detected
> 4.18.0-rc8-next-20180810+ #36 Not tainted
> ------------------------------------------------------
> syz-executor900/4483 is trying to acquire lock:
> 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at: inode_lock
> include/linux/fs.h:765 [inline]
> 00000000d2bfc8fe (&sb->s_type->i_mutex_key#9){++++}, at:
> shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
>
> but task is already holding lock:
> 0000000025208078 (ashmem_mutex){+.+.}, at: ashmem_shrink_scan+0xb4/0x630
> drivers/staging/android/ashmem.c:448
>
> which lock already depends on the new lock.
>
> -> #2 (ashmem_mutex){+.+.}:
>        __mutex_lock_common kernel/locking/mutex.c:925 [inline]
>        __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
>        mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
>        ashmem_mmap+0x55/0x520 drivers/staging/android/ashmem.c:361
>        call_mmap include/linux/fs.h:1844 [inline]
>        mmap_region+0xf27/0x1c50 mm/mmap.c:1762
>        do_mmap+0xa10/0x1220 mm/mmap.c:1535
>        do_mmap_pgoff include/linux/mm.h:2298 [inline]
>        vm_mmap_pgoff+0x213/0x2c0 mm/util.c:357
>        ksys_mmap_pgoff+0x4da/0x660 mm/mmap.c:1585
>        __do_sys_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
>        __se_sys_mmap arch/x86/kernel/sys_x86_64.c:91 [inline]
>        __x64_sys_mmap+0xe9/0x1b0 arch/x86/kernel/sys_x86_64.c:91
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #1 (&mm->mmap_sem){++++}:
>        __might_fault+0x155/0x1e0 mm/memory.c:4568
>        _copy_to_user+0x30/0x110 lib/usercopy.c:25
>        copy_to_user include/linux/uaccess.h:155 [inline]
>        filldir+0x1ea/0x3a0 fs/readdir.c:196
>        dir_emit_dot include/linux/fs.h:3464 [inline]
>        dir_emit_dots include/linux/fs.h:3475 [inline]
>        dcache_readdir+0x13a/0x620 fs/libfs.c:193
>        iterate_dir+0x48b/0x5d0 fs/readdir.c:51
>        __do_sys_getdents fs/readdir.c:231 [inline]
>        __se_sys_getdents fs/readdir.c:212 [inline]
>        __x64_sys_getdents+0x29f/0x510 fs/readdir.c:212
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0 (&sb->s_type->i_mutex_key#9){++++}:
>        lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
>        down_write+0x8f/0x130 kernel/locking/rwsem.c:70
>        inode_lock include/linux/fs.h:765 [inline]
>        shmem_fallocate+0x18b/0x12e0 mm/shmem.c:2602
>        ashmem_shrink_scan+0x236/0x630 drivers/staging/android/ashmem.c:455
>        ashmem_ioctl+0x3ae/0x13a0 drivers/staging/android/ashmem.c:797
>        vfs_ioctl fs/ioctl.c:46 [inline]
>        file_ioctl fs/ioctl.c:501 [inline]
>        do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685
>        ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702
>        __do_sys_ioctl fs/ioctl.c:709 [inline]
>        __se_sys_ioctl fs/ioctl.c:707 [inline]
>        __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707
>        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> other info that might help us debug this:
>
> Chain exists of:
>   &sb->s_type->i_mutex_key#9 --> &mm->mmap_sem --> ashmem_mutex
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(ashmem_mutex);
>                                lock(&mm->mmap_sem);
>                                lock(ashmem_mutex);
>   lock(&sb->s_type->i_mutex_key#9);
>
>  *** DEADLOCK ***
>
> 1 lock held by syz-executor900/4483:
>  #0: 0000000025208078 (ashmem_mutex){+.+.}, at:
> ashmem_shrink_scan+0xb4/0x630 drivers/staging/android/ashmem.c:448

Link: http://lkml.kernel.org/r/20180821231835.166639-1-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Suggested-by: NeilBrown <neilb@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:13 +02:00
Li Dongyang 26410ddb75 ext4: don't mark mmp buffer head dirty
commit fe18d649891d813964d3aaeebad873f281627fbc upstream.

Marking mmp bh dirty before writing it will make writeback
pick up mmp block later and submit a write, we don't want the
duplicate write as kmmpd thread should have full control of
reading and writing the mmp block.
Another reason is we will also have random I/O error on
the writeback request when blk integrity is enabled, because
kmmpd could modify the content of the mmp block(e.g. setting
new seq and time) while the mmp block is under I/O requested
by writeback.

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:13 +02:00
Theodore Ts'o 1f7db01576 ext4: fix online resizing for bigalloc file systems with a 1k block size
commit 5f8c10936fab2b69a487400f2872902e597dd320 upstream.

An online resize of a file system with the bigalloc feature enabled
and a 1k block size would be refused since ext4_resize_begin() did not
understand s_first_data_block is 0 for all bigalloc file systems, even
when the block size is 1k.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:12 +02:00
Theodore Ts'o 2a2b736da7 ext4: prevent online resize with backup superblock
commit 011fa99404bea3f5d897c4983f6bd51170e3b18f upstream.

Prevent BUG or corrupted file systems after the following:

mkfs.ext4 /dev/vdc 100M
mount -t ext4 -o sb=40961 /dev/vdc /vdc
resize2fs /dev/vdc

We previously prevented online resizing using the old resize ioctl.
Move the code to ext4_resize_begin(), so the check applies for all of
the resize ioctl's.

Reported-by: Maxim Malkov <malkov@ispras.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:12 +02:00
Hangbin Liu 90a24b0820 igmp: fix incorrect unsolicit report count after link down and up
commit ff06525fcb8ae3c302ac1319bf6c07c026dea964 upstream.

After link down and up, i.e. when call ip_mc_up(), we doesn't init
im->unsolicit_count. So after igmp_timer_expire(), we will not start
timer again and only send one unsolicit report at last.

Fix it by initializing im->unsolicit_count in igmp_group_added(), so
we can respect igmp robustness value.

Fixes: 24803f38a5c0b ("igmp: do not remove igmp souce list info when set link down")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16:
 - Keep using constant IGMP_Unsolicited_Report_Count
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:12 +02:00
Hangbin Liu 6ccdad7923 multicast: do not restore deleted record source filter mode to new one
commit 08d3ffcc0cfaba36f6b86fd568cc3bc773061fa6 upstream.

There are two scenarios that we will restore deleted records. The first is
when device down and up(or unmap/remap). In this scenario the new filter
mode is same with previous one. Because we get it from in_dev->mc_list and
we do not touch it during device down and up.

The other scenario is when a new socket join a group which was just delete
and not finish sending status reports. In this scenario, we should use the
current filter mode instead of restore old one. Here are 4 cases in total.

old_socket        new_socket       before_fix       after_fix
  IN(A)             IN(A)           ALLOW(A)         ALLOW(A)
  IN(A)             EX( )           TO_IN( )         TO_EX( )
  EX( )             IN(A)           TO_EX( )         ALLOW(A)
  EX( )             EX( )           TO_EX( )         TO_EX( )

Fixes: 24803f38a5c0b (igmp: do not remove igmp souce list info when set link down)
Fixes: 1666d49e1d416 (mld: do not remove mld souce list info when set link down)
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:11 +02:00
Hangbin Liu f13f6d884d mld: do not remove mld souce list info when set link down
commit 1666d49e1d416fcc2cce708242a52fe3317ea8ba upstream.

This is an IPv6 version of commit 24803f38a5c0 ("igmp: do not remove igmp
souce list..."). In mld_del_delrec(), we will restore back all source filter
info instead of flush them.

Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since
we should not remove source list info when set link down. Remove
igmp6_group_dropped() in ipv6_mc_destroy_dev() since we have called it in
ipv6_mc_down().

Also clear all source info after igmp6_group_dropped() instead of in it
because ipv6_mc_down() will call igmp6_group_dropped().

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:11 +02:00
Hangbin Liu eeef4b8fa3 igmp: do not remove igmp souce list info when set link down
commit 24803f38a5c0b6c57ed800b47e695f9ce474bc3a upstream.

In commit 24cf3af3fe ("igmp: call ip_mc_clear_src..."), we forgot to remove
igmpv3_clear_delrec() in ip_mc_down(), which also called ip_mc_clear_src().
This make us clear all IGMPv3 source filter info after NETDEV_DOWN.
Move igmpv3_clear_delrec() to ip_mc_destroy_dev() and then no need
ip_mc_clear_src() in ip_mc_destroy_dev().

On the other hand, we should restore back instead of free all source filter
info in igmpv3_del_delrec(). Or we will not able to restore IGMPv3 source
filter info after NETDEV_UP and NETDEV_POST_TYPE_CHANGE.

Fixes: 24cf3af3fe ("igmp: call ip_mc_clear_src() only when ...")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16: use IGMP_Unsolicited_Report_Count instead of
 sysctl_igmp_qrv]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:10 +02:00
Weilong Chen 91698ec19c ipv4: fix all space errors in file igmp.c
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-27 21:53:10 +02:00
Hangbin Liu 76f77a4dcf igmp: fix incorrect unsolicit report count when join group
commit 4fb7253e4f9a8f06a986a3b317e2f79d9b43d552 upstream.

We should not start timer if im->unsolicit_count equal to 0 after decrease.
Or we will send one more unsolicit report message. i.e. 3 instead of 2 by
default.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:10 +02:00
William Manley 39a1d7828f net: igmp: Reduce Unsolicited report interval to 1s when using IGMPv3
If an IGMP join packet is lost you will not receive data sent to the
multicast group so if no data arrives from that multicast group in a
period of time after the IGMP join a second IGMP join will be sent.  The
delay between joins is the "IGMP Unsolicited Report Interval".

Previously this value was hard coded to be chosen randomly between 0-10s.
This can be too long for some use-cases, such as IPTV as it can cause
channel change to be slow in the presence of packet loss.

The value 10s has come from IGMPv2 RFC2236, which was reduced to 1s in
IGMPv3 RFC3376.  This patch makes the kernel use the 1s value from the
later RFC if we are operating in IGMPv3 mode.  IGMPv2 behaviour is
unaffected.

Tested with Wireshark and a simple program to join a (non-existent)
multicast group.  The distribution of timings for the second join differ
based upon setting /proc/sys/net/ipv4/conf/eth0/force_igmp_version.

Signed-off-by: William Manley <william.manley@youview.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-27 21:53:09 +02:00
Sabrina Dubroca df0a919694 ipv6: fix cleanup ordering for pingv6 registration
commit a03dc36bdca6b614651fedfcd8559cf914d2d21d upstream.

Commit 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
contains an error in the cleanup path of inet6_init(): when
proto_register(&pingv6_prot, 1) fails, we try to unregister
&pingv6_prot. When rawv6_init() fails, we skip unregistering
&pingv6_prot.

Example of panic (triggered by faking a failure of
 proto_register(&pingv6_prot, 1)):

    general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
    [...]
    RIP: 0010:__list_del_entry_valid+0x79/0x160
    [...]
    Call Trace:
     proto_unregister+0xbb/0x550
     ? trace_preempt_on+0x6f0/0x6f0
     ? sock_no_shutdown+0x10/0x10
     inet6_init+0x153/0x1b8

Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:09 +02:00
Theodore Ts'o 98a8b0db53 ext4: check for NUL characters in extended attribute's name
commit 7d95178c77014dbd8dce36ee40bbbc5e6c121ff5 upstream.

Extended attribute names are defined to be NUL-terminated, so the name
must not contain a NUL character.  This is important because there are
places when remove extended attribute, the code uses strlen to
determine the length of the entry.  That should probably be fixed at
some point, but code is currently really messy, so the simplest fix
for now is to simply validate that the extended attributes are sane.

https://bugzilla.kernel.org/show_bug.cgi?id=200401

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.16: return -EIO instead of -EFSCORRUPTED]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:09 +02:00
Miklos Szeredi cfe4f80048 fuse: Fix oops at process_init_reply()
commit e8f3bd773d22f488724dffb886a1618da85c2966 upstream.

syzbot is hitting NULL pointer dereference at process_init_reply().
This is because deactivate_locked_super() is called before response for
initial request is processed.

Fix this by aborting and waiting for all requests (including FUSE_INIT)
before resetting fc->sb.

Original patch by Tetsuo Handa <penguin-kernel@I-love.SKAURA.ne.jp>.

Reported-by: syzbot <syzbot+b62f08f4d5857755e3bc@syzkaller.appspotmail.com>
Fixes: e27c9d3877a0 ("fuse: fuse: add time_gran to INIT_OUT")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
[bwh: Backported to 3.16:
 - Drop second argument to fuse_abort_conn()
 - fuse_wait_aborted() is not needed]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:08 +02:00
Miklos Szeredi bf85216444 fuse: flush requests on umount
commit 580640ba5d331eb5631a5de46941c98f5ed90886 upstream.

Use fuse_abort_conn() instead of fuse_conn_kill() in fuse_put_super().
This flushes and aborts requests still on any queues.  But since we've
already reset fc->connected, those requests would not be useful anyway and
would be flushed when the fuse device is closed.

Next patches will rely on requests being flushed before the superblock is
destroyed.

Use fuse_abort_conn() in cuse_process_init_reply() too, since it makes no
difference there, and we can get rid of fuse_conn_kill().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:08 +02:00
Miklos Szeredi 8fd9691398 fuse: don't wake up reserved req in fuse_conn_kill()
commit 0c4dd4ba1426c599072511dcf95a15ee5e12725b upstream.

Waking up reserved_req_waitq from fuse_conn_kill() doesn't make sense since
we aren't chaging ff->reserved_req here, which is what this waitqueue
signals.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:53:08 +02:00
Takashi Iwai 7135d2e3c3 ALSA: pcm: Call snd_pcm_unlink() conditionally at closing
commit b51abed8355e5556886623b2772fa6b7598d2282 upstream.

Currently the PCM core calls snd_pcm_unlink() always unconditionally
at closing a stream.  However, since snd_pcm_unlink() invokes the
global rwsem down, the lock can be easily contended.  More badly, when
a thread runs in a high priority RT-FIFO, it may stall at spinning.

Basically the call of snd_pcm_unlink() is required only for the linked
streams that are already rare occasion.  For normal use cases, this
code path is fairly superfluous.

As an optimization (and also as a workaround for the RT problem
above in normal situations without linked streams), this patch adds a
check before calling snd_pcm_unlink() and calls it only when needed.

Reported-by: Chanho Min <chanho.min@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 21:53:07 +02:00
Siva Kumar Akkireddi f283dd5336 msm: sps: Suppress bind/unbind attributes
SPS driver does not support manual bind/unbind operations
through sysfs. Suppress the bind/unbind nodes. Do not free
SPS struct in sps_device_de_init since it is being done in
sps_exit, and also to avoid use-after-free.

Bug: 114042002
Change-Id: If6da6c5fb9d1a44d0420c6151f7f9d0a33cb2d04
Signed-off-by: Siva Kumar Akkireddi <sivaa@codeaurora.org>
2019-07-27 21:53:07 +02:00
Trishansh Bhardwaj 0be6ca7b73 msm: camera: Fix out-of-bounds read in string class name.
jpeg driver is calling class_create with stack variable, which
can be overwritten by other stack variables.

Bug: 114041685
Change-Id: I3c22a5b3375b970ff6b1c6de983dd5833f4e11d0
Signed-off-by: Trishansh Bhardwaj <tbhardwa@codeaurora.org>
2019-07-27 21:53:07 +02:00