android_kernel_google_msm/net
Eric Dumazet f0180de266 net: add a synchronize_net() in netdev_rx_handler_unregister()
[ Upstream commit 00cfec3748 ]

commit 35d48903e9 (bonding: fix rx_handler locking) added a race
in bonding driver, reported by Steven Rostedt who did a very good
diagnosis :

<quoting Steven>

I'm currently debugging a crash in an old 3.0-rt kernel that one of our
customers is seeing. The bug happens with a stress test that loads and
unloads the bonding module in a loop (I don't know all the details as
I'm not the one that is directly interacting with the customer). But the
bug looks to be something that may still be present and possibly present
in mainline too. It will just be much harder to trigger it in mainline.

In -rt, interrupts are threads, and can schedule in and out just like
any other thread. Note, mainline now supports interrupt threads so this
may be easily reproducible in mainline as well. I don't have the ability
to tell the customer to try mainline or other kernels, so my hands are
somewhat tied to what I can do.

But according to a core dump, I tracked down that the eth irq thread
crashed in bond_handle_frame() here:

        slave = bond_slave_get_rcu(skb->dev);
        bond = slave->bond; <--- BUG

the slave returned was NULL and accessing slave->bond caused a NULL
pointer dereference.

Looking at the code that unregisters the handler:

void netdev_rx_handler_unregister(struct net_device *dev)
{

        ASSERT_RTNL();
        RCU_INIT_POINTER(dev->rx_handler, NULL);
        RCU_INIT_POINTER(dev->rx_handler_data, NULL);
}

Which is basically:
        dev->rx_handler = NULL;
        dev->rx_handler_data = NULL;

And looking at __netif_receive_skb() we have:

        rx_handler = rcu_dereference(skb->dev->rx_handler);
        if (rx_handler) {
                if (pt_prev) {
                        ret = deliver_skb(skb, pt_prev, orig_dev);
                        pt_prev = NULL;
                }
                switch (rx_handler(&skb)) {

My question to all of you is, what stops this interrupt from happening
while the bonding module is unloading?  What happens if the interrupt
triggers and we have this:

        CPU0                    CPU1
        ----                    ----
  rx_handler = skb->dev->rx_handler

                        netdev_rx_handler_unregister() {
                           dev->rx_handler = NULL;
                           dev->rx_handler_data = NULL;

  rx_handler()
   bond_handle_frame() {
    slave = skb->dev->rx_handler;
    bond = slave->bond; <-- NULL pointer dereference!!!

What protection am I missing in the bond release handler that would
prevent the above from happening?

</quoting Steven>

We can fix bug this in two ways. First is adding a test in
bond_handle_frame() and others to check if rx_handler_data is NULL.

A second way is adding a synchronize_net() in
netdev_rx_handler_unregister() to make sure that a rcu protected reader
has the guarantee to see a non NULL rx_handler_data.

The second way is better as it avoids an extra test in fast path.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-05 10:04:47 -07:00
..
9p
802
8021q 8021q: fix a potential use-after-free 2013-04-05 10:04:38 -07:00
appletalk
atm atm: fix info leak via getsockname() 2012-10-02 10:29:36 -07:00
ax25
batman-adv batman-adv: fix random jitter calculation 2013-01-11 09:07:03 -08:00
bluetooth Bluetooth: Fix not closing SCO sockets in the BT_CONNECT2 state 2013-04-05 10:04:15 -07:00
bridge bridge: set priority of STP packets 2013-02-28 06:59:05 -08:00
caif caif: Fix access to freed pernet memory 2012-08-09 08:31:42 -07:00
can can: bcm: initialize ifindex for timeouts without previous frame reception 2012-12-03 11:47:10 -08:00
ceph rbd: remove linger unconditionally 2013-01-17 08:51:20 -08:00
core net: add a synchronize_net() in netdev_rx_handler_unregister() 2013-04-05 10:04:47 -07:00
dcb dcbnl: fix various netlink info leaks 2013-03-20 13:05:02 -07:00
dccp inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock 2013-01-11 09:07:14 -08:00
decnet
dns_resolver
dsa
econet
ethernet
ieee802154 6lowpan: Fix endianness issue in is_addr_link_local(). 2013-03-20 13:05:02 -07:00
ipv4 tcp: undo spurious timeout after SACK reneging 2013-04-05 10:04:38 -07:00
ipv6 ipv6: don't accept node local multicast traffic from the wire 2013-04-05 10:04:41 -07:00
ipx
irda net/irda: add missing error path release_sock call 2013-04-05 10:04:34 -07:00
iucv
key
l2tp l2tp: Restore socket refcount when sendmsg succeeds 2013-03-20 13:05:01 -07:00
lapb
llc llc: fix info leak via getsockname() 2012-10-02 10:29:37 -07:00
mac80211 mac80211: synchronize scan off/on-channel and PS states 2013-02-03 18:24:42 -06:00
netfilter netfilter: Mark SYN/ACK packets as invalid from original direction 2012-11-26 11:37:48 -08:00
netlabel netlabel: correctly list all the static label mappings 2013-03-20 13:05:01 -07:00
netlink thermal: shorten too long mcast group name 2013-04-05 10:04:38 -07:00
netrom netrom: copy_datagram_iovec can fail 2012-10-13 05:38:45 +09:00
nfc NFC: Fix nfc_llcp_local chained list insertion 2012-12-03 11:47:12 -08:00
openvswitch openvswitch: Reset upper layer protocol info on internal devices. 2012-10-02 10:29:50 -07:00
packet packet: fix leakage of tx_ring memory 2013-02-14 10:49:05 -08:00
phonet
rds rds: limit the size allocated by rds_message_alloc() 2013-03-20 13:05:01 -07:00
rfkill
rose
rxrpc
sched net: sched: integer overflow fix 2013-01-11 09:07:14 -08:00
sctp sctp: don't break the loop while meeting the active_path so as to find the matched transport 2013-03-28 12:11:53 -07:00
sunrpc SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked 2013-04-05 10:04:14 -07:00
tipc
unix unix: fix a race condition in unix_release() 2013-04-05 10:04:39 -07:00
wanrouter wanmain: comparing array with NULL 2012-08-09 08:31:51 -07:00
wimax
wireless wireless: allow 40 MHz on world roaming channels 12/13 2012-11-26 11:37:46 -08:00
x25
xfrm xfrm_user: ensure user supplied esn replay window is valid 2012-10-13 05:38:41 +09:00
compat.c net: Fix references to out-of-scope variables in put_cmsg_compat() 2012-08-09 08:31:42 -07:00
Kconfig
Makefile
nonet.c
socket.c net: fix info leak in compat dev_ifconf() 2012-10-02 10:29:37 -07:00
sysctl_net.c