Commit Graph

428 Commits

Author SHA1 Message Date
David S. Miller 58717686cf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
	drivers/net/ethernet/emulex/benet/be.h
	include/net/tcp.h
	net/mac802154/mac802154.h

Most conflicts were minor overlapping stuff.

The be2net driver brought in some fixes that added __vlan_put_tag
calls, which in net-next take an additional argument.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-30 03:55:20 -04:00
nikolay@redhat.com c6cdcf6d82 bonding: fix locking in enslave failure path
In commit 3c5913b53f ("bonding:
primary_slave & curr_active_slave are not cleaned on enslave failure")
I didn't account for the use of curr_active_slave without curr_slave_lock
and since there are such users, we should hold bond->lock for writing while
setting it to NULL (in the NULL case we don't need the curr_slave_lock).
Keeping the bond lock as to avoid the extra release/acquire cycle.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-25 04:03:21 -04:00
David S. Miller 6e0895c2ea Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be_main.c
	drivers/net/ethernet/intel/igb/igb_main.c
	drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
	include/net/scm.h
	net/batman-adv/routing.c
	net/ipv4/tcp_input.c

The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
cleanup in net-next to now pass cred structs around.

The be2net driver had a bug fix in 'net' that overlapped with the VLAN
interface changes by Patrick McHardy in net-next.

An IGB conflict existed because in 'net' the build_skb() support was
reverted, and in 'net-next' there was a comment style fix within that
code.

Several batman-adv conflicts were resolved by making sure that all
calls to batadv_is_my_mac() are changed to have a new bat_priv first
argument.

Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
rewrite in 'net-next', mostly overlapping changes.

Thanks to Stephen Rothwell and Antonio Quartulli for help with several
of these merge resolutions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-22 20:32:51 -04:00
nikolay@redhat.com d632ce989c bonding: in bond_mc_swap() bond's mc addr list is walked without lock
Use netif_addr_lock_bh() to acquire the appropriate lock before walking.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:48:19 -04:00
nikolay@redhat.com fc7a72ac86 bonding: disable netpoll on enslave failure
slave_disable_netpoll() is not called upon enslave failure which would
lead to a memory leak. Call slave_disable_netpoll() after err_detach as
that's the first error path after enabling netpoll on that slave.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:48:19 -04:00
nikolay@redhat.com 3c5913b53f bonding: primary_slave & curr_active_slave are not cleaned on enslave failure
On enslave failure primary_slave can point to new_slave which is to be
freed, and the same applies to curr_active_slave. So check if this is
the case and clean up properly after err_detach because that's the first
error code path after they're set.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:48:19 -04:00
nikolay@redhat.com a506e7b479 bonding: vlans don't get deleted on enslave failure
The main problem is with vid refcount which only gets bumped up.
Delete the vlans after err_detach as that's the first error path
after the vlans are added.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:48:18 -04:00
nikolay@redhat.com 25e40305d4 bonding: mc addresses don't get deleted on enslave failure
Add bond_mc_list_flush() after err_detach as that's the first error path
after the addresses are added. The main issue is the mc addresses' refcount
which only gets bumped up.

v2: update log message and don't move code unnecessarily

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:48:18 -04:00
Andy Gospodarek bb5b052f75 bond: add support to read speed and duplex via ethtool
This patch adds support for the get_settings ethtool op to the bonding
driver.  This was motivated by users who wanted to get the speed of the
bond and compare that against throughput to understand utilization.
The behavior before this patch was added was problematic when computing
line utilization after trying to get link-speed and throughput via SNMP.

Output from ethtool looks like this for a round-robin bond:

Settings for bond0:
	Supported ports: [ ]
	Supported link modes:   Not reported
	Supported pause frame use: No
	Supports auto-negotiation: No
	Advertised link modes:  Not reported
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Speed: 11000Mb/s
	Duplex: Full
	Port: Other
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: off
	MDI-X: Unknown
	Link detected: yes

I tested this and verified it works as expected.  A test was also done
on a version backported to an older kernel and it worked well there.

v2: Switch to using ethtool_cmd_speed_set to set speed, added check to
SLAVE_IS_OK for each slave in bond, dropped mode-specific calculations
as they were not needed, and set port type to 'Other.'

v3: Fix useless assignment and checkpatch warning.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:39:50 -04:00
Patrick McHardy 1fd9b1fc31 net: vlan: prepare for 802.1ad support
Make the encapsulation protocol value a property of VLAN devices and change
the device lookup functions to take the protocol value into account.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 14:45:27 -04:00
Patrick McHardy 80d5c3689b net: vlan: prepare for 802.1ad VLAN filtering offload
Change the rx_{add,kill}_vid callbacks to take a protocol argument in
preparation of 802.1ad support. The protocol argument used so far is
always htons(ETH_P_8021Q).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 14:45:27 -04:00
Patrick McHardy f646968f8f net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_*
Rename the hardware VLAN acceleration features to include "CTAG" to indicate
that they only support CTAGs. Follow up patches will introduce 802.1ad
server provider tagging (STAGs) and require the distinction for hardware not
supporting acclerating both.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 14:45:26 -04:00
Eric Dumazet 4394542ca4 bonding: fix l23 and l34 load balancing in forwarding path
Since commit 6b923cb718 (bonding: support for IPv6 transmit hashing)
bonding doesn't properly hash traffic in forwarding setups.

Vitaly V. Bursov diagnosed that skb_network_header_len() returned 0 in
this case.

More generally, the transport header might not be in the skb head.

Use pskb_may_pull() & skb_header_pointer() to get it right, and use
proto_ports_offset() in bond_xmit_hash_policy_l34() to get support for
more protocols than TCP and UDP.

Reported-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: John Eaglesham <linux@8192.net>
Tested-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18 15:07:29 -04:00
nikolay@redhat.com b6a5a7b9a5 bonding: IFF_BONDING is not stripped on enslave failure
While enslaving a new device and after IFF_BONDING flag is set, in case
of failure it is not stripped from the device's priv_flags while
cleaning up, which could lead to other problems.
Cleaning at err_close because the flag is set after dev_open().

v2: no change

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-11 16:01:47 -04:00
nikolay@redhat.com 6101391d4a bonding: fix netdev event NULL pointer dereference
In commit 471cb5a33d ("bonding: remove
usage of dev->master") a bug was introduced which causes a NULL pointer
dereference. If a bond device is in mode 6 (ALB) and a slave is added
it will dereference a NULL pointer in bond_slave_netdev_event().
This is because in bond_enslave we have bond_alb_init_slave() which
changes the MAC address of the slave and causes a NETDEV_CHANGEADDR.
Then we have in bond_slave_netdev_event():
        struct slave *slave = bond_slave_get_rtnl(slave_dev);
        struct bonding *bond = slave->bond;
bond_slave_get_rtnl() dereferences slave_dev->rx_handler_data which at
that time is NULL since netdev_rx_handler_register() is called later.

This is fixed by checking if slave is NULL before dereferencing it.

v2: Comment style changed.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-11 16:01:47 -04:00
nikolay@redhat.com 69b0216ac2 bonding: fix bonding_masters race condition in bond unloading
While the bonding module is unloading, it is considered that after
rtnl_link_unregister all bond devices are destroyed but since no
synchronization mechanism exists, a new bond device can be created
via bonding_masters before unregister_pernet_subsys which would
lead to multiple problems (e.g. NULL pointer dereference, wrong RIP,
list corruption).

This patch fixes the issue by removing any bond devices left in the
netns after bonding_masters is removed from sysfs.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 16:45:09 -04:00
nikolay@redhat.com ffcdedb667 Revert "bonding: remove sysfs before removing devices"
This reverts commit 4de79c737b.

This patch introduces a new bug which causes access to freed memory.
In bond_uninit: list_del(&bond->bond_list);
bond_list is linked in bond_net's dev_list which is freed by
unregister_pernet_subsys.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 16:45:09 -04:00
David S. Miller d978a6361a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/nfc/microread/mei.c
	net/netfilter/nfnetlink_queue_core.c

Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which
some cleanups are going to go on-top.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 18:37:01 -04:00
Veaceslav Falico 4de79c737b bonding: remove sysfs before removing devices
We have a race condition if we try to rmmod bonding and simultaneously add
a bond master through sysfs. In bonding_exit() we first remove the devices
(through rtnl_link_unregister() ) and only after that we remove the sysfs.
If we manage to add a device through sysfs after that the devices were
removed - we'll end up with that device/sysfs structure and with the module
unloaded.

Fix this by first removing the sysfs and only after that calling
rtnl_link_unregister().

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 00:46:13 -04:00
David S. Miller d662483264 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull net into net-next to get the synchronize_net() bug fix in
bonding.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-03 01:31:54 -04:00
Veaceslav Falico fcd99434fb bonding: get netdev_rx_handler_unregister out of locks
Now that netdev_rx_handler_unregister contains synchronize_net(), we need
to call it outside of bond->lock, cause it might sleep. Also, remove the
already unneded synchronize_net().

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 12:05:28 -04:00
Veaceslav Falico ad999eee66 bonding: cleanup unneeded rcu_read_lock()
bond_resend_igmp_join_requests_delayed() calls _resend_igmp_join_requests()
under rcu_read_lock(), while it gets its own rcu_read_lock() for the whole
function. Remove the lock from the _delayed function.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:49:40 -04:00
Veaceslav Falico 876254ae27 bonding: don't call update_speed_duplex() under spinlocks
bond_update_speed_duplex() might sleep while calling underlying slave's
routines. Move it out of atomic context in bond_enslave() and remove it
from bond_miimon_commit() - it was introduced by commit 546add79, however
when the slave interfaces go up/change state it's their responsibility to
fire NETDEV_UP/NETDEV_CHANGE events so that bonding can properly update
their speed.

I've tested it on all combinations of ifup/ifdown, autoneg/speed/duplex
changes, remote-controlled and local, on (not) MII-based cards. All changes
are visible.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-13 04:53:17 -04:00
Veaceslav Falico 80028ea1c0 bonding: fire NETDEV_RELEASE event only on 0 slaves
Currently, if we set up netconsole over bonding and release a slave,
netconsole will stop logging on the whole bonding device. Change the
behavior to stop the netconsole only when the last slave is released.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:15:18 -05:00
Jiri Pirko 6c8c4e4c24 bond: check if slave count is 0 in case when deciding to take slave's mac
in bond_enslave(), check slave_cnt before actually using slave address.

introduced by:
commit 409cc1f8a4 (bond: have random dev address by default instead of zeroes)

Reported-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-26 17:30:38 -05:00
Doug Goldstein b3f92b63c4 bonding: set sysfs device_type to 'bond'
Sets the sysfs device_type to 'bond' for udev. This allows udev rules to
be created for bond devices. This is similar to how other network
devices set their device_type.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-19 00:51:09 -05:00
nikolay@redhat.com 0896341a44 bonding: fix bond_release_all inconsistencies
This patch fixes the following inconsistencies in bond_release_all:
- IFF_BONDING flag is not stripped from slaves
- MTU is not restored
- no netdev notifiers are sent
Instead of trying to keep bond_release and bond_release_all in sync
I think we can re-use bond_release as the environment for calling it
is correct (RTNL is held). I have been running tests for the past
week and they came out successful. The only way for bond_release to fail
is for the slave to be attached in a different bond or to not be a slave
but that cannot happen as RTNL is held and no slave manipulations can be
achieved.

V2: As suggested bond_release is renamed to __bond_release_one with a
new parameter "all" introduced so to avoid calling unnecessary code while
destroying a bond, and a wrapper for it called bond_release is created
because of ndo_del_link. bond_release_all() is removed.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-19 00:51:09 -05:00
Neil Horman 2cde6acd49 netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock
__netpoll_rcu_free is used to free netpoll structures when the rtnl_lock is
already held.  The mechanism is used to asynchronously call __netpoll_cleanup
outside of the holding of the rtnl_lock, so as to avoid deadlock.
Unfortunately, __netpoll_cleanup modifies pointers (dev->np), which means the
rtnl_lock must be held while calling it.  Further, it cannot be held, because
rcu callbacks may be issued in softirq contexts, which cannot sleep.

Fix this by converting the rcu callback to a work queue that is guaranteed to
get scheduled in process context, so that we can hold the rtnl properly while
calling __netpoll_cleanup

Tested successfully by myself.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Cong Wang <amwang@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-11 19:19:33 -05:00
Gao feng 387ff91184 netns: bond: allow unprivileged users to control bond device
reduce the permission check of bond device's ioctl.
allow the userns root to control the bond device.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-04 13:12:16 -05:00
Jiri Pirko 409cc1f8a4 bond: have random dev address by default instead of zeroes
Makes more sense to have randomly generated address by default than to
have all zeroes. It also allows user to for example put the bond into
bridge without need to have any slaves in it.

Also note that this changes only behaviour of bonds with no slaves. Once
the first slave device is enslaved, its address will be used (no change
here).

Also, fix dev_assign_type values on the way.

Reported-by: Pavel Šimerda <psimerda@redhat.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 15:34:00 -05:00
Jiri Pirko 7826d43f2d ethtool: fix drvinfo strings set in drivers
Use strlcpy where possible to ensure the string is \0 terminated.
Use always sizeof(string) instead of 32, ETHTOOL_BUSINFO_LEN
and custom defines.
Use snprintf instead of sprint.
Remove unnecessary inits of ->fw_version
Remove unnecessary inits of drvinfo struct.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-06 21:06:31 -08:00
Jiri Pirko 471cb5a33d bonding: remove usage of dev->master
Benefit from new upper dev list and free bonding from dev->master usage.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-04 13:31:50 -08:00
Konstantin Khlebnikov cfb6f99dd9 bonding: do not cancel works in bond_uninit()
Bonding initializes these works in bond_open() and cancels in bond_close(),
thus in bond_uninit() they are already canceled but may be unitialized yet.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14 13:14:07 -05:00
Linus Torvalds 6be35c700f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

1) Allow to dump, monitor, and change the bridge multicast database
   using netlink.  From Cong Wang.

2) RFC 5961 TCP blind data injection attack mitigation, from Eric
   Dumazet.

3) Networking user namespace support from Eric W. Biederman.

4) tuntap/virtio-net multiqueue support by Jason Wang.

5) Support for checksum offload of encapsulated packets (basically,
   tunneled traffic can still be checksummed by HW).  From Joseph
   Gasparakis.

6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
   Daniel Borkmann.

7) Bridge port parameters over netlink and BPDU blocking support
   from Stephen Hemminger.

8) Improve data access patterns during inet socket demux by rearranging
   socket layout, from Eric Dumazet.

9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
   Jon Maloy.

10) Update TCP socket hash sizing to be more in line with current day
    realities.  The existing heurstics were choosen a decade ago.
    From Eric Dumazet.

11) Fix races, queue bloat, and excessive wakeups in ATM and
    associated drivers, from Krzysztof Mazur and David Woodhouse.

12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
    in VXLAN driver, from David Stevens.

13) Add "oops_only" mode to netconsole, from Amerigo Wang.

14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
    allow DCB netlink to work on namespaces other than the initial
    namespace.  From John Fastabend.

15) Support PTP in the Tigon3 driver, from Matt Carlson.

16) tun/vhost zero copy fixes and improvements, plus turn it on
    by default, from Michael S. Tsirkin.

17) Support per-association statistics in SCTP, from Michele
    Baldessari.

And many, many, driver updates, cleanups, and improvements.  Too
numerous to mention individually.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  net/mlx4_en: Add support for destination MAC in steering rules
  net/mlx4_en: Use generic etherdevice.h functions.
  net: ethtool: Add destination MAC address to flow steering API
  bridge: add support of adding and deleting mdb entries
  bridge: notify mdb changes via netlink
  ndisc: Unexport ndisc_{build,send}_skb().
  uapi: add missing netconf.h to export list
  pkt_sched: avoid requeues if possible
  solos-pci: fix double-free of TX skb in DMA mode
  bnx2: Fix accidental reversions.
  bna: Driver Version Updated to 3.1.2.1
  bna: Firmware update
  bna: Add RX State
  bna: Rx Page Based Allocation
  bna: TX Intr Coalescing Fix
  bna: Tx and Rx Optimizations
  bna: Code Cleanup and Enhancements
  ath9k: check pdata variable before dereferencing it
  ath5k: RX timestamp is reported at end of frame
  ath9k_htc: RX timestamp is reported at end of frame
  ...
2012-12-12 18:07:07 -08:00
Ben Hutchings c772dde343 bonding: Fix check for ethtool get_link operation support
Since commit 2c60db0370 ('net: provide a default dev->ethtool_ops')
all devices have a non-null ethtool_ops.  Test only
dev->ethtool_ops->get_link in both places where we care.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-07 14:35:35 -05:00
nikolay@redhat.com 90fb6250c5 bonding: make arp_ip_target parameter checks consistent with sysfs
The module can be loaded with arp_ip_target="255.255.255.255" which makes
 it impossible to remove as the function in sysfs checks for that value,
 so we make the parameter checks consistent with sysfs.

 v2: Fix formatting
 v3: Make description text < 75 columns

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29 13:13:15 -05:00
nikolay@redhat.com fbb0c41b81 bonding: fix miimon and arp_interval delayed work race conditions
First I would give three observations which will be used later.
Observation 1: if (delayed_work_pending(wq)) cancel_delayed_work(wq)
 This usage is wrong because the pending bit is cleared just before the
 work's fn is executed and if the function re-arms itself we might end up
 with the work still running. It's safe to call cancel_delayed_work_sync()
 even if the work is not queued at all.
Observation 2: Use of INIT_DELAYED_WORK()
 Work needs to be initialized only once prior to (de/en)queueing.
Observation 3: IFF_UP is set only after ndo_open is called

Related race conditions:
1. Race between bonding_store_miimon() and bonding_store_arp_interval()
 Because of Obs.1 we can end up having both works enqueued.
2. Multiple races with INIT_DELAYED_WORK()
 Since the works are not protected by anything between INIT_DELAYED_WORK()
 and calls to (en/de)queue it is possible for races between the following
 functions:
 (races are also possible between the calls to INIT_DELAYED_WORK()
  and workqueue code)
 bonding_store_miimon() - bonding_store_arp_interval(), bond_close(),
			  bond_open(), enqueued functions
 bonding_store_arp_interval() - bonding_store_miimon(), bond_close(),
				bond_open(), enqueued functions
3. By Obs.1 we need to change bond_cancel_all()

Bugs 1 and 2 are fixed by moving all work initializations in bond_open
which by Obs. 2 and Obs. 3 and the fact that we make sure that all works
are cancelled in bond_close(), is guaranteed not to have any work
enqueued.
Also RTNL lock is now acquired in bonding_store_miimon/arp_interval so
they can't race with bond_close and bond_open. The opposing work is
cancelled only if the IFF_UP flag is set and it is cancelled
unconditionally. The opposing work is already cancelled if the interface
is down so no need to cancel it again. This way we don't need new
synchronizations for the bonding workqueue. These bugs (and fixes) are
tied together and belong in the same patch.
Note: I have left 1 line intentionally over 80 characters (84) because I
      didn't like how it looks broken down. If you'd prefer it otherwise,
      then simply break it.

 v2: Make description text < 75 columns

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29 13:13:15 -05:00
Michal Kubeček 4e591b93d5 bonding: in balance-rr mode, set curr_active_slave only if it is up
If all slaves of a balance-rr bond with ARP monitor are enslaved
with down link state, bond keeps down state even after slaves
go up.

This is caused by bond_enslave() setting curr_active_slave to
first slave not taking into account its link state. As
bond_loadbalance_arp_mon() uses curr_active_slave to identify
whether slave's down->up transition should update bond's link
state, bond stays down even if slaves are up (until first slave
goes from up to down at least once).

Before commit f31c7937 "bonding: start slaves with link down for
ARP monitor", this was masked by slaves always starting in UP
state with ARP monitor (and MII monitor not relying on
curr_active_slave being NULL if there is no slave up).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 11:28:45 -05:00
Sarveshwar Bandi 0e376bd0b7 bonding: Bonding driver does not consider the gso_max_size/gso_max_segs setting of slave devices.
Patch sets the lowest gso_max_size and gso_max_segs values of the slave devices during enslave and detach.

Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-21 11:50:31 -05:00
Jiri Pirko 55462cf30a vlan: fix bond/team enslave of vlan challenged slave/port
In vlan_uses_dev() check for number of vlan devs rather than existence
of vlan_info. The reason is that vlan id 0 is there without appropriate
vlan dev on it by default which prevented from enslaving vlan challenged
dev.

Reported-by: Jon Stanley <jstanley@rmrf.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-16 14:41:46 -04:00
Eric Dumazet 49ee49202b bonding: set qdisc_tx_busylock to avoid LOCKDEP splat
If a qdisc is installed on a bonding device, its possible to get
following lockdep splat under stress :

 =============================================
 [ INFO: possible recursive locking detected ]
 3.6.0+ #211 Not tainted
 ---------------------------------------------
 ping/4876 is trying to acquire lock:
  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830

 but task is already holding lock:
  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
   lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 6 locks held by ping/4876:
  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e5030>] raw_sendmsg+0x600/0xc30
  #1:  (rcu_read_lock_bh){.+....}, at: [<ffffffff815ba4bd>] ip_finish_output+0x12d/0x870
  #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830
  #3:  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
  #4:  (&bond->lock){++.?..}, at: [<ffffffffa02128c1>] bond_start_xmit+0x31/0x4b0 [bonding]
  #5:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830

 stack backtrace:
 Pid: 4876, comm: ping Not tainted 3.6.0+ #211
 Call Trace:
  [<ffffffff810a0145>] __lock_acquire+0x715/0x1b80
  [<ffffffff810a256b>] ? mark_held_locks+0x9b/0x100
  [<ffffffff810a1bf2>] lock_acquire+0x92/0x1d0
  [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830
  [<ffffffff81726b7c>] _raw_spin_lock+0x3c/0x50
  [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830
  [<ffffffff8106264d>] ? rcu_read_lock_bh_held+0x5d/0x90
  [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
  [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570
  [<ffffffffa0212a6a>] bond_start_xmit+0x1da/0x4b0 [bonding]
  [<ffffffff815796d0>] dev_hard_start_xmit+0x240/0x6b0
  [<ffffffff81597c6e>] sch_direct_xmit+0xfe/0x2a0
  [<ffffffff8157a249>] dev_queue_xmit+0x199/0x830
  [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570
  [<ffffffff815ba96f>] ip_finish_output+0x5df/0x870
  [<ffffffff815ba4bd>] ? ip_finish_output+0x12d/0x870
  [<ffffffff815bb964>] ip_output+0x54/0xf0
  [<ffffffff815bad48>] ip_local_out+0x28/0x90
  [<ffffffff815bc444>] ip_send_skb+0x14/0x50
  [<ffffffff815bc4b2>] ip_push_pending_frames+0x32/0x40
  [<ffffffff815e536a>] raw_sendmsg+0x93a/0xc30
  [<ffffffff8128d570>] ? selinux_file_send_sigiotask+0x1f0/0x1f0
  [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80
  [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220
  [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80
  [<ffffffff815f6855>] inet_sendmsg+0x125/0x240
  [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220
  [<ffffffff8155cddb>] sock_sendmsg+0xab/0xe0
  [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0
  [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0
  [<ffffffff8155d18c>] __sys_sendmsg+0x37c/0x390
  [<ffffffff81195b2a>] ? fsnotify+0x2ca/0x7e0
  [<ffffffff811958e8>] ? fsnotify+0x88/0x7e0
  [<ffffffff81361f36>] ? put_ldisc+0x56/0xd0
  [<ffffffff8116f98a>] ? fget_light+0x3da/0x510
  [<ffffffff8155f6c4>] sys_sendmsg+0x44/0x80
  [<ffffffff8172fc22>] system_call_fastpath+0x16/0x1b

Avoid this problem using a distinct lock_class_key for bonding
devices.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-04 15:53:48 -04:00
Jiri Bohac da210f5590 bonding: add some slack to arp monitoring time limits
Currently, all the time limits in the bonding ARP monitor are in
multiples of arp_interval -- the time interval at which the ARP
monitor is periodically scheduled.

With a fast network round-trip and a little scheduling latency
of the ARP monitor work, a limit of n*delta_in_ticks may
effectively mean (n-1)*delta_in_ticks.

This is fatal in case of n==1  (the link will stay down
forever) and makes the behaviour non-deterministic in all the
other cases.

Add a delta_in_ticks/2 time slack to all the time limits.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 16:37:12 -04:00
John Eaglesham 6b923cb718 bonding: support for IPv6 transmit hashing
Currently the "bonding" driver does not support load balancing outgoing
traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
are currently supported; this patch adds transmit hashing for IPv6 (and
TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
bonding driver. In addition, bounds checking has been added to all
transmit hashing functions.

The algorithm chosen (xor'ing the bottom three quads of the source and
destination addresses together, then xor'ing each byte of that result into
the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
was selected after testing almost 400,000 unique IPv6 addresses harvested
from server logs. This algorithm had the most even distribution for both
big- and little-endian architectures while still using few instructions. Its
behavior also attempts to closely match that of the IPv4 algorithm.

The IPv6 flow label was intentionally not included in the hash as it appears
to be unset in the vast majority of IPv6 traffic sampled, and the current
algorithm not using the flow label already offers a very even distribution.

Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
ie, they are not balanced based on layer 4 information. Additionally,
IPv6 packets with intermediate headers are not balanced based on layer
4 information. In practice these intermediate headers are not common and
this should not cause any problems, and the alternative (a packet-parsing
loop and look-up table) seemed slow and complicated for little gain.

Tested-by: John Eaglesham <linux@8192.net>
Signed-off-by: John Eaglesham <linux@8192.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:49:30 -07:00
David S. Miller 1304a7343b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-08-22 14:21:38 -07:00
Amerigo Wang e15c3c2294 netpoll: check netpoll tx status on the right device
Although this doesn't matter actually, because netpoll_tx_running()
doesn't use the parameter, the code will be more readable.

For team_dev_queue_xmit() we have to move it down to avoid
compile errors.

Cc: David Miller <davem@davemloft.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:33:32 -07:00
Amerigo Wang 38e6bc185d netpoll: make __netpoll_cleanup non-block
Like the previous patch, slave_disable_netpoll() and __netpoll_cleanup()
may be called with read_lock() held too, so we should make them
non-block, by moving the cleanup and kfree() to call_rcu_bh() callbacks.

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:33:30 -07:00
Amerigo Wang 47be03a28c netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()
slave_enable_netpoll() and __netpoll_setup() may be called
with read_lock() held, so should use GFP_ATOMIC to allocate
memory. Eric suggested to pass gfp flags to __netpoll_setup().

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:33:30 -07:00
Amerigo Wang b7bc2a5b5b net: remove netdev_bonding_change()
I don't see any benifits to use netdev_bonding_change() than
using call_netdevice_notifiers() directly.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:28:24 -07:00
Jiri Pirko df4ab5b3c2 net: rename bond_queue_mapping to slave_dev_queue_mapping
As this is going to be used not only by bonding.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:07:00 -07:00
Jiri Pirko d40156aa5e rtnl: allow to specify different num for rx and tx queue count
Also cut out unused function parameters and possible err in return
value.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 11:06:59 -07:00