With increasing receive window sizes, but speed of light not improved
that much, out of order queue can contain a huge number of skbs, waiting
to be moved to receive_queue when missing packets can fill the holes.
Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
sk_buff)) to store regular (MTU <= 1500) frames. This makes highly
probable sk_rmem_alloc hits sk_rcvbuf limit, which can be 4Mbytes in
many cases.
When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true
latency killer and cpu cache blower.
Doing the coalescing attempt each time we add a frame in ofo queue
permits to keep memory use tight and in many cases avoid the
tcp_collapse() thing later.
Tested on various wireless setups (b43, ath9k, ...) known to use big skb
truesize, this patch removed the "packets collapsed in receive queue due
to low socket buffer" I had before.
This also reduced average memory used by tcp sockets.
With help from Neal Cardwell.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split tcp_data_queue() in two parts for better readability.
tcp_data_queue_ofo() is responsible for queueing incoming skb into out
of order queue.
Change code layout so that the skb_set_owner_r() is performed only if
skb is not dropped.
This is a preliminary patch before "reduce out_of_order memory use"
following patch.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kerin Millar reported hardlockups while running `conntrackd -c'
in a busy firewall. That system (with several processors) was
acting as backup in a primary-backup setup.
After several tries, I found a race condition between the deletion
operation of ctnetlink and timeout expiration. This patch fixes
this problem.
Tested-by: Kerin Millar <kerframil@gmail.com>
Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I found recently that the arp_process function which handles all of our received
arp frames, is using IPV4_DEVCONF_ALL macro to check the state of the arp_process
flag. This seems wrong, as it implies that either none or all of the network
interfaces accept gratuitous arps. This patch corrects that, allowing
per-interface arp_accept configuration to deviate from the all setting. Note
this also brings us into line with the way the arp_filter setting is handled
during arp_process execution.
Tested this myself on my home network, and confirmed it works as expected.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_mc_find_dev_rcu() is called with rcu_read_lock(), so don't
need to dev_hold().
With dev_hold(), not corresponding dev_put(), will lead to leak.
[ bug introduced in 96b52e61be (ipv6: mcast: RCU conversions) ]
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit d47a0ac7b6 (sch_sfq: dont put new flow at the end of
flows)
As Jesper found out, patch sounded great but has bad side effects.
In stress situation, pushing new flows in front of the queue can prevent
old flows doing any progress. Packets can stay in SFQ queue for
unlimited amount of time.
It's possible to add heuristics to limit this problem, but this would
add complexity outside of SFQ scope.
A more sensible answer to Dave Taht concerns (who reported the issued I
tried to solve in original commit) is probably to use a qdisc hierarchy
so that high prio packets dont enter a potentially crowded SFQ qdisc.
Reported-by: Jesper Dangaard Brouer <jdb@comx.dk>
Cc: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 87a115783 ( ipv6: Move xfrm_lookup() call down into
icmp6_dst_alloc().) forgot to convert one error path, leading
to crashes in mld_sendpack()
Many thanks to Dave Jones for providing a very complete bug report.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
uapsd_queues and uapsd_max_sp_len are relevant only for managed
interfaces, and can be configured differently for each vif.
Move them from the local struct to sdata->u.mgd, and update
the debugfs functions accordingly.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some debugfs write functions call kstrto* functions, which
assume the string is null-terminated. Make it valid by changing
ieee80211_if_write() to use static buffer instead of allocating
one, and set the last char to NULL.
(The write functions try to parse some integer/mac address,
so 64 bytes buffer should be enough)
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The current max throughput rate is known to be good as otherwise it
wouldn't be the max throughput rate. Since rate sampling can introduce
some overhead (by adding RTS for example or due to not aggregating the
frame) don't sample the max throughput rate.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When regulatory information changes our HT behavior (e.g,
when we get a country code from the AP we have just associated
with), we should use this information to change the power with
which we transmit, and what channels we transmit. Sometimes
the channel parameters we derive from regulatory information
contradicts the parameters we used in association. For example,
we could have associated specifying HT40, but the regulatory
rules we apply may forbid HT40 operation.
In the situation above, we should reconfigure ourselves to
transmit in HT20 only, however it makes no sense for us to
disable receive in HT40, since if we associated with these
parameters, the AP has every reason to expect we can and
will receive packets this way. The code in mac80211 does
not have the capability of sending the appropriate action
frames to signal a change in HT behaviour so the AP has
no clue we can no longer receive frames encoded this way.
In some broken AP implementations, this can leave us
effectively deaf if the AP never retries in lower HT rates.
This change breaks up the channel_type parameter in the
ieee80211_enable_ht function into a separate receive and
transmit part. It honors the channel flags set by regulatory
in order to configure the rate control algorithm, but uses
the capability flags to configure the channel on the radio,
since these were used in association to set the AP's transmit
rate.
Signed-off-by: Paul Stewart <pstew@chromium.org>
Cc: Sam Leffler <sleffler@chromium.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Luis R Rodriguez <mcgrof@frijolero.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This value is not really very useful by itself,
yet some drivers (including iwlwifi until I can
figure out what it should do) use it. At least
rename it to "last_tsf" to indicate the meaning
and add a note that it may be really old.
I suspect the value may become useful combined
with the rx_status->mactime, but we don't (yet)
store that value and pass it to the driver.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is intended to be the timestamp sent by the
peer in the beacon/probe response, not any form
of host timestamp. Clarify the documentation and
variable names.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Not linearizing every SKB will help actually pass
non-linear SKBs all the way up when on an encrypted
connection. For now, linearize TKIP completely as
it is lower performance and I don't quite grok all
the details.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is better done inside the WEP decrypt
function where it doesn't have to check all
the conditions any more since they've been
tested already.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add #define pr_fmt(fmt) as appropriate.
Add "IPv4: ", "TCP: ", and "IPsec: " to appropriate files.
Standardize on "UDPLite: " for appropriate uses.
Some prefixes were previously "UDPLITE: " and "UDP-Lite: ".
Add KBUILD_MODNAME ": " to icmp and gre.
Remove embedded prefixes as appropriate.
Add missing "\n" to pr_info in gre.c.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The authentication and association handshake
already happens in the context of the new BSS,
and the basic rates are needed at least for
the ACK response frame to the authentication
or association response frames. Therefore the
basic rates should already be configured into
the driver when those frames are sent.
Change the logic to set up the basic rates in
the connection preparation that happens for
authentication and association (if needed).
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
As associating is possible without first authenticating
(for FT over DS) association also has to be able to
switch to the right channel, insert the station entry
etc. Factor out this common code into a new function
called ieee80211_prep_connection().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The BSSID has been set a lot earlier already and
didn't change again in ieee80211_set_associated().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Instead of setting assoc_data->wmm_used solely
based on the BSS also take into account our own
capabilities and later check those.
Also rename "wmm_used" and "uapsd_used" to just
"wmm" and "uapsd".
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Always set/use IEEE80211_STA_DISABLE_11N instead
of duplicating the queue, WMM and HT checks in
all places.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Looks like some changes in this area moved
the code but not the comment that belongs
to the code, move it to the right place.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Disable multi stream rates (MCS > 7) when a STA is in static SMPS mode
since it has only one active rx chain. Hence, it doesn't even make
sense to sample multi stream rates.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
As we've discussed, we want to avoid channel changes
while associated. While the part when we actually
associate needs a bit more work, the bit that happens
on disassociating can be changed quite easily. Move
the channel type change later in the disassociate
process to set the channel only after the driver was
told that it's now disassociated.
As the driver could expect powersave to be enabled
only when associated, this thus results in splitting
the config call, but overall what happens makes more
sense this way.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the station state callback was added, this
was no longer needed in theory. With the iwlwifi
changes to remove use of it landing, we can kill
the entire tx-sync framework again, RIP.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
While setting up or tearing down a BA session mac80211 is buffering
pending frames for the according TID. However, there's currently no
limit on how many frames are buffered possibly leading to an out-of-
memory situation. This can happen on systems with little memory when
the CPU is fully loaded since the BA session work is executed in
process context while frames can still come via softirq.
Apply a limitation to the TIDs pending queue to avoid consuming
too much memory in this situation.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Receive background scan period as part of connect
command and pass the same to the driver.
Signed-off-by: Bala Shanmugam <bkamatch@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Use a more current kernel messaging style.
Convert a printk block to print_hex_dump.
Coalesce formats, align arguments.
Use %s, __func__ instead of embedding function names.
Some messages that were prefixed with <foo>_close are
now prefixed with <foo>_fini. Some ah4 and esp messages
are now not prefixed with "ip ".
The intent of this patch is to later add something like
#define pr_fmt(fmt) "IPv4: " fmt.
to standardize the output messages.
Text size is trivially reduced. (x86-32 allyesconfig)
$ size net/ipv4/built-in.o*
text data bss dec hex filename
887888 31558 249696 1169142 11d6f6 net/ipv4/built-in.o.new
887934 31558 249800 1169292 11d78c net/ipv4/built-in.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following 4 functions:
move_addr_to_kernel
move_addr_to_user
verify_iovec
verify_compat_iovec
are always effectively called with a sockaddr_storage.
Make this explicit by changing their signature.
This removes a large number of casts from sockaddr_storage to sockaddr.
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With commit d6ddef9e641d(IPv6: Fix not join all-router mcast group
when forwarding set.) I check 'dev' after it's dereference that
leads to a Smatch complaint:
net/ipv6/addrconf.c:438 ipv6_add_dev()
warn: variable dereferenced before check 'dev' (see line 432)
net/ipv6/addrconf.c
431 /* protected by rtnl_lock */
432 rcu_assign_pointer(dev->ip6_ptr, ndev);
^^^^^^^^^^^^
Old dereference.
433
434 /* Join all-node multicast group */
435 ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes);
436
437 /* Join all-router multicast group if forwarding is set
*/
438 if (ndev->cnf.forwarding && dev && (dev->flags &
IFF_MULTICAST))
^^^
Remove the check to avoid the complaint as 'dev' can't be NULL.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit ea4fc0d619 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit())
added a serious regression on synflood handling.
Simon Kirby discovered a successful connection was delayed by 20 seconds
before being responsive.
In my tests, I discovered that xmit frames were lost, and needed ~4
retransmits and a socket dst rebuild before being really sent.
In case of syncookie initiated connection, we use a different path to
initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared.
As ip_queue_xmit() now depends on inet flow being setup, fix this by
copying the temp flowi4 we use in cookie_v4_check().
Reported-by: Simon Kirby <sim@netnation.com>
Bisected-by: Simon Kirby <sim@netnation.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Connection ID configured through RTNL must allow zero as
connection id. If connection-id is not given when creating the
interface, configure a loopback interface using ifindex as
connection-id.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kill faulty checks on flow-off leading to connection drop at race conditions.
caif_socket checks for flow-on before transmitting and goes to sleep or
return -EAGAIN upon flow stop. Remove faulty subsequent checks on flow-off
leading to connection drop. Also fix memory leaks on some of the errors paths.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
lookup sctp_association within sctp_do_peeloff() to enable its use outside of
the sctp code with minimal knowledge of the former.
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we invalidate the inetpeer tree along with the routing cache now,
we don't need a genid to reset the redirect handling when the routing
cache is flushed.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We initialize the routing metrics with the values cached on the
inetpeer in rt_init_metrics(). So if we have the metrics cached on the
inetpeer, we ignore the user configured fib_metrics.
To fix this issue, we replace the old tree with a fresh initialized
inet_peer_base. The old tree is removed later with a delayed work queue.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now we have:
eth0: link *down*
br0: port 1(eth0) entered *forwarding* state
br_log_state(p) should be called *after* p->state is set
to BR_STATE_DISABLED.
Reported-by: Zilvinas Valinskas <zilvinas@wilibox.com>
Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When br_log_state() is reporting state it should say "entered"
istead of "entering" since state at this point is already
changed.
Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>