android_kernel_google_msm/net/ipv6
Eric Dumazet e7e3467ab1 tcp: TCP Small Queues
This introduces TSQ (TCP Small Queues).

The goal of TSQ is to reduce the number of TCP packets in xmit queues
(qdisc & device queues), in order to reduce RTT and cwnd bias, which are
part of the bufferbloat problem.

sk->sk_wmem_alloc is not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.

TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.

As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2. Smaller TSO packets can help
to reduce latencies of high prio packets.
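
The limit and the TSO cap described above can be sketched as a small
userspace model. This is only an illustration, not the kernel code:
struct tsq_model and both helper names are hypothetical, and the ">="
comparison is an assumption about where exactly the limit bites.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the TSQ limit; not kernel code. */
struct tsq_model {
    uint32_t limit_output_bytes; /* stands in for tcp_limit_output_bytes */
    uint32_t wmem_alloc;         /* stands in for sk->sk_wmem_alloc */
};

/* True when the socket already has "limit" bytes or more queued below TCP.
 * Assumption: the limit is inclusive (>=). */
static bool tsq_model_throttled(const struct tsq_model *m)
{
    assert(m != NULL);
    return m->wmem_alloc >= m->limit_output_bytes;
}

/* Cap a TSO packet to half the limit, never exceeding the GSO maximum. */
static uint32_t tsq_model_tso_cap(const struct tsq_model *m, uint32_t gso_max)
{
    uint32_t half;

    assert(m != NULL);
    half = m->limit_output_bytes / 2;
    return gso_max < half ? gso_max : half;
}
```

With limit_output_bytes set to 40000 and a GSO maximum of 65536, the
model caps TSO packets at 20000 bytes, which is the 40000/2 mentioned
above.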

This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.

Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.

Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)

I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and socket autotuning on both sides no longer uses 4 MBytes.

As the skb destructor cannot restart xmit itself (as the qdisc lock might
be taken at this point), we delegate the work to a tasklet. We use one
tasklet per cpu for performance reasons.

If the tasklet finds a socket owned by the user, it sets the TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.
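
The hand-off from destructor to tasklet to release_sock() can be modelled
in plain userspace C. This is a rough sketch under stated assumptions,
not the kernel implementation: every type and function name here is
hypothetical, the per-cpu queue is a simple singly linked list, the
"queued" flag is an assumption added to avoid double queueing, and
"sending" is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags; only the "owned" one is named in the text above. */
enum {
    TSQ_MODEL_THROTTLED = 1u << 0, /* socket hit the output limit */
    TSQ_MODEL_QUEUED    = 1u << 1, /* socket sits on a per-cpu list */
    TSQ_MODEL_OWNED     = 1u << 2  /* tasklet found the socket owned by user */
};

struct tsq_sock {
    unsigned flags;
    bool owned_by_user;
    unsigned xmit_calls;   /* how many times "send new segments" ran */
    struct tsq_sock *next;
};

struct tsq_cpu_queue {
    struct tsq_sock *head; /* one such list per cpu in this model */
};

/* Destructor path: must not transmit, so it only queues the socket. */
static void tsq_model_wfree(struct tsq_cpu_queue *q, struct tsq_sock *sk)
{
    assert(q != NULL && sk != NULL);
    if ((sk->flags & TSQ_MODEL_THROTTLED) && !(sk->flags & TSQ_MODEL_QUEUED)) {
        sk->flags &= ~TSQ_MODEL_THROTTLED;
        sk->flags |= TSQ_MODEL_QUEUED;
        sk->next = q->head;
        q->head = sk;
    }
}

/* Tasklet path: send, or defer to release_sock() if the user owns the socket. */
static void tsq_model_tasklet(struct tsq_cpu_queue *q)
{
    struct tsq_sock *sk;

    assert(q != NULL);
    sk = q->head;
    q->head = NULL;
    while (sk != NULL) {
        struct tsq_sock *next = sk->next;

        sk->next = NULL;
        sk->flags &= ~TSQ_MODEL_QUEUED;
        if (sk->owned_by_user)
            sk->flags |= TSQ_MODEL_OWNED;
        else
            sk->xmit_calls++;
        sk = next;
    }
}

/* release_sock() path: pick up the work the tasklet had to defer. */
static void tsq_model_release_sock(struct tsq_sock *sk)
{
    assert(sk != NULL);
    sk->owned_by_user = false;
    if (sk->flags & TSQ_MODEL_OWNED) {
        sk->flags &= ~TSQ_MODEL_OWNED;
        sk->xmit_calls++;
    }
}
```

In this model, a throttled socket that is owned by the user when the
tasklet runs is not transmitted from the tasklet; the deferred send
happens once tsq_model_release_sock() is called.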

[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
  but some drivers call it in their start_xmit() handler.
  These drivers should at least use BQL, or else a single TCP
  session can still fill the whole NIC TX ring, since TSQ will
  have no effect.
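
For reference, the tunable from [1] can be read from userspace like any
other numeric /proc entry. The helper below is a minimal sketch; the name
read_sysctl_long is hypothetical, and the file only exists on kernels
that carry this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: read one decimal value from a /proc/sys file.
 * Returns 0 on success, -1 if the file is missing or cannot be parsed. */
static int read_sysctl_long(const char *path, long *value)
{
    FILE *f;
    int parsed;

    assert(path != NULL && value != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    parsed = fscanf(f, "%ld", value);
    fclose(f);
    return parsed == 1 ? 0 : -1;
}
```

A caller would pass "/proc/sys/net/ipv4/tcp_limit_output_bytes" as the
path and treat a -1 return as "TSQ tunable not available".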

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I37d5e4d7c9ced1846385b6a04ae3ad134763a949
2020-11-30 19:35:00 +03:00
..
netfilter netfilter: xt_rpfilter: depend on raw or mangle table 2018-12-07 22:04:24 +04:00
addrconf.c net: Explicitly initialize u64_stats_sync structures for lockdep 2020-11-30 19:26:40 +03:00
addrconf_core.c
addrlabel.c ipv6/addrlabel: fix ip6addrlbl_get() 2016-10-26 23:15:41 +08:00
af_inet6.c net: Explicitly initialize u64_stats_sync structures for lockdep 2020-11-30 19:26:40 +03:00
ah6.c
anycast.c ipv6: clean up anycast when an interface is destroyed 2016-10-29 23:12:33 +08:00
datagram.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
esp6.c
exthdrs.c ipv6: add complete rcu protection around np->opt 2016-06-17 02:54:32 +00:00
exthdrs_core.c
fib6_rules.c
icmp.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
inet6_connection_sock.c Revert "net: core: Support UID-based routing." 2017-08-27 19:09:20 +03:00
inet6_hashtables.c net: do not call sock_put() on TIMEWAIT sockets 2013-11-04 04:23:40 -08:00
ip6_fib.c ipv6: update ip6_rt_last_gc every time GC is run 2016-10-26 23:15:43 +08:00
ip6_flowlabel.c
ip6_input.c ipv6: add option to drop unicast encapsulated in L2 multicast 2018-12-07 21:59:38 +04:00
ip6_output.c netfilter: nf_conntrack_ipv6: improve fragmentation handling 2018-12-07 22:02:09 +04:00
ip6_tunnel.c net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq 2020-11-30 19:26:49 +03:00
ip6mr.c ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif 2018-08-27 14:52:49 +00:00
ipcomp6.c
ipv6_sockglue.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
Kconfig
Makefile Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
mcast.c ipv6: some ipv6 statistic counters failed to disable bh 2014-04-26 17:13:18 -07:00
mip6.c
ndisc.c ipv6: add option to drop unsolicited neighbor advertisements 2018-12-07 21:59:38 +04:00
netfilter.c
output_core.c drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets 2015-02-02 17:05:26 +08:00
ping.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
proc.c
protocol.c
raw.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
reassembly.c ipv6: drop packets with multiple fragmentation headers 2013-09-14 06:02:10 -07:00
route.c net: Add missing LOOPBACK_IFINDEX change in ipv6/route.c 2018-12-07 22:04:24 +04:00
sit.c net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq 2020-11-30 19:26:49 +03:00
syncookies.c Revert "net: core: Support UID-based routing." 2017-08-27 19:09:20 +03:00
sysctl_net_ipv6.c net: add a sysctl to reflect the fwmark on replies 2014-05-12 22:39:57 -07:00
tcp_ipv6.c tcp: TCP Small Queues 2020-11-30 19:35:00 +03:00
tunnel6.c ipv6: fix tunnel error handling 2016-10-26 23:15:24 +08:00
udp.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
udp_impl.h ipv6: do not clear pinet6 field 2017-12-15 22:54:52 +03:00
udplite.c ipv6: do not clear pinet6 field 2017-12-15 22:54:52 +03:00
xfrm6_input.c
xfrm6_mode_beet.c
xfrm6_mode_ro.c
xfrm6_mode_transport.c
xfrm6_mode_tunnel.c
xfrm6_output.c ipv6: Fix IPsec pre-encap fragmentation check 2016-04-27 18:55:20 +08:00
xfrm6_policy.c xfrm6: release dev before returning error 2013-05-19 10:54:47 -07:00
xfrm6_state.c
xfrm6_tunnel.c