android_kernel_google_msm/net/core
Eric Dumazet e7e3467ab1 tcp: TCP Small Queues
This introduces TSQ (TCP Small Queues).

The goal of TSQ is to reduce the number of TCP packets in xmit queues
(qdisc & device queues), in order to reduce RTT and cwnd bias, which are
part of the bufferbloat problem.

sk->sk_wmem_alloc is not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.
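
A minimal userspace sketch of the limit check described above, not the kernel code itself. The names `toy_sock`, `toy_tsq_may_send`, `toy_tsq_queue` and `toy_tsq_complete` are hypothetical; the real accounting is done on sk->sk_wmem_alloc inside the TCP output path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-socket state; not the kernel's struct sock. */
struct toy_sock {
    size_t wmem_alloc;  /* bytes currently sitting in qdisc/device queues */
    bool throttled;     /* set when the sender stopped because of the limit */
};

/* Returns true if another packet may be handed to the lower layers. */
static bool toy_tsq_may_send(struct toy_sock *sk, size_t limit_bytes)
{
    if (sk->wmem_alloc >= limit_bytes) {
        sk->throttled = true;   /* remember that we must resume later */
        return false;
    }
    return true;
}

/* Account a packet that was just queued below TCP. */
static void toy_tsq_queue(struct toy_sock *sk, size_t truesize)
{
    sk->wmem_alloc += truesize;
}

/* Account a packet whose memory was released (e.g. at TX completion). */
static void toy_tsq_complete(struct toy_sock *sk, size_t truesize)
{
    assert(sk->wmem_alloc >= truesize);
    sk->wmem_alloc -= truesize;
}
```

With a 131072-byte limit and 65536-byte packets, this sketch lets two packets through and then throttles until one of them completes.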

TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.

As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2: having smaller TSO packets
can help to reduce the latency of high-priority packets.
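
The arithmetic above can be sketched as follows. This is a simplified illustration that assumes the cap is just the smaller of the gso max size and half the limit; `toy_tso_size_cap` is a hypothetical helper, and the kernel's actual size-goal computation also accounts for header sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Cap the TSO packet size to half the TSQ limit, so that two TSO
 * packets fit under the limit at the same time. */
static uint32_t toy_tso_size_cap(uint32_t gso_max_size,
                                 uint32_t limit_output_bytes)
{
    uint32_t half = limit_output_bytes / 2;
    uint32_t cap = gso_max_size < half ? gso_max_size : half;

    assert(cap <= gso_max_size);    /* the cap never raises the size */
    return cap;
}
```

For example, `toy_tso_size_cap(65536, 40000)` yields 20000, while a limit of 131072 leaves the 65536 gso max size unchanged.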

This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.

Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.

Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)

I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and socket autotuning on both sides no longer uses 4 MBytes.

As the skb destructor cannot restart xmit itself (the qdisc lock might
be taken at this point), we delegate the work to a tasklet. We use one
tasklet per cpu for performance reasons.

If the tasklet finds a socket owned by the user, it sets the TSQ_OWNED
flag. This flag is tested in a new protocol method called from
release_sock(), so that new segments are eventually sent.
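
A minimal userspace sketch of the deferral logic described above, under the assumption that it boils down to "push now if the socket is free, otherwise set a flag and push at release time". All names (`toy_tsq_sock`, `toy_tasklet_handle`, `toy_release_sock`, `toy_push`) are hypothetical and do not reproduce the kernel's locking or tasklet API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket state for the sketch. */
struct toy_tsq_sock {
    bool owned_by_user; /* a process context currently holds the socket */
    bool tsq_owned;     /* models the TSQ_OWNED flag */
    int pushes;         /* counts how often we tried to send new segments */
};

/* Stand-in for "send following frames". */
static void toy_push(struct toy_tsq_sock *sk)
{
    sk->pushes++;
}

/* What the per-cpu tasklet does for each throttled socket. */
static void toy_tasklet_handle(struct toy_tsq_sock *sk)
{
    if (sk->owned_by_user)
        sk->tsq_owned = true;   /* defer: the owner will send at release */
    else
        toy_push(sk);           /* socket is free: send right away */
}

/* What the owner does when it releases the socket. */
static void toy_release_sock(struct toy_tsq_sock *sk)
{
    assert(sk->owned_by_user);
    if (sk->tsq_owned) {
        sk->tsq_owned = false;
        toy_push(sk);           /* deferred work is picked up here */
    }
    sk->owned_by_user = false;
}
```

The point of the flag is that the deferred transmission is not lost: whichever context finishes last with the socket is the one that sends.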

[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
  but some drivers call it in their start_xmit() handler.
  These drivers should at least use BQL, or else a single TCP
  session can still fill the whole NIC TX ring, since TSQ will
  have no effect.
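
As an illustration of footnote [1], the tunable can be read from userspace like any other sysctl file. The helper below is a sketch; `toy_read_limit` is a hypothetical name, and the path is passed in by the caller so the function can also be tried against an ordinary file.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer from a sysctl-style file.
 * Returns 0 on success and stores the value in *out, -1 on failure. */
static int toy_read_limit(const char *path, long *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;              /* file missing or not readable */

    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

On a kernel carrying this patch, calling it with "/proc/sys/net/ipv4/tcp_limit_output_bytes" should return the current limit in bytes; on other kernels the file may not exist and the call returns -1.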

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I37d5e4d7c9ced1846385b6a04ae3ad134763a949
2020-11-30 19:35:00 +03:00
datagram.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
dev.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
dev_addr_lists.c
drop_monitor.c net: drop_monitor: fix the value of maxattr 2014-01-15 15:27:10 -08:00
dst.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
ethtool.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
fib_rules.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
filter.c filter: prevent nla extensions to peek beyond the end of the message 2017-04-03 20:07:36 -06:00
flow.c
flow_dissector.c net: flow_dissector: fail on evil iph->ihl 2013-11-20 10:43:18 -08:00
gen_estimator.c
gen_stats.c
iovec.c iovec: make sure the caller actually wants anything in memcpy_fromiovecend 2014-08-14 08:42:36 +08:00
kmap_skb.h
link_watch.c
Makefile
neighbour.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
net-sysfs.c
net-sysfs.h
net-traces.c
net_namespace.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
netevent.c
netpoll.c
netprio_cgroup.c
pktgen.c pktgen: adjust spacing in proc file interface output 2015-10-22 09:20:02 +08:00
request_sock.c
rtnetlink.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
scm.c unix: correctly track in-flight fds in sending process user_struct 2017-06-26 16:09:55 +03:00
secure_seq.c netfilter: ipv6: add IPv6 NAT support 2018-12-07 22:02:09 +04:00
skbuff.c net: Correctly set segment mac_len in skb_segment(). 2014-08-14 08:42:36 +08:00
sock.c tcp: TCP Small Queues 2020-11-30 19:35:00 +03:00
sock_diag.c net: diag: Add the ability to destroy a socket. 2017-12-15 16:50:17 +03:00
stream.c
sysctl_net_core.c net: avoid to hang up on sending due to sysctl configuration overflow. 2016-03-21 09:17:56 +08:00
timestamping.c
user_dma.c
utils.c net: core: add function for incremental IPv6 pseudo header checksum updates 2018-12-07 22:02:09 +04:00