android_kernel_google_msm/net
Eric Dumazet e7e3467ab1 tcp: TCP Small Queues
This introduce TSQ (TCP Small Queues)

TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
device queues), to reduce RTT and cwnd bias, part of the bufferbloat
problem.

sk->sk_wmem_alloc not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.

TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.

As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2 : It can help to reduce
latencies of high prio packets, having smaller TSO packets.

This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.

Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.

Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)

I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and both side socket autotuning no longer use 4 Mbytes.

As skb destructor cannot restart xmit itself ( as qdisc lock might be
taken at this point ), we delegate the work to a tasklet. We use one
tasklest per cpu for performance reasons.

If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.

[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
  but some drivers call it in their start_xmit() handler.
  These drivers should at least use BQL, or else a single TCP
  session can still fill the whole NIC TX ring, since TSQ will
  have no effect.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I37d5e4d7c9ced1846385b6a04ae3ad134763a949
2020-11-30 19:35:00 +03:00
..
9p 9p: forgetting to cancel request on interrupted zero-copy RPC 2015-10-22 09:20:07 +08:00
802
8021q net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq 2020-11-30 19:26:49 +03:00
appletalk appletalk: Fix socket referencing in skb 2014-07-28 07:06:45 -07:00
atm atm: deal with setting entry before mkip was called 2016-03-21 09:17:56 +08:00
ax25 net: add validation for the socket syscall protocol argument 2016-10-29 23:12:11 +08:00
batman-adv
bluetooth Bluetooth: cmtp: cmtp_add_connection() should verify that it's dealing with l2cap socket 2018-01-13 17:14:31 +03:00
bridge net: Explicitly initialize u64_stats_sync structures for lockdep 2020-11-30 19:26:40 +03:00
caif caif: remove wrong dev_net_set() call 2015-04-14 17:33:59 +08:00
can can: add missing initialisations in CAN related skbuffs 2015-06-19 11:40:23 +08:00
ceph crush: fix a bug in tree bucket decode 2015-10-22 09:20:07 +08:00
core tcp: TCP Small Queues 2020-11-30 19:35:00 +03:00
dcb
dccp Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
decnet net: Loopback ifindex is constant now 2018-08-27 14:52:48 +00:00
dns_resolver dns_resolver: Null-terminate the right string 2014-07-28 07:06:46 -07:00
dsa
econet
ethernet
ieee802154
ipv4 tcp: TCP Small Queues 2020-11-30 19:35:00 +03:00
ipv6 tcp: TCP Small Queues 2020-11-30 19:35:00 +03:00
ipx ipx: call ipxitf_put() in ioctl error path 2018-02-16 20:15:04 -07:00
irda Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
iucv
key net: Fix RCU splat in af_key 2016-03-21 09:17:52 +08:00
l2tp Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
lapb
llc Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
mac80211 Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
netfilter net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq 2020-11-30 19:26:49 +03:00
netlabel
netlink Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
netrom
nfc
openvswitch net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq 2020-11-30 19:26:49 +03:00
packet Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
phonet
rds RDS: fix race condition when sending a message on unbound socket 2016-03-21 09:17:54 +08:00
rfkill Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
rose NET: ROSE: Don't dereference NULL neighbour pointer. 2015-09-18 09:20:47 +08:00
rxrpc Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
sched Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
sctp Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
sunrpc vfs: make it possible to access the dentry hash/len as one 64-bit entry 2018-12-07 22:20:38 +04:00
tipc tipc: clear 'next'-pointer of message fragments before reassembly 2014-07-28 07:06:45 -07:00
unix pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp. 2018-12-07 22:28:48 +04:00
wanrouter
wimax
wireless cfg80211: Fix use after free when process wdev events 2020-10-25 02:37:54 -04:00
x25
xfrm xfrm: policy: check policy direction value 2018-02-16 20:15:07 -07:00
activity_stats.c
compat.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
Kconfig
Makefile
nonet.c
socket.c Merge remote-tracking branch 'stable/linux-3.4.y' into lineage-15.1 2017-12-27 17:13:15 +03:00
sysctl_net.c