tcp: gso: do not generate out of order packets

GSO TCP handler has following issues :

1) ooo_okay from original GSO packet is duplicated to all segments
2) segments (but the last one) are orphaned, so transmit path can not
get transmit queue number from the socket. This happens if GSO
segmentation is done before stacked device for example.

Result is we can send packets from a given TCP flow to different TX
queues (if using multiqueue NICS). This generates OOO problems and
spurious SACK & retransmits.

Fix this by keeping socket pointer set for all segments.

This means that every segment must also have a destructor, and the
original gso skb truesize must be split on all segments, to keep
precise sk->sk_wmem_alloc accounting.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I9d9d4216ec24ad4091535a9b9e811760cd949825
This commit is contained in:
Eric Dumazet 2013-05-15 15:38:01 +00:00 committed by surblazer
parent 1315c0d65e
commit 0cf083915c

View file

@ -2761,6 +2761,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb,
unsigned int mss;
struct sk_buff *gso_skb = skb;
__sum16 newcheck;
bool ooo_okay, copy_destructor;
if (!pskb_may_pull(skb, sizeof(*th)))
goto out;
@ -2799,10 +2800,18 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb,
goto out;
}
copy_destructor = gso_skb->destructor == tcp_wfree;
ooo_okay = gso_skb->ooo_okay;
/* All segments but the first should have ooo_okay cleared */
skb->ooo_okay = 0;
segs = skb_segment(skb, features);
if (IS_ERR(segs))
goto out;
/* Only first segment might have ooo_okay set */
segs->ooo_okay = ooo_okay;
delta = htonl(oldlen + (thlen + mss));
skb = segs;
@ -2822,6 +2831,17 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb,
thlen, skb->csum));
seq += mss;
if (copy_destructor) {
skb->destructor = gso_skb->destructor;
skb->sk = gso_skb->sk;
/* {tcp|sock}_wfree() use exact truesize accounting :
* sum(skb->truesize) MUST be exactly be gso_skb->truesize
* So we account mss bytes of 'true size' for each segment.
* The last segment will contain the remaining.
*/
skb->truesize = mss;
gso_skb->truesize -= mss;
}
skb = skb->next;
th = tcp_hdr(skb);
@ -2834,7 +2854,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb,
* is freed at TX completion, and not right now when gso_skb
* is freed by GSO engine
*/
if (gso_skb->destructor == tcp_wfree) {
if (copy_destructor) {
swap(gso_skb->sk, skb->sk);
swap(gso_skb->destructor, skb->destructor);
swap(gso_skb->truesize, skb->truesize);