Commit graph

23974 commits

Author SHA1 Message Date
Eric Dumazet
05fe7848aa tcp: fix a potential deadlock in tcp_get_info()
Taking socket spinlock in tcp_get_info() can deadlock, as
inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
while packet processing can use the reverse locking order.

We could avoid this locking for TCP_LISTEN states, but lockdep would
certainly get confused as all TCP sockets share same lockdep classes.

[  523.722504] ======================================================
[  523.728706] [ INFO: possible circular locking dependency detected ]
[  523.734990] 4.1.0-dbg-DEV #1676 Not tainted
[  523.739202] -------------------------------------------------------
[  523.745474] ss/18032 is trying to acquire lock:
[  523.750002]  (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
[  523.758129]
[  523.758129] but task is already holding lock:
[  523.763968]  (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
[  523.774661]
[  523.774661] which lock already depends on the new lock.
[  523.774661]
[  523.782850]
[  523.782850] the existing dependency chain (in reverse order) is:
[  523.790326]
-> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
[  523.796599]        [<ffffffff811126bb>] lock_acquire+0xbb/0x270
[  523.802565]        [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
[  523.808628]        [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
[  523.815273]        [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
[  523.822067]        [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
[  523.828199]        [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
[  523.834331]        [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
[  523.840202]        [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
[  523.847214]        [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
[  523.853440]        [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
[  523.859624]        [<ffffffff81659db7>] ip_rcv+0x307/0x420

Lets use u64_sync infrastructure instead. As a bonus, 64bit
arches get optimized, as these are nop for them.

Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I50c833b77f41111a6fa3a57834c6311f95573754
2020-11-30 19:31:51 +03:00
Marcelo Ricardo Leitner
461a9cb3c4 tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
This patch tracks the total number of inbound and outbound segments on a
TCP socket. One may use this number to have an idea on connection
quality when compared against the retransmissions.

RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut

These are a 32bit field each and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->segs_out was placed near tp->snd_nxt for good data
locality and minimal performance impact, while tp->segs_in was placed
near tp->bytes_received for the same reason.

Join work with Eric Dumazet.

Note that received SYN are accounted on the listener, but sent SYNACK
are not accounted.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ic3e5daea102e13fb24c5fb1ce6913d5ece13c521
2020-11-30 19:31:48 +03:00
Eric Dumazet
2afbb021d9 tcp: add tcpi_bytes_received to tcp_info
This patch tracks total number of payload bytes received on a TCP socket.
This is the sum of all changes done to tp->rcv_nxt

RFC4898 named this : tcpEStatsAppHCThruOctetsReceived

This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->bytes_received was placed near tp->rcv_nxt for
best data locality and minimal performance impact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ie1fec6455c8c6cf9ae3e2df09bf55abb56b57286
2020-11-30 19:31:44 +03:00
Eric Dumazet
45970b0b74 tcp: add tcpi_bytes_acked to tcp_info
This patch tracks total number of bytes acked for a TCP socket.
This is the sum of all changes done to tp->snd_una, and allows
for precise tracking of delivered data.

RFC4898 named this : tcpEStatsAppHCThruOctetsAcked

This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->bytes_acked was placed near tp->snd_una for
best data locality and minimal performance impact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I975b11b049d6bde1b6802dc4331a9da93c371c9b
2020-11-30 19:31:41 +03:00
Eric Dumazet
3e12c5a9d4 tcp: add pacing_rate information into tcp_info
Add two new fields to struct tcp_info, to report sk_pacing_rate
and sk_max_pacing_rate to monitoring applications, as ss from iproute2.

User exported fields are 64bit, even if kernel is currently using 32bit
fields.

lpaa5:~# ss -i
..
	 skmem:(r0,rb357120,t0,tb2097152,f1584,w1980880,o0,bl0) ts sack cubic
wscale:6,6 rto:400 rtt:0.875/0.75 mss:1448 cwnd:1 ssthresh:12 send
13.2Mbps pacing_rate 3336.2Mbps unacked:15 retrans:1/5448 lost:15
rcv_space:29200

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I31149707ef565448d5f2f1b43f5759f308f824f5
2020-11-30 19:31:38 +03:00
Eric Dumazet
adc3caedd0 net: introduce SO_MAX_PACING_RATE
As mentioned in commit afe4fd0624 ("pkt_sched: fq: Fair Queue packet
scheduler"), this patch adds a new socket option.

SO_MAX_PACING_RATE offers the application the ability to cap the
rate computed by transport layer. Value is in bytes per second.

u32 val = 1000000;
setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));

To be effectively paced, a flow must use FQ packet scheduler.

Note that a packet scheduler takes into account the headers for its
computations. The effective payload rate depends on MSS and retransmits
if any.

I chose to make this pacing rate a SOL_SOCKET option instead of a
TCP one because this can be used by other protocols.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Iea51b6104a5420d5c9ed1ea7382cbd53bdb8f4be
2020-11-30 19:31:32 +03:00
Eric Dumazet
e5a9f4f1cc tcp: TSO packets automatic sizing
After hearing many people over past years complaining against TSO being
bursty or even buggy, we are proud to present automatic sizing of TSO
packets.

One part of the problem is that tcp_tso_should_defer() uses an heuristic
relying on upcoming ACKS instead of a timer, but more generally, having
big TSO packets makes little sense for low rates, as it tends to create
micro bursts on the network, and general consensus is to reduce the
buffering amount.

This patch introduces a per socket sk_pacing_rate, that approximates
the current sending rate, and allows us to size the TSO packets so
that we try to send one packet every ms.

This field could be set by other transports.

Patch has no impact for high speed flows, where having large TSO packets
makes sense to reach line rate.

For other flows, this helps better packet scheduling and ACK clocking.

This patch increases performance of TCP flows in lossy environments.

A new sysctl (tcp_min_tso_segs) is added, to specify the
minimal size of a TSO packet (default being 2).

A follow-up patch will provide a new packet scheduler (FQ), using
sk_pacing_rate as an input to perform optional per flow pacing.

This explains why we chose to set sk_pacing_rate to twice the current
rate, allowing 'slow start' ramp up.

sk_pacing_rate = 2 * cwnd * mss / srtt

v2: Neal Cardwell reported a suspect deferring of last two segments on
initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
into account tp->xmit_size_goal_segs

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Tom Herbert <therbert@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: If293e2c6a0f52fe0e2b4ee149c9242f8c5e46ef5
2020-11-30 19:30:45 +03:00
Eric W. Biederman
ec0a45cd4a net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq
Replace the bh safe variant with the hard irq safe variant.

We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.

Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[javelinanddart]: Merge conflicts were resolved for drivers that exist in 3.4,
and additionally a treewide find and replace was run for these functions.
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Change-Id: Ib74db793de5e546414a0599f23095f82f0e20c86
2020-11-30 19:26:49 +03:00
Li RongQing
a559d1f125 ipv6: fix the use of pcpu_tstats in ip6_tunnel
when read/write the 64bit data, the correct lock should be hold.

Fixes: 87b6d218f3 ("tunnel: implement 64 bits statistics")

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Change-Id: Ib90ecb495739eafb1563d5cfd3d2dc275e4944ea
2020-11-30 19:26:46 +03:00
Li RongQing
2e641339f3 ipv6: fix the use of pcpu_tstats in sit
when read/write the 64bit data, the correct lock should be hold.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Change-Id: Ie9c73149b86eb028e1646d1001bdb4310a72fe14
2020-11-30 19:26:43 +03:00
John Stultz
ad74346428 net: Explicitly initialize u64_stats_sync structures for lockdep
In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change-Id: Ieda06b95f0ec302dbef8576ef4c8fb4bd28ffb2f
2020-11-30 19:26:40 +03:00
Stephen Hemminger
39b326bec0 tcp: remove Appropriate Byte Count support
TCP Appropriate Byte Count was added by me, but later disabled.
There is no point in maintaining it since it is a potential source
of bugs and Linux already implements other better window protection
heuristics.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ifc93fa043256c275580a9f93611ad1d013889bc4
2020-11-30 19:26:36 +03:00
stephen hemminger
a9d180c942 tunnel: implement 64 bits statistics
Convert the per-cpu statistics kept for GRE, IPIP, and SIT tunnels
to use 64 bit statistics.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Change-Id: I317ad78a7f3a31326a1bc6f3069712f348b38a6c
2020-11-30 19:26:33 +03:00
Jianmin Zhu
66d4bfb7da cfg80211: Fix use after free when process wdev events
"bssid" is only initialized out of the while loop, in case of two
events with same type: EVENT_CONNECT_RESULT, but one has zero
ether addr, the other is non-zero, the bssid pointer will be
referenced twice, which lead to use-after-free issue

Change-Id: Ie8a24275f7ec5c2f936ef0a802a42e5f63be9c71
CRs-Fixed: 2254305
Signed-off-by: Zhu Jianmin <jianminz@codeaurora.org>
2020-10-25 02:37:54 -04:00
Luca Weiss
e4cede11f4 ipv4: Pass struct flowi4 directly to rt_fill_info
This is partly a backport of d6c0a4f609
  (ipv4: Kill 'rt_src' from 'struct rtable').

skb->sk can be null, and in fact it is when creating the buffer
in inet_rtm_getroute. There is no other way of accessing the flow,
so pass it directly.

Fixes invalid memory address when running 'ip route get $IPADDR'

Change-Id: I7b9e5499614b96360c9c8420907e82e145bb97f3
2020-10-25 02:37:54 -04:00
Luca Weiss
4bfbd25a70 ipv4: Pass struct flowi4 directly to rt_fill_info
This is partly a backport of d6c0a4f609
  (ipv4: Kill 'rt_src' from 'struct rtable').

skb->sk can be null, and in fact it is when creating the buffer
in inet_rtm_getroute. There is no other way of accessing the flow,
so pass it directly.

Fixes invalid memory address when running 'ip route get $IPADDR'

Change-Id: I7b9e5499614b96360c9c8420907e82e145bb97f3
2020-06-02 17:24:51 +02:00
Jianmin Zhu
58d618c7b5 cfg80211: Fix use after free when process wdev events
"bssid" is only initialized out of the while loop, in case of two
events with same type: EVENT_CONNECT_RESULT, but one has zero
ether addr, the other is non-zero, the bssid pointer will be
referenced twice, which lead to use-after-free issue

Change-Id: Ie8a24275f7ec5c2f936ef0a802a42e5f63be9c71
CRs-Fixed: 2254305
Signed-off-by: Zhu Jianmin <jianminz@codeaurora.org>
2020-01-17 18:07:42 +01:00
followmsi
69672b517e Merge branch 'lineage-16.0' into followmsi-pie
Conflicts:
	arch/arm/configs/lineageos_flo_defconfig
2019-01-03 13:57:48 +01:00
Al Viro
579400a5b2 pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp.
One side effect - attempt to create a cross-device link on a read-only fs fails
with EROFS instead of EXDEV now.  Makes more sense, POSIX allows, etc.

Change-Id: I264b03a230dbd310f3b3671d2da06ceb2930179b
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:28:48 +04:00
Al Viro
2c1bd80538 new helper: done_path_create()
releases what needs to be released after {kern,user}_path_create()

Change-Id: If7fa7455e2ba8a6f4f4c4d2db502a38b4a60d7c7
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-07 22:28:48 +04:00
Linus Torvalds
89d83ed09c vfs: make it possible to access the dentry hash/len as one 64-bit entry
This allows comparing hash and len in one operation on 64-bit
architectures.  Right now only __d_lookup_rcu() takes advantage of this,
since that is the case we care most about.

The use of anonymous struct/unions hides the alternate 64-bit approach
from most users, the exception being a few cases where we initialize a
'struct qstr' with a static initializer.  This makes the problematic
cases use a new QSTR_INIT() helper function for that (but initializing
just the name pointer with a "{ .name = xyzzy }" initializer remains
valid, as does just copying another qstr structure).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I710810eb8717563264428e39c99e62599d31907e
2018-12-07 22:20:38 +04:00
stephen hemminger
5a64021cf7 netfilter: nf_conntrack: add include to fix sparse warning
Include header file to pickup prototype of nf_nat_seq_adjust_hook

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I0cdf5dd8b9c5ad765920f8666d836f8c9fd835cd
2018-12-07 22:04:24 +04:00
Florian Westphal
0edd378e5e netfilter: xt_rpfilter: depend on raw or mangle table
rpfilter is only valid in raw/mangle PREROUTING, i.e.
RPFILTER=y|m is useless without raw or mangle table support.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I26118146ace6e802b5700472222198ef31f9d562
2018-12-07 22:04:24 +04:00
Gao feng
93ca5c0e51 netfilter: use IS_ENABLE to replace if defined in TRACE target
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: If9b17fcdb53091906a15512e6cb74a210a0cd93f
2018-12-07 22:04:24 +04:00
Florian Westphal
73d5ec4729 netfilter: fix missing dependencies for NETFILTER_XT_MATCH_CONNLABEL
It was possible to set NF_CONNTRACK=n and NF_CONNTRACK_LABELS=y via
NETFILTER_XT_MATCH_CONNLABEL=y.

warning: (NETFILTER_XT_MATCH_CONNLABEL) selects NF_CONNTRACK_LABELS which has
unmet direct dependencies (NET && INET && NETFILTER && NF_CONNTRACK)

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ice48a81a97a7106a8f67b8b9e0ffb46400982005
2018-12-07 22:04:24 +04:00
Florian Westphal
6b989d3879 netfilter: add connlabel conntrack extension
similar to connmarks, except labels are bit-based; i.e.
all labels may be attached to a flow at the same time.

Up to 128 labels are supported.  Supporting more labels
is possible, but requires increasing the ct offset delta
from u8 to u16 type due to increased extension sizes.

Mapping of bit-identifier to label name is done in userspace.

The extension is enabled at run-time once "-m connlabel" netfilter
rules are added.

Change-Id: I98cfb16533e7e4bde1d73f2aa30e1be425f0b67e
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-07 22:04:24 +04:00
Jan Engelhardt
adf69c4966 netfilter: x_tables: print correct hook names for ARP
arptables 0.0.4 (released on 10th Jan 2013) supports calling the
CLASSIFY target, but on adding a rule to the wrong chain, the
diagnostic is as follows:

	# arptables -A INPUT -j CLASSIFY --set-class 0:0
	arptables: Invalid argument
	# dmesg | tail -n1
	x_tables: arp_tables: CLASSIFY target: used from hooks
	PREROUTING, but only usable from INPUT/FORWARD

This is incorrect, since xt_CLASSIFY.c does specify
(1 << NF_ARP_OUT) | (1 << NF_ARP_FORWARD).

This patch corrects the x_tables diagnostic message to print the
proper hook names for the NFPROTO_ARP case.

Affects all kernels down to and including v2.6.31.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ied71aaf676bf71199699299f1933524bb20fc486
2018-12-07 22:04:24 +04:00
Pablo Neira Ayuso
b40c7fe4b4 netfilter: nfnetlink_queue: add NFQA_CAP_LEN attribute
This patch adds the NFQA_CAP_LEN attribute that allows us to know
what is the real packet size from user-space (even if we decided
to retrieve just a few bytes from the packet instead of all of it).

Security software that inspects packets should always check for
this new attribute to make sure that it is inspecting the entire
packet.

This also helps to provide a workaround for the problem described
in: http://marc.info/?l=netfilter-devel&m=134519473212536&w=2

Original idea from Florian Westphal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I74a268651298a8284ea1c8c2dcd9b2d706cd6a29
2018-12-07 22:04:24 +04:00
Pablo Neira Ayuso
0313d86853 netfilter: nfnetlink_queue: fix maximum packet length to userspace
The packets that we send via NFQUEUE are encapsulated in the NFQA_PAYLOAD
attribute. The length of the packet in userspace is obtained via
attr->nla_len field. This field contains the size of the Netlink
attribute header plus the packet length.

If the maximum packet length is specified, ie. 65535 bytes, and
packets in the range of (65531,65535] are sent to userspace, the
attr->nla_len overflows and it reports bogus lengths to the
application.

To fix this, this patch limits the maximum packet length to 65531
bytes. If larger packet length is specified, the packet that we
send to user-space is truncated to 65531 bytes.

To support 65535 bytes packets, we have to revisit the idea of
the 32-bits Netlink attribute length.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Id4d70b0b3e4d7fc3422130cc2989473d545530e1
2018-12-07 22:04:24 +04:00
Eric W. Biederman
27845b8f9f net: Allow userns root to control ipv4
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed
while resource control is left unchanged.

Allow creating raw sockets.
Allow the SIOCSARP ioctl to control the arp cache.
Allow the SIOCSIFFLAG ioctl to allow setting network device flags.
Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting gre tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipip tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipsec virtual tunnel interfaces.

Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
sockets.

Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
arbitrary ip options.

Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
Allow setting the IP_TRANSPARENT ipv4 socket option.
Allow setting the TCP_REPAIR socket option.
Allow setting the TCP_CONGESTION socket option.

Change-Id: I3b6ce2465e354cd2865e1a7fe67d6e812f88b16a
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 22:04:24 +04:00
Pablo Neira Ayuso
cb7b3c7534 netfilter: fix missing dependencies for the NOTRACK target
warning: (NETFILTER_XT_TARGET_NOTRACK) selects NETFILTER_XT_TARGET_CT which has unmet direct
+dependencies (NET && INET && NETFILTER && NETFILTER_XTABLES && NF_CONNTRACK && (IP_NF_RAW ||
+IP6_NF_RAW) && NETFILTER_ADVANCED)

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I620fe7c4e7e4aa37485b25466edff14061d2ca0c
2018-12-07 22:04:24 +04:00
Pablo Neira Ayuso
e224886882 netfilter: xt_CT: recover NOTRACK target support
Florian Westphal reported that the removal of the NOTRACK target
(9655050 netfilter: remove xt_NOTRACK) is breaking some existing
setups.

That removal was scheduled for removal since long time ago as
described in Documentation/feature-removal-schedule.txt

What:  xt_NOTRACK
Files: net/netfilter/xt_NOTRACK.c
When:  April 2011
Why:   Superseded by xt_CT

Still, people may have not notice / may have decided to stick to an
old iptables version. I agree with him in that some more conservative
approach by spotting some printk to warn users for some time is less
agressive.

Current iptables 1.4.16.3 already contains the aliasing support
that makes it point to the CT target, so upgrading would fix it.
Still, the policy so far has been to avoid pushing our users to
upgrade.

As a solution, this patch recovers the NOTRACK target inside the CT
target and it now spots a warning.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I93292ad4f6e377df739e0366e36ab72786c3750d
2018-12-07 22:04:24 +04:00
Cong Wang
efd4c4ccf9 netfilter: remove xt_NOTRACK
It was scheduled to be removed for a long time.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netfilter@vger.kernel.org
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Iae2ab6490a36ecde8fb9ee5b65e4448969f74b07
2018-12-07 22:04:24 +04:00
Arne Coucheron
70e12c60f4 net: Add missing LOOPBACK_IFINDEX change in ipv6/route.c
Missed in "net: Loopback ifindex is constant now" backport.

Change-Id: I7259ac064e9ece4520ff355d55ffd04646d384ce
2018-12-07 22:04:24 +04:00
Florent Fourcot
f499215262 netfilter: nf_conntrack_ipv6: fix comment for packets without data
Remove ambiguity of double negation.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ife5632986d43a9e6e719415c8a234f100e8de958
2018-12-07 22:02:09 +04:00
Andrew Collins
d4739bc9f5 netfilter: nf_nat: Also handle non-ESTABLISHED routing changes in MASQUERADE
Since (a0ecb85 netfilter: nf_nat: Handle routing changes in MASQUERADE
target), the MASQUERADE target handles routing changes which affect
the output interface of a connection, but only for ESTABLISHED
connections.  It is also possible for NEW connections which
already have a conntrack entry to be affected by routing changes.

This adds a check to drop entries in the NEW+conntrack state
when the oif has changed.

Signed-off-by: Andrew Collins <bsderandrew@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I6ae5c391d502f44cebc0563f51f4a65667efd0ae
2018-12-07 22:02:09 +04:00
Mukund Jampala
1b6cc2ac53 netfilter: ip[6]t_REJECT: fix wrong transport header pointer in TCP reset
The problem occurs when iptables constructs the tcp reset packet.
It doesn't initialize the pointer to the tcp header within the skb.
When the skb is passed to the ixgbe driver for transmit, the ixgbe
driver attempts to access the tcp header and crashes.
Currently, other drivers (such as our 1G e1000e or igb drivers) don't
access the tcp header on transmit unless the TSO option is turned on.

<1>BUG: unable to handle kernel NULL pointer dereference at 0000000d
<1>IP: [<d081621c>] ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe]
<4>*pdpt = 0000000085e5d001 *pde = 0000000000000000
<0>Oops: 0000 [#1] SMP
[...]
<4>Pid: 0, comm: swapper Tainted: P            2.6.35.12 #1 Greencity/Thurley
<4>EIP: 0060:[<d081621c>] EFLAGS: 00010246 CPU: 16
<4>EIP is at ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe]
<4>EAX: c7628820 EBX: 00000007 ECX: 00000000 EDX: 00000000
<4>ESI: 00000008 EDI: c6882180 EBP: dfc6b000 ESP: ced95c48
<4> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
<0>Process swapper (pid: 0, ti=ced94000 task=ced73bd0 task.ti=ced94000)
<0>Stack:
<4> cbec7418 c779e0d8 c77cc888 c77cc8a8 0903010a 00000000 c77c0008 00000002
<4><0> cd4997c0 00000010 dfc6b000 00000000 d0d176c9 c77cc8d8 c6882180 cbec7318
<4><0> 00000004 00000004 cbec7230 cbec7110 00000000 cbec70c0 c779e000 00000002
<0>Call Trace:
<4> [<d0d176c9>] ? 0xd0d176c9
<4> [<d0d18a4d>] ? 0xd0d18a4d
<4> [<411e243e>] ? dev_hard_start_xmit+0x218/0x2d7
<4> [<411f03d7>] ? sch_direct_xmit+0x4b/0x114
<4> [<411f056a>] ? __qdisc_run+0xca/0xe0
<4> [<411e28b0>] ? dev_queue_xmit+0x2d1/0x3d0
<4> [<411e8120>] ? neigh_resolve_output+0x1c5/0x20f
<4> [<411e94a1>] ? neigh_update+0x29c/0x330
<4> [<4121cf29>] ? arp_process+0x49c/0x4cd
<4> [<411f80c9>] ? nf_hook_slow+0x3f/0xac
<4> [<4121ca8d>] ? arp_process+0x0/0x4cd
<4> [<4121ca8d>] ? arp_process+0x0/0x4cd
<4> [<4121c6d5>] ? T.901+0x38/0x3b
<4> [<4121c918>] ? arp_rcv+0xa3/0xb4
<4> [<4121ca8d>] ? arp_process+0x0/0x4cd
<4> [<411e1173>] ? __netif_receive_skb+0x32b/0x346
<4> [<411e19e1>] ? netif_receive_skb+0x5a/0x5f
<4> [<411e1ea9>] ? napi_skb_finish+0x1b/0x30
<4> [<d0816eb4>] ? ixgbe_xmit_frame_ring+0x1564/0x2260 [ixgbe]
<4> [<41013468>] ? lapic_next_event+0x13/0x16
<4> [<410429b2>] ? clockevents_program_event+0xd2/0xe4
<4> [<411e1b03>] ? net_rx_action+0x55/0x127
<4> [<4102da1a>] ? __do_softirq+0x77/0xeb
<4> [<4102dab1>] ? do_softirq+0x23/0x27
<4> [<41003a67>] ? do_IRQ+0x7d/0x8e
<4> [<41002a69>] ? common_interrupt+0x29/0x30
<4> [<41007bcf>] ? mwait_idle+0x48/0x4d
<4> [<4100193b>] ? cpu_idle+0x37/0x4c
<0>Code: df 09 d7 0f 94 c2 0f b6 d2 e9 e7 fb ff ff 31 db 31 c0 e9 38
ff ff ff 80 78 06 06 0f 85 3e fb ff ff 8b 7c 24 38 8b 8f b8 00 00 00
<0f> b6 51 0d f6 c2 01 0f 85 27 fb ff ff 80 e2 02 75 0d 8b 6c 24
<0>EIP: [<d081621c>] ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe] SS:ESP

Signed-off-by: Mukund Jampala <jbmukund@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I4866944d4992f703e55afcb20e9746a416d3d498
2018-12-07 22:02:09 +04:00
Jozsef Kadlecsik
2e46149f8c netfilter: nf_nat: Handle routing changes in MASQUERADE target
When the route changes (backup default route, VPNs) which affect a
masqueraded target, the packets were sent out with the outdated source
address. The patch addresses the issue by comparing the outgoing interface
directly with the masqueraded interface in the nat table.

Events are inefficient in this case, because it'd require adding route
events to the network core and then scanning the whole conntrack table
and re-checking the route for all entry.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I148103e8a4b257d004b8f0f03455243a28ff0eec
2018-12-07 22:02:09 +04:00
Patrick McHardy
5a296c2724 netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation target
Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: I0508ce6f05db08fa455814ef3b86e189687d3087
2018-12-07 22:02:09 +04:00
Patrick McHardy
e5a2b99b4d netfilter: ip6tables: add NETMAP target
Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: Ifc6a92823b7a29d3500e998ab16436b01dd440cb
2018-12-07 22:02:09 +04:00
Patrick McHardy
1e28beb0aa netfilter: ip6tables: add REDIRECT target
Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: I9e6ba8908914bd52a771da6b73f458036c2b30eb
2018-12-07 22:02:09 +04:00
Patrick McHardy
dd6490947e netfilter: ip6tables: add MASQUERADE target
Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: I2296776746567afe4f8067628308c88ba2233e5c
2018-12-07 22:02:09 +04:00
Patrick McHardy
104f2e20f9 netfilter: ipv6: add IPv6 NAT support
Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: I843d848e57d95e8fc932d95ca939a8c94fc8cfd8
2018-12-07 22:02:09 +04:00
Patrick McHardy
b06bea8f8f net: core: add function for incremental IPv6 pseudo header checksum updates
Add inet_proto_csum_replace16 for incrementally updating IPv6 pseudo header
checksums for IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Change-Id: I858e211dfe3e1f4dc9daa42db3bd6fc535887324
2018-12-07 22:02:09 +04:00
Patrick McHardy
5e85f6149f netfilter: nf_conntrack_ipv6: fix tracking of ICMPv6 error messages containing fragments
ICMPv6 error messages are tracked by extracting the conntrack tuple of
the inner packet and looking up the corresponding conntrack entry. Tuple
extraction uses the ->get_l4proto() callback, which in case of fragments
returns NEXTHDR_FRAGMENT instead of the upper protocol, even for the
first fragment when the entire next header is present, resulting in a
failure to find the correct connection tracking entry.

This patch changes ipv6_get_l4proto() to use ipv6_skip_exthdr() instead
of nf_ct_ipv6_skip_exthdr() in order to skip fragment headers when the
fragment offset is zero.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: Ic8a86d7038676b044c89700226225ffa20e01079
2018-12-07 22:02:09 +04:00
Patrick McHardy
d1e8c3336e ipv4: fix path MTU discovery with connection tracking
IPv4 conntrack defragments incoming packet at the PRE_ROUTING hook and
(in case of forwarded packets) refragments them at POST_ROUTING
independent of the IP_DF flag. Refragmentation uses the dst_mtu() of
the local route without caring about the original fragment sizes,
thereby breaking PMTUD.

This patch fixes this by keeping track of the largest received fragment
with IP_DF set and generates an ICMP fragmentation required error during
refragmentation if that size exceeds the MTU.

Change-Id: Ibac77b728baba05841286ea5a8a2089d56e6ad65
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
2018-12-07 22:02:09 +04:00
Patrick McHardy
69adc8757e netfilter: nf_conntrack_ipv6: improve fragmentation handling
The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.

Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.

This patch improves the situation in the following way:

- If a reassembled packet belongs to a connection that has a helper
  assigned, the reassembled packet is passed through the stack instead
  of the original fragments.

- During defragmentation, the largest received fragment size is stored.
  On output, the packet is refragmented if required. If the largest
  received fragment size exceeds the outgoing MTU, a "packet too big"
  message is generated, thus behaving as if the original fragments
  were passed through the stack from an outside point of view.

- The ipv6_helper() hook function can't receive fragments anymore for
  connections using a helper, so it is switched to use ipv6_skip_exthdr()
  instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
  reassembled packets are passed to connection tracking helpers.

The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.

This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (f.i. when
changing textual representation of a variable amount of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.

IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.

Change-Id: I75d83668e7de723fb271232f475f46f4037a4a4f
Signed-off-by: Patrick McHardy <kaber@trash.net>
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
c788f9bf3e netfilter: nf_ct_ftp: add sequence tracking pickup facility for injected entries
This patch allows the FTP helper to pickup the sequence tracking from
the first packet seen. This is useful to fix the breakage of the first
FTP command after the failover while using conntrackd to synchronize
states.

The seq_aft_nl_num field in struct nf_ct_ftp_info has been shrinked to
16-bits (enough for what it does), so we can use the remaining 16-bits
to store the flags while using the same size for the private FTP helper
data.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I4463de3e064fb13f7b1dc58ef1dd0f91d40f1bbc
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
057e9ad3b0 netfilter: nf_nat: support IPv6 in TFTP NAT helper
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: If5534638ee9e5d2083eb503ce4e94777e9889876
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
0462c8af52 netfilter: nf_nat: support IPv6 in IRC NAT helper
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: Iba939b75c976019614c616e28bbca6b5457fada8
2018-12-07 22:02:09 +04:00
Patrick McHardy
d484ac1667 netfilter: nf_nat: support IPv6 in SIP NAT helper
Add IPv6 support to the SIP NAT helper. There are no functional differences
to IPv4 NAT, just different formats for addresses.

Change-Id: I151f527731d4724606203ca82244b5aad4b9e026
Signed-off-by: Patrick McHardy <kaber@trash.net>
2018-12-07 22:02:09 +04:00
Patrick McHardy
39cce7ba50 netfilter: nf_nat: support IPv6 in amanda NAT helper
Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: I52b59bcdfbff5ad94cfda035cd965c3f89cede9d
2018-12-07 22:02:09 +04:00
Patrick McHardy
0d2cb57410 netfilter: nf_nat: support IPv6 in FTP NAT helper
Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: Ieed29cda783b6ad03cb7a4a1a6b594cf4503007c
2018-12-07 22:02:09 +04:00
Elison Niven
66c00242ab netfilter: xt_nat: fix incorrect hooks for SNAT and DNAT targets
In (c7232c9 netfilter: add protocol independent NAT core), the
hooks were accidentally modified:

SNAT hooks are POST_ROUTING and LOCAL_IN (before it was LOCAL_OUT).
DNAT hooks are PRE_ROUTING and LOCAL_OUT (before it was LOCAL_IN).

Signed-off-by: Elison Niven <elison.niven@cyberoam.com>
Signed-off-by: Sanket Shah <sanket.shah@cyberoam.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Idf352cdc22185f53702efdef755412137e9f95f1
2018-12-07 22:02:09 +04:00
Ulrich Weber
4f4747fed3 netfilter: nf_nat: remove obsolete rcu_read_unlock call
hlist walk in find_appropriate_src() is not protected anymore by rcu_read_lock(),
so rcu_read_unlock() is unnecessary if in_range() matches.

This bug was added in (c7232c9 netfilter: add protocol independent NAT core).

Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I28d41758f0654cae9851d78a88f51c9f2f62234d
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
2ee145fb54 netfilter: ctnetlink: fix module auto-load in ctnetlink_parse_nat
(c7232c9 netfilter: add protocol independent NAT core) added
incorrect locking for the module auto-load case in ctnetlink_parse_nat.

That function is always called from ctnetlink_create_conntrack which
requires no locking.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I035dcfb9aa109541dc69cf53a145322af912b137
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
8e5c18ea70 netfilter: fix crash during boot if NAT has been compiled built-in
(c7232c9 netfilter: add protocol independent NAT core) introduced a
problem that leads to crashing during boot due to NULL pointer
dereference. It seems that xt_nat calls xt_register_target() before
xt_init():

net/netfilter/x_tables.c:static struct xt_af *xt; is NULL and we crash on
xt_register_target(struct xt_target *target)
{
        u_int8_t af = target->family;
        int ret;

        ret = mutex_lock_interruptible(&xt[af].mutex);
...

Fix this by changing the linking order, to make sure that x_tables
comes before xt_nat.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I10eff95a9afc013663404d8e7392221630d8e141
2018-12-07 22:02:09 +04:00
Patrick McHardy
7ce958755b netfilter: add protocol independent NAT core
Convert the IPv4 NAT implementation to a protocol independent core and
address family specific modules.

Change-Id: I926b42af53b37c96fb654021e7f568450e8c63c0
Signed-off-by: Patrick McHardy <kaber@trash.net>
2018-12-07 22:02:09 +04:00
Patrick McHardy
d7349a49b5 netfilter: nf_nat: add protoff argument to packet mangling functions
For mangling IPv6 packets the protocol header offset needs to be known
by the NAT packet mangling functions. Add a so far unused protoff argument
and convert the conntrack and NAT helpers to use it in preparation of
IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: I34f8597c072b0e9d99d19ae958c1788e3d8d51ee
2018-12-07 22:02:09 +04:00
Patrick McHardy
1aa2e7331a netfilter: nf_conntrack: restrict NAT helper invocation to IPv4
The NAT helpers currently only handle IPv4 packets correctly. Restrict
invocation of the helpers to IPv4 in preparation of IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: Ided2edb286d4098d4835bf739aafce71cecb3876
2018-12-07 22:02:09 +04:00
Patrick McHardy
b9df259c9b netfilter: nf_ct_sip: fix IPv6 address parsing
Within SIP messages IPv6 addresses are enclosed in square brackets in most
cases, with the exception of the "received=" header parameter. Currently
the helper fails to parse enclosed addresses.

This patch:

- changes the SIP address parsing function to enforce square brackets
  when required, and accept them when not required but present, as
  recommended by RFC 5118.

- adds a new SDP address parsing function that never accepts square
  brackets since SDP doesn't use them.

With these changes, the SIP helper correctly parses all test messages
from RFC 5118 (Session Initiation Protocol (SIP) Torture Test Messages
for Internet Protocol Version 6 (IPv6)).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I269237f68085de081ad6b3779f9806cd8380bbc9
2018-12-07 22:02:09 +04:00
Patrick McHardy
241a92f0d1 netfilter: nf_ct_sip: fix helper name
Commit 3a8fc53a (netfilter: nf_ct_helper: allocate 16 bytes for the helper
and policy names) introduced a bug in the SIP helper, the helper name is
sprinted to the sip_names array instead of instead of into the helper
structure. This breaks the helper match and the /proc/net/nf_conntrack_expect
output.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I3ba18ddaef444f0a50f49738f2f4718e338ec9b3
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
4d0eab37de netfilter: ctnetlink: fix compilation with NF_CONNTRACK_EVENTS=n
This patch fixes compilation with NF_CONNTRACK_EVENTS=n and
NETFILTER_NETLINK_QUEUE_CT=y.

I'm leaving all those static inline functions that calculate the size
of the event message out of the ifdef area of NF_CONNTRACK_EVENTS since
they will not be included by gcc in case they are unused.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ied1198a98d2fbbe81e836b317cc3a835644f6bef
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
edf49156e2 netfilter: nfnetlink_queue: fix sparse warning due to missing include
This patch fixes a sparse warning due to missing include header file.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ieeeae7f9a48a69fa45e824d8d03feb3bd5faba3e
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
b3bc43d3b7 netfilter: nfnetlink_queue: fix compilation with CONFIG_NF_NAT=m and CONFIG_NF_CT_NETLINK=y
LD      init/built-in.o
net/built-in.o:(.data+0x4408): undefined reference to `nf_nat_tcp_seq_adjust'
make: *** [vmlinux] Error 1

This patch adds a new pointer hook (nfq_ct_nat_hook) similar to other existing
in Netfilter to solve our complicated configuration dependencies.

Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I332521b61f008793381ba6eeb3d3a34369adaea7
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
dac6d0969f netfilter: nfq_ct_hook needs __rcu and __read_mostly
This removes some sparse warnings.

Reported-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I1a40288d4c5acc97885a07011abdc0ef88710880
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
a506f5d363 netfilter: nfnetlink_queue: fix compilation with NF_CONNTRACK disabled
In "9cb0176 netfilter: add glue code to integrate nfnetlink_queue and ctnetlink"
the compilation with NF_CONNTRACK disabled is broken. This patch fixes this
issue.

I have moved the conntrack part into nfnetlink_queue_ct.c to avoid
peppering the entire nfnetlink_queue.c code with ifdefs.

I also needed to rename nfnetlink_queue.c to nfnetlink_queue_pkt.c
to update the net/netfilter/Makefile to support conditional compilation
of the conntrack integration.

This patch also adds CONFIG_NETFILTER_QUEUE_CT in case you want to explicitly
disable the integration between nf_conntrack and nfnetlink_queue.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I211402286b8e154f13af067cfbf470ab6a635282
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
9aedc06c77 netfilter: fix compilation of the nfnl_cthelper if NF_CONNTRACK is unset
This patch fixes the compilation of net/netfilter/nfnetlink_cthelper.c
if CONFIG_NF_CONNTRACK is not set.

This patch also moves the definition of the cthelper infrastructure to
the scope of NF_CONNTRACK things.

I have also renamed NETFILTER_NETLINK_CTHELPER by NF_CT_NETLINK_HELPER,
to use similar names to other nf_conntrack_netlink extensions. Better now
that this has been only for two days in David's tree.

Two new dependencies have been added:

* NF_CT_NETLINK
* NETFILTER_NETLINK_QUEUE

Since these infrastructure requires both ctnetlink and nfqueue.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I7e5a5893dcc11f4da6765d4716ba1399356e246b
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
27d7221df0 netfilter: nf_ct_helper: disable automatic helper re-assignment of different type
This patch modifies __nf_ct_try_assign_helper in a way that invalidates support
for the following scenario:

1) attach the helper A for first time when the conntrack is created
2) attach new (different) helper B due to changes the reply tuple caused by NAT

eg. port redirection from TCP/21 to TCP/5060 with both FTP and SIP helpers
loaded, which seems to be a quite unorthodox scenario.

I can provide a more elaborated patch to support this scenario but explicit
helper attachment provides a better solution for this since now the use can
attach the helpers consistently, without relying on the automatic helper
lookup magic.

This patch fixes a possible out of bound zeroing of the conntrack helper
extension if the helper B uses more memory for its private data than
helper A.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: If4b9e3226358dc3e77e1b1cf1a47204ab6643e88
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
d59a1ab483 netfilter: ctnetlink: fix NULL dereference while trying to change helper
The patch 1afc56794e: "netfilter: nf_ct_helper: implement variable
length helper private data" from Jun 7, 2012, leads to the following
Smatch complaint:

net/netfilter/nf_conntrack_netlink.c:1231 ctnetlink_change_helper()
         error: we previously assumed 'help->helper' could be null (see line 1228)

This NULL dereference can be triggered with the following sequence:

1) attach the helper for first time when the conntrack is created.
2) remove the helper module or detach the helper from the conntrack
   via ctnetlink.
3) attach helper again (the same or different one, no matter) to the
   that existing conntrack again via ctnetlink.

This patch fixes the problem by removing the use case that allows you
to re-assign again a helper for one conntrack entry via ctnetlink since
I cannot find any practical use for it.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Id7c741812457c98032b8d52c7fbf0ea9e059c388
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
f3f602eba6 netfilter: nf_ct_ext: support variable length extensions
We can now define conntrack extensions of variable size. This
patch is useful to get rid of these unions:

union nf_conntrack_help
union nf_conntrack_proto
union nf_conntrack_nat_help

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Icc8a18ec32f495a1e18afebc748bfdcff6b9c658
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
d9bd97f7e3 netfilter: ctnetlink: add CTA_HELP_INFO attribute
This attribute can be used to modify and to dump the internal
protocol information.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ifa704afd118b8f9b860ebf8ffba843ea804f815d
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
678d79d6bc netfilter: nf_ct_helper: implement variable length helper private data
This patch uses the new variable length conntrack extensions.

Instead of using union nf_conntrack_help that contain all the
helper private data information, we allocate variable length
area to store the private helper data.

This patch includes the modification of all existing helpers.
It also includes a couple of include header to avoid compilation
warnings.

Change-Id: I2b855f3687c16ac0996053006d0543ad05411acd
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
b258b03178 netfilter: nf_ct_helper: allocate 16 bytes for the helper and policy names
This patch modifies the struct nf_conntrack_helper to allocate
the room for the helper name. The maximum length is 16 bytes
(this was already introduced in 2.6.24).

For the maximum length for expectation policy names, I have
also selected 16 bytes.

This patch is required by the follow-up patch to support
user-space connection tracking helpers.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I9ed449622e5c101c87519604b1c6ea37c0bd91cd
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
26948d946b netfilter: nfnetlink_queue: add NAT TCP sequence adjustment if packet mangled
User-space programs that receive traffic via NFQUEUE may mangle packets.
If NAT is enabled, this usually puzzles sequence tracking, leading to
traffic disruptions.

With this patch, nfnl_queue will make the corresponding NAT TCP sequence
adjustment if:

1) The packet has been mangled,
2) the NFQA_CFG_F_CONNTRACK flag has been set, and
3) NAT is detected.

There are some records on the Internet complaning about this issue:
http://stackoverflow.com/questions/260757/packet-mangling-utilities-besides-iptables

By now, we only support TCP since we have no helpers for DCCP or SCTP.
Better to add this if we ever have some helper over those layer 4 protocols.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Id74b2db6fa3264a322d682b64a3104ce6ff42eaf
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
5ffb7798c1 netfilter: add glue code to integrate nfnetlink_queue and ctnetlink
This patch allows you to include the conntrack information together
with the packet that is sent to user-space via NFQUEUE.

Previously, there was no integration between ctnetlink and
nfnetlink_queue. If you wanted to access conntrack information
from your libnetfilter_queue program, you required to query
ctnetlink from user-space to obtain it. Thus, delaying the packet
processing even more.

Including the conntrack information is optional, you can set it
via NFQA_CFG_F_CONNTRACK flag with the new NFQA_CFG_FLAGS attribute.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ibbc9436d0dada42bc780314dd762e635c4a15b71
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
191c3003e8 netfilter: add user-space connection tracking helper infrastructure
There are good reasons to supports helpers in user-space instead:

* Rapid connection tracking helper development, as developing code
  in user-space is usually faster.

* Reliability: A buggy helper does not crash the kernel. Moreover,
  we can monitor the helper process and restart it in case of problems.

* Security: Avoid complex string matching and mangling in kernel-space
  running in privileged mode. Going further, we can even think about
  running user-space helpers as a non-root process.

* Extensibility: It allows the development of very specific helpers (most
  likely non-standard proprietary protocols) that are very likely not to be
  accepted for mainline inclusion in the form of kernel-space connection
  tracking helpers.

This patch adds the infrastructure to allow the implementation of
user-space conntrack helpers by means of the new nfnetlink subsystem
`nfnetlink_cthelper' and the existing queueing infrastructure
(nfnetlink_queue).

I had to add the new hook NF_IP6_PRI_CONNTRACK_HELPER to register
ipv[4|6]_helper which results from splitting ipv[4|6]_confirm into
two pieces. This change is required not to break NAT sequence
adjustment and conntrack confirmation for traffic that is enqueued
to our user-space conntrack helpers.

Basic operation, in a few steps:

1) Register user-space helper by means of `nfct':

 nfct helper add ftp inet tcp

 [ It must be a valid existing helper supported by conntrack-tools ]

2) Add rules to enable the FTP user-space helper which is
   used to track traffic going to TCP port 21.

For locally generated packets:

 iptables -I OUTPUT -t raw -p tcp --dport 21 -j CT --helper ftp

For non-locally generated packets:

 iptables -I PREROUTING -t raw -p tcp --dport 21 -j CT --helper ftp

3) Run the test conntrackd in helper mode (see example files under
   doc/helper/conntrackd.conf

 conntrackd

4) Generate FTP traffic going, if everything is OK, then conntrackd
   should create expectations (you can check that with `conntrack':

 conntrack -E expect

    [NEW] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp
[DESTROY] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp

This confirms that our test helper is receiving packets including the
conntrack information, and adding expectations in kernel-space.

The user-space helper can also store its private tracking information
in the conntrack structure in the kernel via the CTA_HELP_INFO. The
kernel will consider this a binary blob whose layout is unknown. This
information will be included in the information that is transfered
to user-space via glue code that integrates nfnetlink_queue and
ctnetlink.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ifad12a30d86a6eb0b72d20f079336a24348d711b
2018-12-07 22:02:09 +04:00
Krishna Kumar
65663ad73f netfilter: Add fail-open support
Implement a new "fail-open" mode where packets are not dropped
upon queue-full condition. This mode can be enabled/disabled per
queue using netlink NFQA_CFG_FLAGS & NFQA_CFG_MASK attributes.

Change-Id: Ibe58610239603d0080fabab640b142d6c15f257e
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Vivek Kashyap <vivk@us.ibm.com>
Signed-off-by: Sridhar Samudrala <samudrala@us.ibm.com>
2018-12-07 22:02:09 +04:00
Cong Wang
cc845a50be netfilter: xt_connlimit: remove revision 0
It was scheduled to be removed.

Change-Id: I8ad3a555a10f2159d8bc7bd658e43aaa5ebfc519
Cc: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-07 22:02:09 +04:00
Hans Schillstrom
93f5c0f489 netfilter: xt_HMARK: fix endianness and provide consistent hashing
This patch addresses two issues:

a) Fix usage of u32 and __be32 that causes endianess warnings via sparse.
b) Ensure consistent hashing in a cluster that is composed of big and
   little endian systems. Thus, we obtain the same hash mark in an
   heterogeneous cluster.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I34dc3aec1d437097ce73fac64e4c4e8e02efd1a8
2018-12-07 22:02:09 +04:00
Eldad Zack
17f8a9e47f netfilter: xt_CT: remove redundant header include
nf_conntrack_l4proto.h is included twice.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ie187dbb40294216188d9b03a78376209b0f15757
2018-12-07 22:02:09 +04:00
Pablo Neira Ayuso
399cf2b57f netfilter: xt_HMARK: modulus is expensive for hash calculation
Use:

((u64)(HASH_VAL * HASH_SIZE)) >> 32

as suggested by David S. Miller.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ic24e8221499b767fb7200b7c83017f573fd4bc4d
2018-12-07 22:02:09 +04:00
Dan Carpenter
e562017983 netfilter: xt_HMARK: potential NULL dereference in get_inner_hdr()
There is a typo in the error checking and "&&" was used instead of "||".
If skb_header_pointer() returns NULL then it leads to a NULL
dereference.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I6663bf06e012a5b338e0bf8faafea445a5247b7f
2018-12-07 22:02:09 +04:00
Hans Schillstrom
837348a44f netfilter: add xt_hmark target for hash-based skb marking
The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.

[ Part of this patch has been refactorized and modified by Pablo Neira Ayuso ]

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I4ab0fcff9b5d4637f480e1146e5bc4100cf2cf7b
2018-12-07 22:02:09 +04:00
Hans Schillstrom
f1d65de19b netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()
This patch adds the flags parameter to ipv6_find_hdr. This flags
allows us to:

* know if this is a fragment.
* stop at the AH header, so the information contained in that header
  can be used for some specific packet handling.

This patch also adds the offset parameter for inspection of one
inner IPv6 header that is contained in error messages.

Change-Id: I8aa13399597dcb72c73084bcd7f8ca4156326357
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-07 22:02:06 +04:00
Pablo Neira Ayuso
a2688d5375 netfilter: remove ip_queue support
This patch removes ip_queue support which was marked as obsolete
years ago. The nfnetlink_queue modules provides more advanced
user-space packet queueing mechanism.

This patch also removes capability code included in SELinux that
refers to ip_queue. Otherwise, we break compilation.

Several warning has been sent regarding this to the mailing list
in the past month without anyone rising the hand to stop this
with some strong argument.

Change-Id: I62ab355af31e708b3c1000f2252c8196fb8ba428
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-07 22:00:11 +04:00
Pablo Neira Ayuso
cce1d97a6d netfilter: nf_conntrack: fix explicit helper attachment and NAT
Explicit helper attachment via the CT target is broken with NAT
if non-standard ports are used. This problem was hidden behind
the automatic helper assignment routine. Thus, it becomes more
noticeable now that we can disable the automatic helper assignment
with Eric Leblond's:

9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment

Basically, nf_conntrack_alter_reply asks for looking up the helper
up if NAT is enabled. Unfortunately, we don't have the conntrack
template at that point anymore.

Since we don't want to rely on the automatic helper assignment,
we can skip the second look-up and stick to the helper that was
attached by iptables. With the CT target, the user is in full
control of helper attachment, thus, the policy is to trust what
the user explicitly configures via iptables (no automatic magic
anymore).

Interestingly, this bug was hidden by the automatic helper look-up
code. But it can be easily trigger if you attach the helper in
a non-standard port, eg.

iptables -I PREROUTING -t raw -p tcp --dport 8888 \
	-j CT --helper ftp

And you disabled the automatic helper assignment.

I added the IPS_HELPER_BIT that allows us to differenciate between
a helper that has been explicitly attached and those that have been
automatically assigned. I didn't come up with a better solution
(having backward compatibility in mind).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Iff14005c0627ce7e76c9fa3b0b6c95b892901ca3
2018-12-07 21:59:38 +04:00
Kelvie Wong
8a8de2d5b5 netfilter: nf_ct_expect: partially implement ctnetlink_change_expect
This refreshes the "timeout" attribute in existing expectations if one is
given.

The use case for this would be for userspace helpers to extend the lifetime
of the expectation when requested, as this is not possible right now
without deleting/recreating the expectation.

I use this specifically for forwarding DCERPC traffic through:

DCERPC has a port mapper daemon that chooses a (seemingly) random port for
future traffic to go to. We expect this traffic (with a reasonable
timeout), but sometimes the port mapper will tell the client to continue
using the same port. This allows us to extend the expectation accordingly.

Signed-off-by: Kelvie Wong <kelvie@ieee.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I1b4f4479aaef63103a665ee7cb1ff70001912756
2018-12-07 21:59:38 +04:00
Eric Leblond
01cb0ca0ff netfilter: nf_ct_helper: allow to disable automatic helper assignment
This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, eg.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper

[ Note: flows that already got a helper will keep using it even
  if automatic helper assignment has been disabled ]

Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helper to flows.

There are good reasons to stop supporting automatic helper
assignment, for further information, please read:

http://www.netfilter.org/news.html#2012-04-03

This patch also adds one message to inform that automatic helper
assignment is deprecated and it will be removed soon (this is
spotted only once, with the first flow that gets a helper attached
to make it as less annoying as possible).

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ie1356fa575971063d579cf943ac0354d1513cfc3
2018-12-07 21:59:38 +04:00
Johannes Berg
dca89a7ac0 ipv6: add option to drop unsolicited neighbor advertisements
In certain 802.11 wireless deployments, there will be NA proxies
that use knowledge of the network to correctly answer requests.
To prevent unsolicitd advertisements on the shared medium from
being a problem, on such deployments wireless needs to drop them.

Enable this by providing an option called "drop_unsolicited_na".

Change-Id: I2567a9973e72165a8e546f3638b509fbd1c95298
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 21:59:38 +04:00
Johannes Berg
038345d130 ipv6: add option to drop unicast encapsulated in L2 multicast
In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv6 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.

Change-Id: I8a0b45fbd533236fbd785e6e8aa20fb780aa1397
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 21:59:38 +04:00
Johannes Berg
f3fab131f3 ipv4: add option to drop gratuitous ARP packets
In certain 802.11 wireless deployments, there will be ARP proxies
that use knowledge of the network to correctly answer requests.
To prevent gratuitous ARP frames on the shared medium from being
a problem, on such deployments wireless needs to drop them.

Enable this by providing an option called "drop_gratuitous_arp".

Change-Id: Ic0ed4c7e520b1d973eb1ae206af0f882badc21ce
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 21:59:38 +04:00
Johannes Berg
e8958c5b2e ipv4: add option to drop unicast encapsulated in L2 multicast
In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv4 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.

Additionally, enabling this option provides compliance with a SHOULD
clause of RFC 1122.

Change-Id: Ib0c44d9e36d879be4f073db1936a986003390b78
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 21:59:38 +04:00
followmsi
2310009d3c Merge branch 'lineage-16.0' into followmsi-pie 2018-09-13 21:49:00 +02:00
Cong Wang
d806df2b3a ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif
As suggested by Julian:

	Simply, flowi4_iif must not contain 0, it does not
	look logical to ignore all ip rules with specified iif.

because in fib_rule_match() we do:

        if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
                goto out;

flowi4_iif should be LOOPBACK_IFINDEX by default.

We need to move LOOPBACK_IFINDEX to include/net/flow.h:

1) It is mostly used by flowi_iif

2) Fix the following compile error if we use it in flow.h
by the patches latter:

In file included from include/linux/netfilter.h:277:0,
                 from include/net/netns/netfilter.h:5,
                 from include/net/net_namespace.h:21,
                 from include/linux/netdevice.h:43,
                 from include/linux/icmpv6.h:12,
                 from include/linux/ipv6.h:61,
                 from include/net/ipv6.h:16,
                 from include/linux/sunrpc/clnt.h:27,
                 from include/linux/nfs_fs.h:30,
                 from init/do_mounts.c:32:
include/net/flow.h: In function ‘flowi4_init_output’:
include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)

[Backport of net-next 6a662719c9]

Change-Id: Ib7a0a08d78c03800488afa1b2c170cb70e34cfd9
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2018-08-27 14:52:49 +00:00
Florian Westphal
e4a17b5b70 netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too
Alex Efros reported rpfilter module doesn't match following packets:
IN=br.qemu SRC=192.168.2.1 DST=192.168.2.255 [ .. ]
(netfilter bugzilla #814).

Problem is that network stack arranges for the locally generated broadcasts
to appear on the interface they were sent out, so the IFF_LOOPBACK check
doesn't trigger.

As -m rpfilter is restricted to PREROUTING, we can check for existing
rtable instead, it catches locally-generated broad/multicast case, too.

Change-Id: I2d921ac4d53e5b1ca9a5249e489c33e4fa4a4b3a
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-27 14:52:48 +00:00
Pavel Emelyanov
36640b6239 net: Loopback ifindex is constant now
As pointed out, there are places, that access net->loopback_dev->ifindex
and after ifindex generation is made per-net this value becomes constant
equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
it where appropriate.

Change-Id: I29fd08fa01a9522240ab654d436b02a577bb610c
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-27 14:52:48 +00:00
Erik Kline
ec3c4f0012 Revert "netfilter: have ip*t REJECT set the sock err when an icmp is to be sent"
This reverts commit 6f489c42a9.

Bug: 28719525
Change-Id: I77707cc93b3c5f0339e6bce36734027586c639d3
2018-08-27 14:52:47 +00:00
Vladis Dronov
84655b5f70 xfrm: policy: check policy direction value
The 'dir' parameter in xfrm_migrate() is a user-controlled byte which is used
as an array index. This can lead to an out-of-bound access, kernel lockup and
DoS. Add a check for the 'dir' value.

This fixes CVE-2017-11600.

Change-Id: Ic55eec5b4767ad1bd8328b382c35f7b213abc38d
References: https://bugzilla.redhat.com/show_bug.cgi?id=1474928
Fixes: 80c9abaabf ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Cc: <stable@vger.kernel.org> # v2.6.21-rc1
Reported-by: "bo Zhang" <zhangbo5891001@gmail.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-02-16 20:15:07 -07:00
Dan Carpenter
6d64f82cb0 ipx: call ipxitf_put() in ioctl error path
We should call ipxitf_put() if the copy_to_user() fails.

Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Change-Id: Ib541c679cc5f4242713eb035aed458043b8ce97e
(cherry picked from commit ee0d8d8482345ff97a75a7d747efc309f13b0d80)
2018-02-16 20:15:04 -07:00