Taking socket spinlock in tcp_get_info() can deadlock, as
inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
while packet processing can use the reverse locking order.
We could avoid this locking for TCP_LISTEN states, but lockdep would
certainly get confused as all TCP sockets share same lockdep classes.
[ 523.722504] ======================================================
[ 523.728706] [ INFO: possible circular locking dependency detected ]
[ 523.734990] 4.1.0-dbg-DEV #1676 Not tainted
[ 523.739202] -------------------------------------------------------
[ 523.745474] ss/18032 is trying to acquire lock:
[ 523.750002] (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
[ 523.758129]
[ 523.758129] but task is already holding lock:
[ 523.763968] (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
[ 523.774661]
[ 523.774661] which lock already depends on the new lock.
[ 523.774661]
[ 523.782850]
[ 523.782850] the existing dependency chain (in reverse order) is:
[ 523.790326]
-> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
[ 523.796599] [<ffffffff811126bb>] lock_acquire+0xbb/0x270
[ 523.802565] [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
[ 523.808628] [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
[ 523.815273] [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
[ 523.822067] [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
[ 523.828199] [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
[ 523.834331] [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
[ 523.840202] [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
[ 523.847214] [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
[ 523.853440] [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
[ 523.859624] [<ffffffff81659db7>] ip_rcv+0x307/0x420
Lets use u64_sync infrastructure instead. As a bonus, 64bit
arches get optimized, as these are nop for them.
Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I50c833b77f41111a6fa3a57834c6311f95573754
This patch tracks the total number of inbound and outbound segments on a
TCP socket. One may use this number to have an idea on connection
quality when compared against the retransmissions.
RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut
These are a 32bit field each and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)
Note that tp->segs_out was placed near tp->snd_nxt for good data
locality and minimal performance impact, while tp->segs_in was placed
near tp->bytes_received for the same reason.
Join work with Eric Dumazet.
Note that received SYN are accounted on the listener, but sent SYNACK
are not accounted.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ic3e5daea102e13fb24c5fb1ce6913d5ece13c521
This patch tracks total number of payload bytes received on a TCP socket.
This is the sum of all changes done to tp->rcv_nxt
RFC4898 named this : tcpEStatsAppHCThruOctetsReceived
This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)
Note that tp->bytes_received was placed near tp->rcv_nxt for
best data locality and minimal performance impact.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ie1fec6455c8c6cf9ae3e2df09bf55abb56b57286
This patch tracks total number of bytes acked for a TCP socket.
This is the sum of all changes done to tp->snd_una, and allows
for precise tracking of delivered data.
RFC4898 named this : tcpEStatsAppHCThruOctetsAcked
This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)
Note that tp->bytes_acked was placed near tp->snd_una for
best data locality and minimal performance impact.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I975b11b049d6bde1b6802dc4331a9da93c371c9b
Add two new fields to struct tcp_info, to report sk_pacing_rate
and sk_max_pacing_rate to monitoring applications, as ss from iproute2.
User exported fields are 64bit, even if kernel is currently using 32bit
fields.
lpaa5:~# ss -i
..
skmem:(r0,rb357120,t0,tb2097152,f1584,w1980880,o0,bl0) ts sack cubic
wscale:6,6 rto:400 rtt:0.875/0.75 mss:1448 cwnd:1 ssthresh:12 send
13.2Mbps pacing_rate 3336.2Mbps unacked:15 retrans:1/5448 lost:15
rcv_space:29200
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I31149707ef565448d5f2f1b43f5759f308f824f5
As mentioned in commit afe4fd0624 ("pkt_sched: fq: Fair Queue packet
scheduler"), this patch adds a new socket option.
SO_MAX_PACING_RATE offers the application the ability to cap the
rate computed by transport layer. Value is in bytes per second.
u32 val = 1000000;
setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));
To be effectively paced, a flow must use FQ packet scheduler.
Note that a packet scheduler takes into account the headers for its
computations. The effective payload rate depends on MSS and retransmits
if any.
I chose to make this pacing rate a SOL_SOCKET option instead of a
TCP one because this can be used by other protocols.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Iea51b6104a5420d5c9ed1ea7382cbd53bdb8f4be
After hearing many people over past years complaining against TSO being
bursty or even buggy, we are proud to present automatic sizing of TSO
packets.
One part of the problem is that tcp_tso_should_defer() uses an heuristic
relying on upcoming ACKS instead of a timer, but more generally, having
big TSO packets makes little sense for low rates, as it tends to create
micro bursts on the network, and general consensus is to reduce the
buffering amount.
This patch introduces a per socket sk_pacing_rate, that approximates
the current sending rate, and allows us to size the TSO packets so
that we try to send one packet every ms.
This field could be set by other transports.
Patch has no impact for high speed flows, where having large TSO packets
makes sense to reach line rate.
For other flows, this helps better packet scheduling and ACK clocking.
This patch increases performance of TCP flows in lossy environments.
A new sysctl (tcp_min_tso_segs) is added, to specify the
minimal size of a TSO packet (default being 2).
A follow-up patch will provide a new packet scheduler (FQ), using
sk_pacing_rate as an input to perform optional per flow pacing.
This explains why we chose to set sk_pacing_rate to twice the current
rate, allowing 'slow start' ramp up.
sk_pacing_rate = 2 * cwnd * mss / srtt
v2: Neal Cardwell reported a suspect deferring of last two segments on
initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
into account tp->xmit_size_goal_segs
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Tom Herbert <therbert@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: If293e2c6a0f52fe0e2b4ee149c9242f8c5e46ef5
Replace the bh safe variant with the hard irq safe variant.
We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.
Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[javelinanddart]: Merge conflicts were resolved for drivers that exist in 3.4,
and additionally a treewide find and replace was run for these functions.
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Change-Id: Ib74db793de5e546414a0599f23095f82f0e20c86
when read/write the 64bit data, the correct lock should be hold.
Fixes: 87b6d218f3 ("tunnel: implement 64 bits statistics")
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Change-Id: Ib90ecb495739eafb1563d5cfd3d2dc275e4944ea
when read/write the 64bit data, the correct lock should be hold.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Change-Id: Ie9c73149b86eb028e1646d1001bdb4310a72fe14
In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.
The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.
This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.
Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.
Feedback would be appreciated!
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change-Id: Ieda06b95f0ec302dbef8576ef4c8fb4bd28ffb2f
TCP Appropriate Byte Count was added by me, but later disabled.
There is no point in maintaining it since it is a potential source
of bugs and Linux already implements other better window protection
heuristics.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ifc93fa043256c275580a9f93611ad1d013889bc4
Convert the per-cpu statistics kept for GRE, IPIP, and SIT tunnels
to use 64 bit statistics.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Keith <javelinanddart@gmail.com>
Change-Id: I317ad78a7f3a31326a1bc6f3069712f348b38a6c
"bssid" is only initialized out of the while loop, in case of two
events with same type: EVENT_CONNECT_RESULT, but one has zero
ether addr, the other is non-zero, the bssid pointer will be
referenced twice, which lead to use-after-free issue
Change-Id: Ie8a24275f7ec5c2f936ef0a802a42e5f63be9c71
CRs-Fixed: 2254305
Signed-off-by: Zhu Jianmin <jianminz@codeaurora.org>
This is partly a backport of d6c0a4f609
(ipv4: Kill 'rt_src' from 'struct rtable').
skb->sk can be null, and in fact it is when creating the buffer
in inet_rtm_getroute. There is no other way of accessing the flow,
so pass it directly.
Fixes invalid memory address when running 'ip route get $IPADDR'
Change-Id: I7b9e5499614b96360c9c8420907e82e145bb97f3
This is partly a backport of d6c0a4f609
(ipv4: Kill 'rt_src' from 'struct rtable').
skb->sk can be null, and in fact it is when creating the buffer
in inet_rtm_getroute. There is no other way of accessing the flow,
so pass it directly.
Fixes invalid memory address when running 'ip route get $IPADDR'
Change-Id: I7b9e5499614b96360c9c8420907e82e145bb97f3
"bssid" is only initialized out of the while loop, in case of two
events with same type: EVENT_CONNECT_RESULT, but one has zero
ether addr, the other is non-zero, the bssid pointer will be
referenced twice, which lead to use-after-free issue
Change-Id: Ie8a24275f7ec5c2f936ef0a802a42e5f63be9c71
CRs-Fixed: 2254305
Signed-off-by: Zhu Jianmin <jianminz@codeaurora.org>
One side effect - attempt to create a cross-device link on a read-only fs fails
with EROFS instead of EXDEV now. Makes more sense, POSIX allows, etc.
Change-Id: I264b03a230dbd310f3b3671d2da06ceb2930179b
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
releases what needs to be released after {kern,user}_path_create()
Change-Id: If7fa7455e2ba8a6f4f4c4d2db502a38b4a60d7c7
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This allows comparing hash and len in one operation on 64-bit
architectures. Right now only __d_lookup_rcu() takes advantage of this,
since that is the case we care most about.
The use of anonymous struct/unions hides the alternate 64-bit approach
from most users, the exception being a few cases where we initialize a
'struct qstr' with a static initializer. This makes the problematic
cases use a new QSTR_INIT() helper function for that (but initializing
just the name pointer with a "{ .name = xyzzy }" initializer remains
valid, as does just copying another qstr structure).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I710810eb8717563264428e39c99e62599d31907e
Include header file to pickup prototype of nf_nat_seq_adjust_hook
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I0cdf5dd8b9c5ad765920f8666d836f8c9fd835cd
rpfilter is only valid in raw/mangle PREROUTING, i.e.
RPFILTER=y|m is useless without raw or mangle table support.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I26118146ace6e802b5700472222198ef31f9d562
It was possible to set NF_CONNTRACK=n and NF_CONNTRACK_LABELS=y via
NETFILTER_XT_MATCH_CONNLABEL=y.
warning: (NETFILTER_XT_MATCH_CONNLABEL) selects NF_CONNTRACK_LABELS which has
unmet direct dependencies (NET && INET && NETFILTER && NF_CONNTRACK)
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ice48a81a97a7106a8f67b8b9e0ffb46400982005
similar to connmarks, except labels are bit-based; i.e.
all labels may be attached to a flow at the same time.
Up to 128 labels are supported. Supporting more labels
is possible, but requires increasing the ct offset delta
from u8 to u16 type due to increased extension sizes.
Mapping of bit-identifier to label name is done in userspace.
The extension is enabled at run-time once "-m connlabel" netfilter
rules are added.
Change-Id: I98cfb16533e7e4bde1d73f2aa30e1be425f0b67e
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
arptables 0.0.4 (released on 10th Jan 2013) supports calling the
CLASSIFY target, but on adding a rule to the wrong chain, the
diagnostic is as follows:
# arptables -A INPUT -j CLASSIFY --set-class 0:0
arptables: Invalid argument
# dmesg | tail -n1
x_tables: arp_tables: CLASSIFY target: used from hooks
PREROUTING, but only usable from INPUT/FORWARD
This is incorrect, since xt_CLASSIFY.c does specify
(1 << NF_ARP_OUT) | (1 << NF_ARP_FORWARD).
This patch corrects the x_tables diagnostic message to print the
proper hook names for the NFPROTO_ARP case.
Affects all kernels down to and including v2.6.31.
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ied71aaf676bf71199699299f1933524bb20fc486
This patch adds the NFQA_CAP_LEN attribute that allows us to know
what is the real packet size from user-space (even if we decided
to retrieve just a few bytes from the packet instead of all of it).
Security software that inspects packets should always check for
this new attribute to make sure that it is inspecting the entire
packet.
This also helps to provide a workaround for the problem described
in: http://marc.info/?l=netfilter-devel&m=134519473212536&w=2
Original idea from Florian Westphal.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I74a268651298a8284ea1c8c2dcd9b2d706cd6a29
The packets that we send via NFQUEUE are encapsulated in the NFQA_PAYLOAD
attribute. The length of the packet in userspace is obtained via
attr->nla_len field. This field contains the size of the Netlink
attribute header plus the packet length.
If the maximum packet length is specified, ie. 65535 bytes, and
packets in the range of (65531,65535] are sent to userspace, the
attr->nla_len overflows and it reports bogus lengths to the
application.
To fix this, this patch limits the maximum packet length to 65531
bytes. If larger packet length is specified, the packet that we
send to user-space is truncated to 65531 bytes.
To support 65535 bytes packets, we have to revisit the idea of
the 32-bits Netlink attribute length.
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Id4d70b0b3e4d7fc3422130cc2989473d545530e1
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.
In general policy and network stack state changes are allowed
while resource control is left unchanged.
Allow creating raw sockets.
Allow the SIOCSARP ioctl to control the arp cache.
Allow the SIOCSIFFLAG ioctl to allow setting network device flags.
Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.
Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting gre tunnels.
Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipip tunnels.
Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipsec virtual tunnel interfaces.
Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
sockets.
Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
arbitrary ip options.
Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
Allow setting the IP_TRANSPARENT ipv4 socket option.
Allow setting the TCP_REPAIR socket option.
Allow setting the TCP_CONGESTION socket option.
Change-Id: I3b6ce2465e354cd2865e1a7fe67d6e812f88b16a
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal reported that the removal of the NOTRACK target
(9655050 netfilter: remove xt_NOTRACK) is breaking some existing
setups.
That removal was scheduled for removal since long time ago as
described in Documentation/feature-removal-schedule.txt
What: xt_NOTRACK
Files: net/netfilter/xt_NOTRACK.c
When: April 2011
Why: Superseded by xt_CT
Still, people may have not notice / may have decided to stick to an
old iptables version. I agree with him in that some more conservative
approach by spotting some printk to warn users for some time is less
agressive.
Current iptables 1.4.16.3 already contains the aliasing support
that makes it point to the CT target, so upgrading would fix it.
Still, the policy so far has been to avoid pushing our users to
upgrade.
As a solution, this patch recovers the NOTRACK target inside the CT
target and it now spots a warning.
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I93292ad4f6e377df739e0366e36ab72786c3750d
It was scheduled to be removed for a long time.
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netfilter@vger.kernel.org
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Iae2ab6490a36ecde8fb9ee5b65e4448969f74b07
Remove ambiguity of double negation.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ife5632986d43a9e6e719415c8a234f100e8de958
Since (a0ecb85 netfilter: nf_nat: Handle routing changes in MASQUERADE
target), the MASQUERADE target handles routing changes which affect
the output interface of a connection, but only for ESTABLISHED
connections. It is also possible for NEW connections which
already have a conntrack entry to be affected by routing changes.
This adds a check to drop entries in the NEW+conntrack state
when the oif has changed.
Signed-off-by: Andrew Collins <bsderandrew@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I6ae5c391d502f44cebc0563f51f4a65667efd0ae
When the route changes (backup default route, VPNs) which affect a
masqueraded target, the packets were sent out with the outdated source
address. The patch addresses the issue by comparing the outgoing interface
directly with the masqueraded interface in the nat table.
Events are inefficient in this case, because it'd require adding route
events to the network core and then scanning the whole conntrack table
and re-checking the route for all entry.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I148103e8a4b257d004b8f0f03455243a28ff0eec
Add inet_proto_csum_replace16 for incrementally updating IPv6 pseudo header
checksums for IPv6 NAT.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Change-Id: I858e211dfe3e1f4dc9daa42db3bd6fc535887324
ICMPv6 error messages are tracked by extracting the conntrack tuple of
the inner packet and looking up the corresponding conntrack entry. Tuple
extraction uses the ->get_l4proto() callback, which in case of fragments
returns NEXTHDR_FRAGMENT instead of the upper protocol, even for the
first fragment when the entire next header is present, resulting in a
failure to find the correct connection tracking entry.
This patch changes ipv6_get_l4proto() to use ipv6_skip_exthdr() instead
of nf_ct_ipv6_skip_exthdr() in order to skip fragment headers when the
fragment offset is zero.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: Ic8a86d7038676b044c89700226225ffa20e01079
IPv4 conntrack defragments incoming packet at the PRE_ROUTING hook and
(in case of forwarded packets) refragments them at POST_ROUTING
independent of the IP_DF flag. Refragmentation uses the dst_mtu() of
the local route without caring about the original fragment sizes,
thereby breaking PMTUD.
This patch fixes this by keeping track of the largest received fragment
with IP_DF set and generates an ICMP fragmentation required error during
refragmentation if that size exceeds the MTU.
Change-Id: Ibac77b728baba05841286ea5a8a2089d56e6ad65
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.
Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.
This patch improves the situation in the following way:
- If a reassembled packet belongs to a connection that has a helper
assigned, the reassembled packet is passed through the stack instead
of the original fragments.
- During defragmentation, the largest received fragment size is stored.
On output, the packet is refragmented if required. If the largest
received fragment size exceeds the outgoing MTU, a "packet too big"
message is generated, thus behaving as if the original fragments
were passed through the stack from an outside point of view.
- The ipv6_helper() hook function can't receive fragments anymore for
connections using a helper, so it is switched to use ipv6_skip_exthdr()
instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
reassembled packets are passed to connection tracking helpers.
The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.
This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (f.i. when
changing textual representation of a variable amount of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.
IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.
Change-Id: I75d83668e7de723fb271232f475f46f4037a4a4f
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch allows the FTP helper to pickup the sequence tracking from
the first packet seen. This is useful to fix the breakage of the first
FTP command after the failover while using conntrackd to synchronize
states.
The seq_aft_nl_num field in struct nf_ct_ftp_info has been shrinked to
16-bits (enough for what it does), so we can use the remaining 16-bits
to store the flags while using the same size for the private FTP helper
data.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I4463de3e064fb13f7b1dc58ef1dd0f91d40f1bbc
Add IPv6 support to the SIP NAT helper. There are no functional differences
to IPv4 NAT, just different formats for addresses.
Change-Id: I151f527731d4724606203ca82244b5aad4b9e026
Signed-off-by: Patrick McHardy <kaber@trash.net>
In (c7232c9 netfilter: add protocol independent NAT core), the
hooks were accidentally modified:
SNAT hooks are POST_ROUTING and LOCAL_IN (before it was LOCAL_OUT).
DNAT hooks are PRE_ROUTING and LOCAL_OUT (before it was LOCAL_IN).
Signed-off-by: Elison Niven <elison.niven@cyberoam.com>
Signed-off-by: Sanket Shah <sanket.shah@cyberoam.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Idf352cdc22185f53702efdef755412137e9f95f1
hlist walk in find_appropriate_src() is not protected anymore by rcu_read_lock(),
so rcu_read_unlock() is unnecessary if in_range() matches.
This bug was added in (c7232c9 netfilter: add protocol independent NAT core).
Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I28d41758f0654cae9851d78a88f51c9f2f62234d
(c7232c9 netfilter: add protocol independent NAT core) added
incorrect locking for the module auto-load case in ctnetlink_parse_nat.
That function is always called from ctnetlink_create_conntrack which
requires no locking.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I035dcfb9aa109541dc69cf53a145322af912b137
(c7232c9 netfilter: add protocol independent NAT core) introduced a
problem that leads to crashing during boot due to NULL pointer
dereference. It seems that xt_nat calls xt_register_target() before
xt_init():
net/netfilter/x_tables.c:static struct xt_af *xt; is NULL and we crash on
xt_register_target(struct xt_target *target)
{
u_int8_t af = target->family;
int ret;
ret = mutex_lock_interruptible(&xt[af].mutex);
...
Fix this by changing the linking order, to make sure that x_tables
comes before xt_nat.
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I10eff95a9afc013663404d8e7392221630d8e141
Convert the IPv4 NAT implementation to a protocol independent core and
address family specific modules.
Change-Id: I926b42af53b37c96fb654021e7f568450e8c63c0
Signed-off-by: Patrick McHardy <kaber@trash.net>
For mangling IPv6 packets the protocol header offset needs to be known
by the NAT packet mangling functions. Add a so far unused protoff argument
and convert the conntrack and NAT helpers to use it in preparation of
IPv6 NAT.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: I34f8597c072b0e9d99d19ae958c1788e3d8d51ee
The NAT helpers currently only handle IPv4 packets correctly. Restrict
invocation of the helpers to IPv4 in preparation of IPv6 NAT.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Change-Id: Ided2edb286d4098d4835bf739aafce71cecb3876
Within SIP messages IPv6 addresses are enclosed in square brackets in most
cases, with the exception of the "received=" header parameter. Currently
the helper fails to parse enclosed addresses.
This patch:
- changes the SIP address parsing function to enforce square brackets
when required, and accept them when not required but present, as
recommended by RFC 5118.
- adds a new SDP address parsing function that never accepts square
brackets since SDP doesn't use them.
With these changes, the SIP helper correctly parses all test messages
from RFC 5118 (Session Initiation Protocol (SIP) Torture Test Messages
for Internet Protocol Version 6 (IPv6)).
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I269237f68085de081ad6b3779f9806cd8380bbc9
Commit 3a8fc53a (netfilter: nf_ct_helper: allocate 16 bytes for the helper
and policy names) introduced a bug in the SIP helper, the helper name is
sprinted to the sip_names array instead of instead of into the helper
structure. This breaks the helper match and the /proc/net/nf_conntrack_expect
output.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I3ba18ddaef444f0a50f49738f2f4718e338ec9b3
This patch fixes compilation with NF_CONNTRACK_EVENTS=n and
NETFILTER_NETLINK_QUEUE_CT=y.
I'm leaving all those static inline functions that calculate the size
of the event message out of the ifdef area of NF_CONNTRACK_EVENTS since
they will not be included by gcc in case they are unused.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ied1198a98d2fbbe81e836b317cc3a835644f6bef
This patch fixes a sparse warning due to missing include header file.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ieeeae7f9a48a69fa45e824d8d03feb3bd5faba3e
LD init/built-in.o
net/built-in.o:(.data+0x4408): undefined reference to `nf_nat_tcp_seq_adjust'
make: *** [vmlinux] Error 1
This patch adds a new pointer hook (nfq_ct_nat_hook) similar to other existing
in Netfilter to solve our complicated configuration dependencies.
Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I332521b61f008793381ba6eeb3d3a34369adaea7
In "9cb0176 netfilter: add glue code to integrate nfnetlink_queue and ctnetlink"
the compilation with NF_CONNTRACK disabled is broken. This patch fixes this
issue.
I have moved the conntrack part into nfnetlink_queue_ct.c to avoid
peppering the entire nfnetlink_queue.c code with ifdefs.
I also needed to rename nfnetlink_queue.c to nfnetlink_queue_pkt.c
to update the net/netfilter/Makefile to support conditional compilation
of the conntrack integration.
This patch also adds CONFIG_NETFILTER_QUEUE_CT in case you want to explicitly
disable the integration between nf_conntrack and nfnetlink_queue.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I211402286b8e154f13af067cfbf470ab6a635282
This patch fixes the compilation of net/netfilter/nfnetlink_cthelper.c
if CONFIG_NF_CONNTRACK is not set.
This patch also moves the definition of the cthelper infrastructure to
the scope of NF_CONNTRACK things.
I have also renamed NETFILTER_NETLINK_CTHELPER by NF_CT_NETLINK_HELPER,
to use similar names to other nf_conntrack_netlink extensions. Better now
that this has been only for two days in David's tree.
Two new dependencies have been added:
* NF_CT_NETLINK
* NETFILTER_NETLINK_QUEUE
Since these infrastructure requires both ctnetlink and nfqueue.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I7e5a5893dcc11f4da6765d4716ba1399356e246b
This patch modifies __nf_ct_try_assign_helper in a way that invalidates support
for the following scenario:
1) attach the helper A for first time when the conntrack is created
2) attach new (different) helper B due to changes the reply tuple caused by NAT
eg. port redirection from TCP/21 to TCP/5060 with both FTP and SIP helpers
loaded, which seems to be a quite unorthodox scenario.
I can provide a more elaborated patch to support this scenario but explicit
helper attachment provides a better solution for this since now the use can
attach the helpers consistently, without relying on the automatic helper
lookup magic.
This patch fixes a possible out of bound zeroing of the conntrack helper
extension if the helper B uses more memory for its private data than
helper A.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: If4b9e3226358dc3e77e1b1cf1a47204ab6643e88
The patch 1afc56794e: "netfilter: nf_ct_helper: implement variable
length helper private data" from Jun 7, 2012, leads to the following
Smatch complaint:
net/netfilter/nf_conntrack_netlink.c:1231 ctnetlink_change_helper()
error: we previously assumed 'help->helper' could be null (see line 1228)
This NULL dereference can be triggered with the following sequence:
1) attach the helper for first time when the conntrack is created.
2) remove the helper module or detach the helper from the conntrack
via ctnetlink.
3) attach helper again (the same or different one, no matter) to the
that existing conntrack again via ctnetlink.
This patch fixes the problem by removing the use case that allows you
to re-assign again a helper for one conntrack entry via ctnetlink since
I cannot find any practical use for it.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Id7c741812457c98032b8d52c7fbf0ea9e059c388
We can now define conntrack extensions of variable size. This
patch is useful to get rid of these unions:
union nf_conntrack_help
union nf_conntrack_proto
union nf_conntrack_nat_help
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Icc8a18ec32f495a1e18afebc748bfdcff6b9c658
This attribute can be used to modify and to dump the internal
protocol information.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ifa704afd118b8f9b860ebf8ffba843ea804f815d
This patch uses the new variable length conntrack extensions.
Instead of using union nf_conntrack_help that contain all the
helper private data information, we allocate variable length
area to store the private helper data.
This patch includes the modification of all existing helpers.
It also includes a couple of include header to avoid compilation
warnings.
Change-Id: I2b855f3687c16ac0996053006d0543ad05411acd
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch modifies the struct nf_conntrack_helper to allocate
the room for the helper name. The maximum length is 16 bytes
(this was already introduced in 2.6.24).
For the maximum length for expectation policy names, I have
also selected 16 bytes.
This patch is required by the follow-up patch to support
user-space connection tracking helpers.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I9ed449622e5c101c87519604b1c6ea37c0bd91cd
User-space programs that receive traffic via NFQUEUE may mangle packets.
If NAT is enabled, this usually puzzles sequence tracking, leading to
traffic disruptions.
With this patch, nfnl_queue will make the corresponding NAT TCP sequence
adjustment if:
1) The packet has been mangled,
2) the NFQA_CFG_F_CONNTRACK flag has been set, and
3) NAT is detected.
There are some records on the Internet complaning about this issue:
http://stackoverflow.com/questions/260757/packet-mangling-utilities-besides-iptables
By now, we only support TCP since we have no helpers for DCCP or SCTP.
Better to add this if we ever have some helper over those layer 4 protocols.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Id74b2db6fa3264a322d682b64a3104ce6ff42eaf
This patch allows you to include the conntrack information together
with the packet that is sent to user-space via NFQUEUE.
Previously, there was no integration between ctnetlink and
nfnetlink_queue. If you wanted to access conntrack information
from your libnetfilter_queue program, you required to query
ctnetlink from user-space to obtain it. Thus, delaying the packet
processing even more.
Including the conntrack information is optional, you can set it
via NFQA_CFG_F_CONNTRACK flag with the new NFQA_CFG_FLAGS attribute.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ibbc9436d0dada42bc780314dd762e635c4a15b71
There are good reasons to supports helpers in user-space instead:
* Rapid connection tracking helper development, as developing code
in user-space is usually faster.
* Reliability: A buggy helper does not crash the kernel. Moreover,
we can monitor the helper process and restart it in case of problems.
* Security: Avoid complex string matching and mangling in kernel-space
running in privileged mode. Going further, we can even think about
running user-space helpers as a non-root process.
* Extensibility: It allows the development of very specific helpers (most
likely non-standard proprietary protocols) that are very likely not to be
accepted for mainline inclusion in the form of kernel-space connection
tracking helpers.
This patch adds the infrastructure to allow the implementation of
user-space conntrack helpers by means of the new nfnetlink subsystem
`nfnetlink_cthelper' and the existing queueing infrastructure
(nfnetlink_queue).
I had to add the new hook NF_IP6_PRI_CONNTRACK_HELPER to register
ipv[4|6]_helper which results from splitting ipv[4|6]_confirm into
two pieces. This change is required not to break NAT sequence
adjustment and conntrack confirmation for traffic that is enqueued
to our user-space conntrack helpers.
Basic operation, in a few steps:
1) Register user-space helper by means of `nfct':
nfct helper add ftp inet tcp
[ It must be a valid existing helper supported by conntrack-tools ]
2) Add rules to enable the FTP user-space helper which is
used to track traffic going to TCP port 21.
For locally generated packets:
iptables -I OUTPUT -t raw -p tcp --dport 21 -j CT --helper ftp
For non-locally generated packets:
iptables -I PREROUTING -t raw -p tcp --dport 21 -j CT --helper ftp
3) Run the test conntrackd in helper mode (see example files under
doc/helper/conntrackd.conf
conntrackd
4) Generate FTP traffic going, if everything is OK, then conntrackd
should create expectations (you can check that with `conntrack':
conntrack -E expect
[NEW] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp
[DESTROY] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp
This confirms that our test helper is receiving packets including the
conntrack information, and adding expectations in kernel-space.
The user-space helper can also store its private tracking information
in the conntrack structure in the kernel via the CTA_HELP_INFO. The
kernel will consider this a binary blob whose layout is unknown. This
information will be included in the information that is transfered
to user-space via glue code that integrates nfnetlink_queue and
ctnetlink.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ifad12a30d86a6eb0b72d20f079336a24348d711b
Implement a new "fail-open" mode where packets are not dropped
upon queue-full condition. This mode can be enabled/disabled per
queue using netlink NFQA_CFG_FLAGS & NFQA_CFG_MASK attributes.
Change-Id: Ibe58610239603d0080fabab640b142d6c15f257e
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Vivek Kashyap <vivk@us.ibm.com>
Signed-off-by: Sridhar Samudrala <samudrala@us.ibm.com>
It was scheduled to be removed.
Change-Id: I8ad3a555a10f2159d8bc7bd658e43aaa5ebfc519
Cc: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch addresses two issues:
a) Fix usage of u32 and __be32 that causes endianess warnings via sparse.
b) Ensure consistent hashing in a cluster that is composed of big and
little endian systems. Thus, we obtain the same hash mark in an
heterogeneous cluster.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I34dc3aec1d437097ce73fac64e4c4e8e02efd1a8
nf_conntrack_l4proto.h is included twice.
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ie187dbb40294216188d9b03a78376209b0f15757
Use:
((u64)(HASH_VAL * HASH_SIZE)) >> 32
as suggested by David S. Miller.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ic24e8221499b767fb7200b7c83017f573fd4bc4d
There is a typo in the error checking and "&&" was used instead of "||".
If skb_header_pointer() returns NULL then it leads to a NULL
dereference.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I6663bf06e012a5b338e0bf8faafea445a5247b7f
The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.
[ Part of this patch has been refactorized and modified by Pablo Neira Ayuso ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I4ab0fcff9b5d4637f480e1146e5bc4100cf2cf7b
This patch adds the flags parameter to ipv6_find_hdr. This flags
allows us to:
* know if this is a fragment.
* stop at the AH header, so the information contained in that header
can be used for some specific packet handling.
This patch also adds the offset parameter for inspection of one
inner IPv6 header that is contained in error messages.
Change-Id: I8aa13399597dcb72c73084bcd7f8ca4156326357
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch removes ip_queue support which was marked as obsolete
years ago. The nfnetlink_queue modules provides more advanced
user-space packet queueing mechanism.
This patch also removes capability code included in SELinux that
refers to ip_queue. Otherwise, we break compilation.
Several warning has been sent regarding this to the mailing list
in the past month without anyone rising the hand to stop this
with some strong argument.
Change-Id: I62ab355af31e708b3c1000f2252c8196fb8ba428
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Explicit helper attachment via the CT target is broken with NAT
if non-standard ports are used. This problem was hidden behind
the automatic helper assignment routine. Thus, it becomes more
noticeable now that we can disable the automatic helper assignment
with Eric Leblond's:
9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment
Basically, nf_conntrack_alter_reply asks for looking up the helper
up if NAT is enabled. Unfortunately, we don't have the conntrack
template at that point anymore.
Since we don't want to rely on the automatic helper assignment,
we can skip the second look-up and stick to the helper that was
attached by iptables. With the CT target, the user is in full
control of helper attachment, thus, the policy is to trust what
the user explicitly configures via iptables (no automatic magic
anymore).
Interestingly, this bug was hidden by the automatic helper look-up
code. But it can be easily trigger if you attach the helper in
a non-standard port, eg.
iptables -I PREROUTING -t raw -p tcp --dport 8888 \
-j CT --helper ftp
And you disabled the automatic helper assignment.
I added the IPS_HELPER_BIT that allows us to differenciate between
a helper that has been explicitly attached and those that have been
automatically assigned. I didn't come up with a better solution
(having backward compatibility in mind).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Iff14005c0627ce7e76c9fa3b0b6c95b892901ca3
This refreshes the "timeout" attribute in existing expectations if one is
given.
The use case for this would be for userspace helpers to extend the lifetime
of the expectation when requested, as this is not possible right now
without deleting/recreating the expectation.
I use this specifically for forwarding DCERPC traffic through:
DCERPC has a port mapper daemon that chooses a (seemingly) random port for
future traffic to go to. We expect this traffic (with a reasonable
timeout), but sometimes the port mapper will tell the client to continue
using the same port. This allows us to extend the expectation accordingly.
Signed-off-by: Kelvie Wong <kelvie@ieee.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: I1b4f4479aaef63103a665ee7cb1ff70001912756
This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, eg.
echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper
[ Note: flows that already got a helper will keep using it even
if automatic helper assignment has been disabled ]
Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helper to flows.
There are good reasons to stop supporting automatic helper
assignment, for further information, please read:
http://www.netfilter.org/news.html#2012-04-03
This patch also adds one message to inform that automatic helper
assignment is deprecated and it will be removed soon (this is
spotted only once, with the first flow that gets a helper attached
to make it as less annoying as possible).
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Change-Id: Ie1356fa575971063d579cf943ac0354d1513cfc3
In certain 802.11 wireless deployments, there will be NA proxies
that use knowledge of the network to correctly answer requests.
To prevent unsolicitd advertisements on the shared medium from
being a problem, on such deployments wireless needs to drop them.
Enable this by providing an option called "drop_unsolicited_na".
Change-Id: I2567a9973e72165a8e546f3638b509fbd1c95298
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv6 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.
Change-Id: I8a0b45fbd533236fbd785e6e8aa20fb780aa1397
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In certain 802.11 wireless deployments, there will be ARP proxies
that use knowledge of the network to correctly answer requests.
To prevent gratuitous ARP frames on the shared medium from being
a problem, on such deployments wireless needs to drop them.
Enable this by providing an option called "drop_gratuitous_arp".
Change-Id: Ic0ed4c7e520b1d973eb1ae206af0f882badc21ce
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv4 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.
Additionally, enabling this option provides compliance with a SHOULD
clause of RFC 1122.
Change-Id: Ib0c44d9e36d879be4f073db1936a986003390b78
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by Julian:
Simply, flowi4_iif must not contain 0, it does not
look logical to ignore all ip rules with specified iif.
because in fib_rule_match() we do:
if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
goto out;
flowi4_iif should be LOOPBACK_IFINDEX by default.
We need to move LOOPBACK_IFINDEX to include/net/flow.h:
1) It is mostly used by flowi_iif
2) Fix the following compile error if we use it in flow.h
by the patches latter:
In file included from include/linux/netfilter.h:277:0,
from include/net/netns/netfilter.h:5,
from include/net/net_namespace.h:21,
from include/linux/netdevice.h:43,
from include/linux/icmpv6.h:12,
from include/linux/ipv6.h:61,
from include/net/ipv6.h:16,
from include/linux/sunrpc/clnt.h:27,
from include/linux/nfs_fs.h:30,
from init/do_mounts.c:32:
include/net/flow.h: In function ‘flowi4_init_output’:
include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)
[Backport of net-next 6a662719c9]
Change-Id: Ib7a0a08d78c03800488afa1b2c170cb70e34cfd9
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Alex Efros reported rpfilter module doesn't match following packets:
IN=br.qemu SRC=192.168.2.1 DST=192.168.2.255 [ .. ]
(netfilter bugzilla #814).
Problem is that network stack arranges for the locally generated broadcasts
to appear on the interface they were sent out, so the IFF_LOOPBACK check
doesn't trigger.
As -m rpfilter is restricted to PREROUTING, we can check for existing
rtable instead, it catches locally-generated broad/multicast case, too.
Change-Id: I2d921ac4d53e5b1ca9a5249e489c33e4fa4a4b3a
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
As pointed out, there are places, that access net->loopback_dev->ifindex
and after ifindex generation is made per-net this value becomes constant
equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
it where appropriate.
Change-Id: I29fd08fa01a9522240ab654d436b02a577bb610c
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'dir' parameter in xfrm_migrate() is a user-controlled byte which is used
as an array index. This can lead to an out-of-bound access, kernel lockup and
DoS. Add a check for the 'dir' value.
This fixes CVE-2017-11600.
Change-Id: Ic55eec5b4767ad1bd8328b382c35f7b213abc38d
References: https://bugzilla.redhat.com/show_bug.cgi?id=1474928
Fixes: 80c9abaabf ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Cc: <stable@vger.kernel.org> # v2.6.21-rc1
Reported-by: "bo Zhang" <zhangbo5891001@gmail.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
We should call ipxitf_put() if the copy_to_user() fails.
Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ib541c679cc5f4242713eb035aed458043b8ce97e
(cherry picked from commit ee0d8d8482345ff97a75a7d747efc309f13b0d80)