Commit Graph

126 Commits

Author SHA1 Message Date
Lorenzo Colitti 1e944b8e1b net: inet: Support UID-based routing in IP protocols.
- Use the UID in routing lookups made by protocol connect() and
  sendmsg() functions.
- Make sure that routing lookups triggered by incoming packets
  (e.g., Path MTU discovery) take the UID of the socket into
  account.
- For packets not associated with a userspace socket, (e.g., ping
  replies) use UID 0 inside the user namespace corresponding to
  the network namespace the socket belongs to. This allows
  all namespaces to apply routing and iptables rules to
  kernel-originated traffic in that namespaces by matching UID 0.
  This is better than using the UID of the kernel socket that is
  sending the traffic, because the UID of kernel sockets created
  at namespace creation time (e.g., the per-processor ICMP and
  TCP sockets) is the UID of the user that created the socket,
  which might not be mapped in the namespace.

[Backport of net-next e2d118a1cb5e60d077131a09db1d81b90a5295fe]

Bug: 16355602
Change-Id: I910504b508948057912bc188fd1e8aca28294de3
Tested: compiles allnoconfig, allyesconfig, allmodconfig
Tested: https://android-review.googlesource.com/253302
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Git-commit: 327455146c7467670e7c94b089ef88f57bc57311
Git-repo: https://android.googlesource.com/kernel/common.git
[resolved trivial merge conflicts]
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2019-07-27 21:50:59 +02:00
Lorenzo Colitti c03bb93953 Revert "net: core: Support UID-based routing."
This reverts commit 99a6ea48b591877d1cd6a51732c40a1d5321d961.

Bug: 16355602
Change-Id: I7d75b52d8863e932707daf391892480542c2e965

Git-commit: a11e5dc0a6000c5691d2e267893831abc68bd5d9
Git-repo: https://android.googlesource.com/kernel/common.git
[resolved trivial merge conflicts and compilation errors]
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2019-07-27 21:50:58 +02:00
Lorenzo Colitti 8d9e329179 Revert "Handle 'sk' being NULL in UID-based routing."
This reverts commit 455b09d66a9ccfc572497ae88375ae343ff9ae66.

Bug: 16355602
Change-Id: Ibbb543d9ffaf0eb8e231b5dcf502e2d0f8916572
Git-commit: 9e17a9f81750144c4f2108d7d197391414fdd565
Git-repo: https://android.googlesource.com/kernel/common.git
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2019-07-27 21:50:58 +02:00
Sreeram Ramachandran c841195caa Handle 'sk' being NULL in UID-based routing.
Bug: 15413527
Change-Id: If33bebb7b52c0ebfa8dac2452607bce0c2b0faa0
Signed-off-by: Sreeram Ramachandran <sreeram@google.com>
Git-commit: 0836a0c191f580ed69254e0b287cdce58481e978
Git-repo: https://android.googlesource.com/kernel/common.git
[imaund@codeaurora.org: Additional ternary added to prevent
  dereferencing NULL in QCOM code.]
Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-11-04 13:08:33 -08:00
Lorenzo Colitti a2a8c36364 net: core: Support UID-based routing.
This contains the following commits:

1. 0149763 net: core: Add a UID range to fib rules.
2. 1650474 net: core: Use the socket UID in routing lookups.
3. 0b16771 net: ipv4: Add the UID to the route cache.
4. ee058f1 net: core: Add a RTA_UID attribute to routes.
    This is so that userspace can do per-UID route lookups.

Bug: 15413527
Change-Id: I1285474c6734614d3bda6f61d88dfe89a4af7892
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Git-commit: 0b428749ce5969bc06c73855e360141b4e7126e8
Git-repo: https://android.googlesource.com/kernel/common.git
[imaund@codeaurora.org: Resolved conflicts related to removal
  of oif and mark, as well as refactoring of files.]
Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-11-04 13:06:30 -08:00
Eric Dumazet 6da025fa23 ipv4: avoid a test in ip_rt_put()
We can save a test in ip_rt_put(), considering dst_release() accepts
a NULL parameter, and dst is first element in rtable.

Add a BUILD_BUG_ON() to catch any change that could break this
assertion.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <amwang@redhat.com>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03 14:59:04 -04:00
Julian Anastasov 155e8336c3 ipv4: introduce rt_uses_gateway
Add new flag to remember when route is via gateway.
We will use it to allow rt_gateway to contain address of
directly connected host for the cases when DST_NOCACHE is
used or when the NH exception caches per-destination route
without DST_NOCACHE flag, i.e. when routes are not used for
other destinations. By this way we force the neighbour
resolving to work with the routed destination but we
can use different address in the packet, feature needed
for IPVS-DR where original packet for virtual IP is routed
via route to real IP.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-08 17:42:36 -04:00
Nicolas Dichtel bafa6d9d89 ipv4/route: arg delay is useless in rt_cache_flush()
Since route cache deletion (89aef8921b), delay is no
more used. Remove it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 15:44:34 -04:00
David S. Miller caacf05e5a ipv4: Properly purge netdev references on uncached routes.
When a device is unregistered, we have to purge all of the
references to it that may exist in the entire system.

If a route is uncached, we currently have no way of accomplishing
this.

So create a global list that is scanned when a network device goes
down.  This mirrors the logic in net/core/dst.c's dst_ifdown().

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-31 15:06:50 -07:00
David S. Miller c6cffba4ff ipv4: Fix input route performance regression.
With the routing cache removal we lost the "noref" code paths on
input, and this can kill some routing workloads.

Reinstate the noref path when we hit a cached route in the FIB
nexthops.

With help from Eric Dumazet.

Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 15:50:39 -07:00
David S. Miller 13378cad02 ipv4: Change rt->rt_iif encoding.
On input packet processing, rt->rt_iif will be zero if we should
use skb->dev->ifindex.

Since we access rt->rt_iif consistently via inet_iif(), that is
the only spot whose interpretation have to adjust.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-23 16:36:27 -07:00
David S. Miller 2860583fe8 ipv4: Kill rt->fi
It's not really needed.

We only grabbed a reference to the fib_info for the sake of fib_info
local metrics.

However, fib_info objects are freed using RCU, as are therefore their
private metrics (if any).

We would have triggered a route cache flush if we eliminated a
reference to a fib_info object in the routing tables.

Therefore, any existing cached routes will first check and see that
they have been invalidated before an errant reference to these
metric values would occur.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:40:07 -07:00
David S. Miller 9917e1e876 ipv4: Turn rt->rt_route_iif into rt->rt_is_input.
That is this value's only use, as a boolean to indicate whether
a route is an input route or not.

So implement it that way, using a u16 gap present in the struct
already.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:40:02 -07:00
David S. Miller 4fd551d7be ipv4: Kill rt->rt_oif
Never actually used.

It was being set on output routes to the original OIF specified in the
flow key used for the lookup.

Adjust the only user, ipmr_rt_fib_lookup(), for greater correctness of
the flowi4_oif and flowi4_iif values, thanks to feedback from Julian
Anastasov.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:38:34 -07:00
David S. Miller f8126f1d51 ipv4: Adjust semantics of rt->rt_gateway.
In order to allow prefixed routes, we have to adjust how rt_gateway
is set and interpreted.

The new interpretation is:

1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr

2) rt_gateway != 0, destination requires a nexthop gateway

Abstract the fetching of the proper nexthop value using a new
inline helper, rt_nexthop(), as suggested by Joe Perches.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-07-20 13:31:20 -07:00
David S. Miller f1ce3062c5 ipv4: Remove 'rt_dst' from 'struct rtable'
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:31:19 -07:00
David Miller b48698895d ipv4: Remove 'rt_mark' from 'struct rtable'
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:31:18 -07:00
David Miller d6c0a4f609 ipv4: Kill 'rt_src' from 'struct rtable'
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:31:00 -07:00
David Miller 1a00fee4ff ipv4: Remove rt_key_{src,dst,tos} from struct rtable.
They are always used in contexts where they can be reconstituted,
or where the finally resolved rt->rt_{src,dst} is semantically
equivalent.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:30:59 -07:00
David Miller 38a424e465 ipv4: Kill ip_route_input_noref().
The "noref" argument to ip_route_input_common() is now always ignored
because we do not cache routes, and in that case we must always grab
a reference to the resulting 'dst'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:30:59 -07:00
David S. Miller 89aef8921b ipv4: Delete routing cache.
The ipv4 routing cache is non-deterministic, performance wise, and is
subject to reasonably easy to launch denial of service attacks.

The routing cache works great for well behaved traffic, and the world
was a much friendlier place when the tradeoffs that led to the routing
cache's design were considered.

What it boils down to is that the performance of the routing cache is
a product of the traffic patterns seen by a system rather than being a
product of the contents of the routing tables.  The former of which is
controllable by external entitites.

Even for "well behaved" legitimate traffic, high volume sites can see
hit rates in the routing cache of only ~%10.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 13:30:27 -07:00
David S. Miller 1f42539d25 ipv4: Kill ip_rt_redirect().
No longer needed, as the protocol handlers now all properly
propagate the redirect back into the routing code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11 21:30:08 -07:00
David S. Miller b42597e2f3 ipv4: Add ipv4_redirect() and ipv4_sk_redirect() helper functions.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11 21:25:45 -07:00
David S. Miller 94206125c4 ipv4: Rearrange arguments to ip_rt_redirect()
Pass in the SKB rather than just the IP addresses, so that policy
and other aspects can reside in ip_rt_redirect() rather then
icmp_redirect().

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11 20:38:08 -07:00
David S. Miller f185071ddf ipv4: Remove inetpeer from routes.
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 22:40:18 -07:00
David S. Miller 5943634fc5 ipv4: Maintain redirect and PMTU info in struct rtable again.
Maintaining this in the inetpeer entries was not the right way to do
this at all.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 22:40:14 -07:00
David S. Miller 3e12939a2a inet: Kill FLOWI_FLAG_PRECOW_METRICS.
No longer needed.  TCP writes metrics, but now in it's own special
cache that does not dirty the route metrics.  Therefore there is no
longer any reason to pre-cow metrics in this way.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 22:40:12 -07:00
David S. Miller 41347dcdd8 ipv4: Kill rt->rt_spec_dst, no longer used.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-28 04:05:27 -07:00
David S. Miller c10237e077 Revert "ipv4: tcp: dont cache unconfirmed intput dst"
This reverts commit c074da2810.

This change has several unwanted side effects:

1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll
   thus never create a real cached route.

2) All TCP traffic will use DST_NOCACHE and never use the routing
   cache at all.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27 17:05:06 -07:00
Eric Dumazet c074da2810 ipv4: tcp: dont cache unconfirmed intput dst
DDOS synflood attacks hit badly IP route cache.

On typical machines, this cache is allowed to hold up to 8 Millions dst
entries, 256 bytes for each, for a total of 2GB of memory.

rt_garbage_collect() triggers and tries to cleanup things.

Eventually route cache is disabled but machine is under fire and might
OOM and crash.

This patch exploits the new TCP early demux, to set a nocache
boolean in case incoming TCP frame is for a not yet ESTABLISHED or
TIMEWAIT socket.

This 'nocache' boolean is then used in case dst entry is not found in
route cache, to create an unhashed dst entry (DST_NOCACHE)

SYN-cookie-ACK sent use a similar mechanism (ipv4: tcp: dont cache
output dst for syncookies), so after this patch, a machine is able to
absorb a DDOS synflood attack without polluting its IP route cache.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27 15:34:24 -07:00
David S. Miller 3639339553 ipv4: Handle PMTU in all ICMP error handlers.
With ip_rt_frag_needed() removed, we have to explicitly update PMTU
information in every ICMP error handler.

Create two helper functions to facilitate this.

1) ipv4_sk_update_pmtu()

   This updates the PMTU when we have a socket context to
   work with.

2) ipv4_update_pmtu()

   Raw version, used when no socket context is available.  For this
   interface, we essentially just pass in explicit arguments for
   the flow identity information we would have extracted from the
   socket.

   And you'll notice that ipv4_sk_update_pmtu() is simply implemented
   in terms of ipv4_update_pmtu()

Note that __ip_route_output_key() is used, rather than something like
ip_route_output_flow() or ip_route_output_key().  This is because we
absolutely do not want to end up with a route that does IPSEC
encapsulation and the like.  Instead, we only want the route that
would get us to the node described by the outermost IP header.

Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-14 22:22:07 -07:00
David S. Miller 43b03f1f6d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	MAINTAINERS
	drivers/net/wireless/iwlwifi/pcie/trans.c

The iwlwifi conflict was resolved by keeping the code added
in 'net' that turns off the buggy chip feature.

The MAINTAINERS conflict was merely overlapping changes, one
change updated all the wireless web site URLs and the other
changed some GIT trees to be Johannes's instead of John's.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-12 21:59:18 -07:00
David S. Miller 55afabaa0d inet: Fix BUG triggered by __rt{,6}_get_peer().
If no peer actually gets attached (either because create is zero or
the peer allocation fails) we'll trigger a BUG because we
unconditionally do an rt{,6}_peer_ptr() afterwards.

Fix this by guarding it with the proper check.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 15:52:29 -07:00
David S. Miller 46517008e1 ipv4: Kill ip_rt_frag_needed().
There is zero point to this function.

It's only real substance is to perform an extremely outdated BSD4.2
ICMP check, which we can safely remove.  If you really have a MTU
limited link being routed by a BSD4.2 derived system, here's a nickel
go buy yourself a real router.

The other actions of ip_rt_frag_needed(), checking and conditionally
updating the peer, are done by the per-protocol handlers of the ICMP
event.

TCP, UDP, et al. have a handler which will receive this event and
transmit it back into the associated route via dst_ops->update_pmtu().

This simplification is important, because it eliminates the one place
where we do not have a proper route context in which to make an
inetpeer lookup.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 02:08:59 -07:00
David S. Miller 97bab73f98 inet: Hide route peer accesses behind helpers.
We encode the pointer(s) into an unsigned long with one state bit.

The state bit is used so we can store the inetpeer tree root to use
when resolving the peer later.

Later the peer roots will be per-FIB table, and this change works to
facilitate that.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 02:08:47 -07:00
Roland Dreier c5d21c4b2a net: Reorder initialization in ip_route_output to fix gcc warning
If I build with W=1, for every file that includes <net/route.h>, I get the warning

    include/net/route.h: In function 'ip_route_output':
    include/net/route.h:135:3: warning: initialized field overwritten [-Woverride-init]
    include/net/route.h:135:3: warning: (near initialization for 'fl4') [-Woverride-init]

(This is with "gcc (Debian 4.6.3-1) 4.6.3")

A fix seems pretty trivial: move the initialization of .flowi4_tos
earlier.  As far as I can tell, this has no effect on code generation.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 00:04:47 -07:00
David S. Miller fbfe95a42e inet: Create and use rt{,6}_get_peer_create().
There's a lot of places that open-code rt{,6}_get_peer() only because
they want to set 'create' to one.  So add an rt{,6}_get_peer_create()
for their sake.

There were also a few spots open-coding plain rt{,6}_get_peer() and
those are transformed here as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08 23:24:18 -07:00
Eric Dumazet 95c9617472 net: cleanup unsigned to unsigned int
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:44:40 -04:00
Julian Anastasov e6b45241c5 ipv4: reset flowi parameters on route connect
Eric Dumazet found that commit 813b3b5db8
(ipv4: Use caller's on-stack flowi as-is in output
route lookups.) that comes in 3.0 added a regression.
The problem appears to be that resulting flowi4_oif is
used incorrectly as input parameter to some routing lookups.
The result is that when connecting to local port without
listener if the IP address that is used is not on a loopback
interface we incorrectly assign RTN_UNICAST to the output
route because no route is matched by oif=lo. The RST packet
can not be sent immediately by tcp_v4_send_reset because
it expects RTN_LOCAL.

	So, change ip_route_connect and ip_route_newports to
update the flowi4 fields that are input parameters because
we do not want unnecessary binding to oif.

	To make it clear what are the input parameters that
can be modified during lookup and to show which fields of
floiw4 are reused add a new function to update the flowi4
structure: flowi4_update_output.

Thanks to Yurij M. Plotnikov for providing a bug report including a
program to reproduce the problem.

Thanks to Eric Dumazet for tracking the problem down to
tcp_v4_send_reset and providing initial fix.

Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-04 19:29:48 -05:00
Steffen Klassert b8400f3718 route: struct rtable can be const in rt_is_input_route and rt_is_output_route
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-26 14:29:51 -05:00
David S. Miller a48eff1288 ipv4: Pass explicit destination address to rt_bind_peer().
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-18 18:42:43 -04:00
David S. Miller ed2361e66e ipv4: Pass explicit destination address to rt_get_peer().
This will next trickle down to rt_bind_peer().

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-18 18:38:54 -04:00
David S. Miller 8e36360ae8 ipv4: Remove route key identity dependencies in ip_rt_get_source().
Pass in the sk_buff so that we can fetch the necessary keys from
the packet header when working with input routes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 17:29:41 -04:00
David S. Miller cbb1e85f9c ipv4: Kill rt->rt_{src, dst} usage in IP GRE tunnels.
First, make callers pass on-stack flowi4 to ip_route_output_gre()
so they can get at the fully resolved flow key.

Next, use that in ipgre_tunnel_xmit() to avoid the need to use
rt->rt_{dst,src}.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-04 12:55:07 -07:00
David S. Miller 31e4543db2 ipv4: Make caller provide on-stack flow key to ip_route_output_ports().
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03 20:25:42 -07:00
David S. Miller 475949d8e8 ipv4: Renamt struct rtable's rt_tos to rt_key_tos.
To more accurately reflect that it is purely a routing
cache lookup key and is used in no other context.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03 19:45:15 -07:00
David S. Miller 6706b6ebab ipv4: Remove now superfluous code in ip_route_connect().
Now that output route lookups update the flow with
source address et al. selections, the fl4->{saddr,daddr}
assignments here are no longer necessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28 23:13:17 -07:00
David S. Miller 813b3b5db8 ipv4: Use caller's on-stack flowi as-is in output route lookups.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28 22:26:00 -07:00
David S. Miller b678027cb7 ipv4: Kill RTO_CONN.
It's not used by anything in the kernel, and defined in net/route.h so
never exported to userspace.

Therefore we can safely remove it.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-27 13:59:05 -07:00
David S. Miller 2d7192d6cb ipv4: Sanitize and simplify ip_route_{connect,newports}()
These functions are used together as a unit for route resolution
during connect().  They address the chicken-and-egg problem that
exists when ports need to be allocated during connect() processing,
yet such port allocations require addressing information from the
routing code.

It's currently more heavy handed than it needs to be, and in
particular we allocate and initialize a flow object twice.

Let the callers provide the on-stack flow object.  That way we only
need to initialize it once in the ip_route_connect() call.

Later, if ip_route_newports() needs to do anything, it re-uses that
flow object as-is except for the ports which it updates before the
route re-lookup.

Also, describe why this set of facilities are needed and how it works
in a big comment.

Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-04-27 13:59:04 -07:00