Commit Graph

90 Commits

Author SHA1 Message Date
Eric Dumazet b01a8531d0 netns: provide pure entropy for net_hash_mix()
[ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ]

net_hash_mix() currently uses kernel address of a struct net,
and is used in many places that could be used to reveal this
address to a patient attacker, thus defeating KASLR, for
the typical case (initial net namespace, &init_net is
not dynamically allocated)

I believe the original implementation tried to avoid spending
too many cycles in this function, but security comes first.

Also provide entropy regardless of CONFIG_NET_NS.

Fixes: 0b4419162a ("netns: introduce the net_hash_mix "salt" for hashes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 22:10:05 +02:00
Eric Dumazet f553bcd528 netfilter: ipv6: nf_defrag: reduce struct net memory waste
commit 9ce7bc036ae4cfe3393232c86e9e1fea2153c237 upstream.

It is a waste of memory to use a full "struct netns_sysctl_ipv6"
while only one pointer is really used, considering netns_sysctl_ipv6
keeps growing.

Also, since "struct netns_frags" has cache line alignment,
it is better to move the frags_hdr pointer outside, otherwise
we spend a full cache line for this pointer.

This saves 192 bytes of memory per netns.

Fixes: c038a767cd ("ipv6: add a new namespace for nf_conntrack_reasm")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:53 +02:00
Timo Teräs 369508d237 ipv4: use separate genid for next hop exceptions
commit 13d82bf5 (ipv4: Fix flushing of cached routing informations)
added the support to flush learned pmtu information.

However, using rt_genid is quite heavy as it is bumped on route
add/change and multicast events amongst other places. These can
happen quite often, especially if using dynamic routing protocols.

While this is ok with routes (as they are just recreated locally),
the pmtu information is learned from remote systems and the icmp
notification can come with long delays. It is worthy to have separate
genid to avoid excessive pmtu resets.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-27 21:45:24 +02:00
Cong Wang 773a447344 ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif
As suggested by Julian:

	Simply, flowi4_iif must not contain 0, it does not
	look logical to ignore all ip rules with specified iif.

because in fib_rule_match() we do:

        if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
                goto out;

flowi4_iif should be LOOPBACK_IFINDEX by default.

We need to move LOOPBACK_IFINDEX to include/net/flow.h:

1) It is mostly used by flowi_iif

2) Fix the following compile error if we use it in flow.h
by the patches latter:

In file included from include/linux/netfilter.h:277:0,
                 from include/net/netns/netfilter.h:5,
                 from include/net/net_namespace.h:21,
                 from include/linux/netdevice.h:43,
                 from include/linux/icmpv6.h:12,
                 from include/linux/ipv6.h:61,
                 from include/net/ipv6.h:16,
                 from include/linux/sunrpc/clnt.h:27,
                 from include/linux/nfs_fs.h:30,
                 from init/do_mounts.c:32:
include/net/flow.h: In function ‘flowi4_init_output’:
include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)

[Backport of net-next 6a662719c9868b3d6c7d26b3a085f0cd3cc15e64]

Change-Id: Ib7a0a08d78c03800488afa1b2c170cb70e34cfd9
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Git-commit: 9c086b4cf266e9ac1afabb86ff9ef54407b344e2
Git-repo: https://android.googlesource.com/kernel/common.git
Signed-off-by: Kaushal Kumar <kaushalk@codeaurora.org>
2015-09-16 18:20:08 +05:30
Gao feng f3c1a44a22 netfilter: make /proc/net/netfilter pernet
This patch makes this proc dentry pernet. So far only init_net
had a /proc/net/netfilter directory.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 19:35:02 +02:00
Eric W. Biederman 98f842e675 proc: Usable inode numbers for the namespace file descriptors.
Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.

A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.

This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.

We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowd by the design if it becomes important.

I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-20 04:19:49 -08:00
Eric W. Biederman 038e7332b8 userns: make each net (net_ns) belong to a user_ns
The user namespace which creates a new network namespace owns that
namespace and all resources created in it.  This way we can target
capability checks for privileged operations against network resources to
the user_ns which created the network namespace in which the resource
lives.  Privilege to the user namespace which owns the network
namespace, or any parent user namespace thereof, provides the same
privilege to the network resource.

This patch is reworked from a version originally by
Serge E. Hallyn <serge.hallyn@canonical.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-18 22:46:23 -08:00
Eric W. Biederman d727abcb23 netns: Deduplicate and fix copy_net_ns when !CONFIG_NET_NS
The copy of copy_net_ns used when the network stack is not
built is broken as it does not return -EINVAL when attempting
to create a new network namespace.  We don't even have
a previous network namespace.

Since we need a copy of copy_net_ns in net/net_namespace.h that is
available when the networking stack is not built at all move the
correct version of copy_net_ns from net_namespace.c into net_namespace.h
Leaving us with just 2 versions of copy_net_ns.  One version for when
we compile in network namespace suport and another stub for all other
occasions.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-18 22:46:19 -08:00
Andi Kleen 04a6f82cf0 sections: fix section conflicts in net
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:04:45 +09:00
David S. Miller 6a06e5e1bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/team/team.c
	drivers/net/usb/qmi_wwan.c
	net/batman-adv/bat_iv_ogm.c
	net/ipv4/fib_frontend.c
	net/ipv4/route.c
	net/l2tp/l2tp_netlink.c

The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.

qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

With help from Antonio Quartulli.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-28 14:40:49 -04:00
Amerigo Wang c038a767cd ipv6: add a new namespace for nf_conntrack_reasm
As pointed by Michal, it is necessary to add a new
namespace for nf_conntrack_reasm code, this prepares
for the second patch.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 17:23:28 -04:00
Nicolas Dichtel b42664f898 netns: move net->ipv4.rt_genid to net->rt_genid
This commit prepares the use of rt_genid by both IPv4 and IPv6.
Initialization is left in IPv4 part.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 15:57:03 -04:00
Eric W. Biederman 4db67e8086 sctp: Make the address lists per network namespace
- Move the address lists into struct net
- Add per network namespace initialization and cleanup
- Pass around struct net so it is everywhere I need it.
- Rename all of the global variable references into references
  to the variables moved into struct net

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:12:17 -07:00
Pavel Emelyanov 1fb9489bf1 net: Loopback ifindex is constant now
As pointed out, there are places, that access net->loopback_dev->ifindex
and after ifindex generation is made per-net this value becomes constant
equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
it where appropriate.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09 16:18:07 -07:00
Pavel Emelyanov aa79e66eee net: Make ifindex generation per-net namespace
Strictly speaking this is only _really_ required for checkpoint-restore to
make loopback device always have the same index.

This change appears to be safe wrt "ifindex should be unique per-system"
concept, as all the ifindex usage is either already made per net namespace
of is explicitly limited with init_net only.

There are two cool side effects of this. The first one -- ifindices of
devices in container are always small, regardless of how many containers
we've started (and re-started) so far. The second one is -- we can speed
up the loopback ifidex access as shown in the next patch.

v2: Place ifindex right after dev_base_seq : avoid two holes and use the
    same cache line, dirtied in list_netdevice()/unlist_netdevice()

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09 16:18:07 -07:00
Andrey Vagin 51d7cccf07 net: make sock diag per-namespace
Before this patch sock_diag works for init_net only and dumps
information about sockets from all namespaces.

This patch expands sock_diag for all name-spaces.
It creates a netlink kernel socket for each netns and filters
data during dumping.

v2: filter accoding with netns in all places
    remove an unused variable.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-16 22:31:34 -07:00
Eric W. Biederman 48c7495857 net sysctl: Add place holder functions for when sysctl support is compiled out of the kernel.
Randy Dunlap <rdunlap@xenotime.net> reported:
> On 04/23/2012 12:07 AM, Stephen Rothwell wrote:
>
>> Hi all,
>>
>> Changes since 20120420:
>
>
>
> ERROR: "unregister_net_sysctl_table" [net/phonet/phonet.ko] undefined!
> ERROR: "register_net_sysctl" [net/phonet/phonet.ko] undefined!
>
> when CONFIG_SYSCTL is not enabled.

Add static inline stub functions to gracefully handle the case when sysctl
support is not present.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 19:24:28 -04:00
Eric W. Biederman 5f568e5afe net: Remove register_net_sysctl_table
All of the users have been converted to use registera_net_sysctl so we
no longer need register_net_sysctl.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman 4344475797 net: Kill register_sysctl_rotable
register_sysctl_rotable never caught on as an interesting way to
register sysctls.  My take on the situation is that what we want are
sysctls that we can only see in the initial network namespace.  What we
have implemented with register_sysctl_rotable are sysctls that we can
see in all of the network namespaces and can only change in the initial
network namespace.

That is a very silly way to go.  Just register the network sysctls
in the initial network namespace and we don't have any weird special
cases to deal with.

The sysctls affected are:
/proc/sys/net/ipv4/ipfrag_secret_interval
/proc/sys/net/ipv4/ipfrag_max_dist
/proc/sys/net/ipv6/ip6frag_secret_interval
/proc/sys/net/ipv6/mld_max_msf

I really don't expect anyone will miss them if they can't read them in a
child user namespace.

CC: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:17 -04:00
Eric W. Biederman 2ca794e5e8 net sysctl: Initialize the network sysctls sooner to avoid problems.
If the netfilter code is modified to use register_net_sysctl_table the
kernel fails to boot because the per net sysctl infrasturce is not setup
soon enough.  So to avoid races call net_sysctl_init from sock_init().

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:16 -04:00
Eric W. Biederman ab41a2ca50 net: Implement register_net_sysctl.
Right now all of the networking sysctl registrations are running in a
compatibiity mode.  The natvie sysctl registration api takes a cstring
for a path and a simple ctl_table.  Implement register_net_sysctl so
that we can register network sysctls without needing to use
compatiblity code in the sysctl core.

Switching from a ctl_path to a cstring results in less boiler plate
and denser code that is a little easier to read.

I would simply have changed the arguments to register_net_sysctl_table
instead of keeping two functions in parallel but gcc will allow a
ctl_path pointer to be passed to a char * pointer with only issuing a
warning resulting in completely incorrect code can be built.  Since I
have to change the function name I am taking advantage of the situation
to let both register_net_sysctl and register_net_sysctl_table live for a
short time in parallel which makes clean conversion patches a bit easier
to read and write.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:15 -04:00
Eric Dumazet dfd56b8b38 net: use IS_ENABLED(CONFIG_IPV6)
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-11 18:25:16 -05:00
Arun Sharma 60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Thomas Graf 4e985adaa5 rtnl: provide link dump consistency info
This patch adds a change sequence counter to each net namespace
which is bumped whenever a netdevice is added or removed from
the list. If such a change occurred while a link dump took place,
the dump will have the NLM_F_DUMP_INTR flag set in the first
message which has been interrupted and in all subsequent messages
of the same dump.

Note that links may still be modified or renamed while a dump is
taking place but we can guarantee for userspace to receive a
complete list of links and not miss any.

Testing:
I have added 500 VLAN netdevices to make sure the dump is split
over multiple messages. Then while continuously dumping links in
one process I also continuously deleted and re-added a dummy
netdevice in another process. Multiple dumps per seconds have
had the NLM_F_DUMP_INTR flag set.

I guess we can wait for Johannes patch to hit net-next via the
wireless tree.  I just wanted to give this some testing right away.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-01 15:39:53 -07:00
Al Viro a685e08987 Delay struct net freeing while there's a sysfs instance refering to it
* new refcount in struct net, controlling actual freeing of the memory
	* new method in kobj_ns_type_operations (->drop_ns())
	* ->current_ns() semantics change - it's supposed to be followed by
corresponding ->drop_ns().  For struct net in case of CONFIG_NET_NS it bumps
the new refcount; net_drop_ns() decrements it and calls net_free() if the
last reference has been dropped.  Method renamed to ->grab_current_ns().
	* old net_free() callers call net_drop_ns() instead.
	* sysfs_exit_ns() is gone, along with a large part of callchain
leading to it; now that the references stored in ->ns[...] stay valid we
do not need to hunt them down and replace them with NULL.  That fixes
problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
of sb->s_instances abuse.

	Note that struct net *shutdown* logics has not changed - net_cleanup()
is called exactly when it used to be called.  The only thing postponed by
having a sysfs instance refering to that struct net is actual freeing of
memory occupied by struct net.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-06-12 17:45:41 -04:00
Linus Torvalds 10799db60c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  net: Kill ratelimit.h dependency in linux/net.h
  net: Add linux/sysctl.h includes where needed.
  net: Kill ether_table[] declaration.
  inetpeer: fix race in unused_list manipulations
  atm: expose ATM device index in sysfs
  IPVS: bug in ip_vs_ftp, same list heaad used in all netns.
  bug.h: Move ratelimit warn interfaces to ratelimit.h
  bonding: cleanup module option descriptions
  net:8021q:vlan.c Fix pr_info to just give the vlan fullname and version.
  net: davinci_emac: fix dev_err use at probe
  can: convert to %pK for kptr_restrict support
  net: fix ETHTOOL_SFEATURES compatibility with old ethtool_ops.set_flags
  netfilter: Fix several warnings in compat_mtw_from_user().
  netfilter: ipset: fix ip_set_flush return code
  netfilter: ipset: remove unused variable from type_pf_tdel()
  netfilter: ipset: Use proper timeout value to jiffies conversion
2011-05-27 11:16:27 -07:00
David S. Miller bee95250f0 net: Add linux/sysctl.h includes where needed.
Several networking headers were depending upon the implicit
linux/sysctl.h include they get when including linux/net.h

Add explicit includes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-27 13:40:58 -04:00
Eric W. Biederman f063052947 net: Allow setting the network namespace by fd
Take advantage of the new abstraction and allow network devices
to be placed in any network namespace that we have a fd to talk
about.

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2011-05-10 14:36:03 -07:00
Julian Anastasov 2553d064ff ipvs: move struct netns_ipvs
Remove include/net/netns/ip_vs.h because it depends on
structures from include/net/ip_vs.h. As ipvs is pointer in
struct net it is better to move struct netns_ipvs into
include/net/ip_vs.h, so that we can easily use other structures
in struct netns_ipvs.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:50 +09:00
Hans Schillstrom 61b1ab4583 IPVS: netns, add basic init per netns.
Preparation for network name-space init, in this stage
some empty functions exists.

In most files there is a check if it is root ns i.e. init_net
if (!net_eq(net, &init_net))
        return ...
this will be removed by the last patch, when enabling name-space.

*v3
 ip_vs_conn.c merge error corrected.
 net_ipvs #ifdef removed as sugested by Jan Engelhardt

[ horms@verge.net.au: Removed whitespace-change-only hunks ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:26 +09:00
Eric Dumazet 1c87733d06 net_ns: add __rcu annotations
add __rcu annotation to (struct net)->gen, and use
rcu_dereference_protected() in net_assign_generic()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 14:18:27 -07:00
Eric Dumazet 8e602ce298 netns: reorder fields in struct net
In a network bench, I noticed an unfortunate false sharing between
'loopback_dev' and 'count' fields in "struct net".

'count' is written each time a socket is created or destroyed, while
loopback_dev might be often read in routing code.

Move loopback_dev in a read mostly section of "struct net"

Note: struct netns_xfrm is cache line aligned on SMP.
(It contains a "struct dst_ops")
Move it at the end to avoid holes, and reduce sizeof(struct net) by 128
bytes on ia32.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-17 13:49:14 -07:00
Alexey Dobriyan 8467005da3 nsproxy: remove INIT_NSPROXY()
Remove INIT_NSPROXY(), use C99 initializer.
Remove INIT_IPC_NS(), INIT_NET_NS() while I'm at it.

Note: headers trim will be done later, now it's quite pointless because
results will be invalidated by merge window.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-12 15:52:40 -08:00
Alexey Dobriyan cd8c20b650 netfilter: nfnetlink: netns support
Make nfnl socket per-petns.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-01-13 16:02:14 +01:00
Eric W. Biederman 72ad937abd net: Add support for batching network namespace cleanups
- Add exit_list to struct net to support building lists of network
  namespaces to cleanup.

- Add exit_batch to pernet_operations to allow running operations only
  once during a network namespace exit.  Instead of once per network
  namespace.

- Factor opt ops_exit_list and ops_exit_free so the logic with cleanup
  up a network namespace does not need to be duplicated.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 12:22:01 -08:00
Eric W. Biederman 65c0cfafce net: remove [un]register_pernet_gen_... and update the docs.
No that all of the callers have been updated to set fields in
struct pernet_operations, and simplified to let the network
namespace core handle the allocation and freeing of the storage
for them, remove the surpurpflous methods and update the docs
to the new style.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:16:00 -08:00
Eric W. Biederman f875bae065 net: Automatically allocate per namespace data.
To get the full benefit of batched network namespace cleanup netowrk
device deletion needs to be performed by the generic code.  When
using register_pernet_gen_device and freeing the data in exit_net
it is impossible to delay allocation until after exit_net has called
as the device uninit methods are no longer safe.

To correct this, and to simplify working with per network namespace data
I have moved allocation and deletion of per network namespace data into
the network namespace core.  The core now frees the data only after
all of the network namespace exit routines have run.

Now it is only required to set the new fields .id and .size
in the pernet_operations structure if you want network namespace
data to be managed for you automatically.

This makes the current register_pernet_gen_device and
register_pernet_gen_subsys routines unnecessary.  For the moment
I have left them as compatibility wrappers in net_namespace.h
They will be removed once all of the users have been updated.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:15:51 -08:00
Eric W. Biederman 2b035b3997 net: Batch network namespace destruction.
It is fairly common to kill several network namespaces at once.  Either
because they are nested one inside the other or because they are cooperating
in multiple machine networking experiments.  As the network stack control logic
does not parallelize easily batch up multiple network namespaces existing
together.

To get the full benefit of batching the virtual network devices to be
removed must be all removed in one batch.  For that purpose I have added
a loop after the last network device operations have run that batches
up all remaining network devices and deletes them.

An extra benefit is that the reorganization slightly shrinks the size
of the per network namespace data structures replaceing a work_struct
with a list_head.

In a trivial test with 4K namespaces this change reduced the cost of
a destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds
(at 60% cpu).  The bulk of that 44s was spent in inet_twsk_purge.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01 16:15:51 -08:00
Eric Dumazet 7c28bd0b8e rtnetlink: speedup rtnl_dump_ifinfo()
When handling large number of netdevice, rtnl_dump_ifinfo()
is very slow because it has O(N^2) complexity.

Instead of scanning one single list, we can use the 256 sub lists
of the dev_index hash table.

This considerably speedups "ip link" operations

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-24 06:13:17 -07:00
Johannes Berg 3d23e349d8 wext: refactor
Refactor wext to
 * split out iwpriv handling
 * split out iwspy handling
 * split out procfs support
 * allow cfg80211 to have wireless extensions compat code
   w/o CONFIG_WIRELESS_EXT

After this, drivers need to
 - select WIRELESS_EXT	- for wext support
 - select WEXT_PRIV	- for iwpriv support
 - select WEXT_SPY	- for iwspy support

except cfg80211 -- which gets new hooks in wext-core.c
and can then get wext handlers without CONFIG_WIRELESS_EXT.

Wireless extensions procfs support is auto-selected
based on PROC_FS and anything that requires the wext core
(i.e. WIRELESS_EXT or CFG80211_WEXT).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-10-07 16:39:43 -04:00
Johannes Berg 4edf547b4d net: explain netns notifiers a little better
Eric explained this to me -- and afterwards the comment
made sense, but not before. Add the the critical point
about interfaces having to be gone from the netns before
subsys notifiers are called.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-20 08:03:00 -07:00
Johannes Berg b333b3d228 wireless extensions: make netns aware
This makes wireless extensions netns aware. The
tasklet sending the events is converted to a work
struct so that we can rtnl_lock() in it.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-15 08:53:32 -07:00
Johannes Berg 30ffee8480 net: move and export get_net_ns_by_pid
The function get_net_ns_by_pid(), to get a network
namespace from a pid_t, will be required in cfg80211
as well. Therefore, let's move it to net_namespace.c
and export it. We can't make it a static inline in
the !NETNS case because it needs to verify that the
given pid even exists (and return -ESRCH).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 14:03:28 -07:00
Johannes Berg 134e63756d genetlink: make netns aware
This makes generic netlink network namespace aware. No
generic netlink families except for the controller family
are made namespace aware, they need to be checked one by
one and then set the family->netnsok member to true.

A new function genlmsg_multicast_netns() is introduced to
allow sending a multicast message in a given namespace,
for example when it applies to an object that lives in
that namespace, a new function genlmsg_multicast_allns()
to send a message to all network namespaces (for objects
that do not have an associated netns).

The function genlmsg_multicast() is changed to multicast
the message in just init_net, which is currently correct
for all generic netlink families since they only work in
init_net right now. Some will later want to work in all
net namespaces because they do not care about the netns
at all -- those will have to be converted to use one of
the new functions genlmsg_multicast_allns() or
genlmsg_multicast_netns() whenever they are made netns
aware in some way.

After this patch families can easily decide whether or
not they should be available in all net namespaces. Many
genl families us it for objects not related to networking
and should therefore be available in all namespaces, but
that will have to be done on a per family basis.

Note that this doesn't touch on the checkpoint/restart
problem where network namespaces could be used, genl
families and multicast groups are numbered globally and
I see no easy way of changing that, especially since it
must be possible to multicast to all network namespaces
for those families that do not care about netns.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 14:03:27 -07:00
Johannes Berg 11a28d373e net: make namespace iteration possible under RCU
All we need to take care of is using proper RCU list
add/del primitives and inserting a synchronize_rcu()
at one place to make sure the exit notifiers are run
after everybody has stopped iterating the list.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-12 14:03:25 -07:00
Eric W. Biederman 17edde5209 netns: Remove net_alive
It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.

Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace then that provided
by net_alive in netif_receive_skb.  So remove net_alive allowing
packet reception run a little faster.

Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:14:27 -08:00
Alexey Dobriyan d62ddc21b6 netns xfrm: add netns boilerplate
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25 17:14:31 -08:00
Eric Dumazet 8f424b5f32 net: Introduce read_pnet() and write_pnet() helpers
This patch introduces two helpers that deal with reading and writing
struct net pointers in various network structures.

Their implementation depends on CONFIG_NET_NS

For symmetry, both functions work with "struct net **pnet".

Their usage should reduce the number of #ifdef CONFIG_NET_NS,
without adding many helpers for each network structure
that hold a "struct net *pointer"

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-12 00:53:30 -08:00
Alexey Dobriyan 485ac57bc1 netns: add register_pernet_gen_subsys/unregister_pernet_gen_subsys
netns ops which are registered with register_pernet_gen_device() are
shutdown strictly before those which are registered with
register_pernet_subsys(). Sometimes this leads to opposite (read: buggy)
shutdown ordering between two modules.

Add register_pernet_gen_subsys()/unregister_pernet_gen_subsys() for modules
which aren't elite enough for entry in struct net, and which can't use
register_pernet_gen_device(). PPTP conntracking module is such one.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-30 23:55:16 -07:00
Alexey Dobriyan dfdb8d7918 netfilter: netns nf_conntrack: add netns boilerplate
One comment: #ifdefs around #include is necessary to overcome amazing compile
breakages in NOTRACK-in-netns patch (see below).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08 11:35:02 +02:00