Commit Graph

222 Commits

Author SHA1 Message Date
Eric Dumazet b01a8531d0 netns: provide pure entropy for net_hash_mix()
[ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ]

net_hash_mix() currently uses kernel address of a struct net,
and is used in many places that could be used to reveal this
address to a patient attacker, thus defeating KASLR, for
the typical case (initial net namespace, &init_net is
not dynamically allocated)

I believe the original implementation tried to avoid spending
too many cycles in this function, but security comes first.

Also provide entropy regardless of CONFIG_NET_NS.

Fixes: 0b4419162a ("netns: introduce the net_hash_mix "salt" for hashes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 22:10:05 +02:00
Eric Dumazet f553bcd528 netfilter: ipv6: nf_defrag: reduce struct net memory waste
commit 9ce7bc036ae4cfe3393232c86e9e1fea2153c237 upstream.

It is a waste of memory to use a full "struct netns_sysctl_ipv6"
while only one pointer is really used, considering netns_sysctl_ipv6
keeps growing.

Also, since "struct netns_frags" has cache line alignment,
it is better to move the frags_hdr pointer outside, otherwise
we spend a full cache line for this pointer.

This saves 192 bytes of memory per netns.

Fixes: c038a767cd ("ipv6: add a new namespace for nf_conntrack_reasm")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:52:53 +02:00
Luca Stefani 82b37d9f2f Merge remote-tracking branch 'f2fs/linux-3.10.y' into HEAD
Change-Id: Ic2fe24529f029909ddd96490bd6d885d60f88be2
2017-04-18 17:02:28 +02:00
Marcelo Ricardo Leitner 7bf24986e3 sctp: fix ASCONF list handling
commit 2d45a02d0166caf2627fe91897c6ffc3b19514c4 upstream.

->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.

Also, the call to inet_sk_copy_descendant() was backuping
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.

This commit thus fixes the list handling by using ->addr_wq_lock
spinlock to protect the list. A special handling is done upon socket
creation and destruction for that. Error handlig on sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
will be take on sctp_close_sock(), before locking the socket, so we
don't do it in inverse order compared to sctp_addr_wq_timeout_handler().

Instead of taking the lock on sctp_sock_migrate() for copying and
restoring the list values, it's preferred to avoid rewritting it by
implementing sctp_copy_descendant().

Issue was found with a test application that kept flipping sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.

Fixes: 9f7d653b67 ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen <jiji@redhat.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[wangkai: backport to 3.10: adjust context]
Signed-off-by: Wang Kai <morgan.wang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01 12:07:34 +02:00
Ian Maund 068b0551a9 This is the 3.10.73 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVFBE+AAoJEDjbvchgkmk+oTkP/j2ipSvgXghFEipZbOJUQkqC
 fa8elfoF7riTKpKOuDtDU2WI1ttCGYs5gmTNpd4KaEt23eJOQgVqIpV8GhAkW5Af
 NVyGhjF3dXNqpBkxnyuIkk5OLrNKGRNS2xpz1U254iGObYrK+tr62IzGPxEcPAhX
 Y+58xPVSjLtNdTJW3YLT3DohUbnbHG6Br9geI1IHtlxg1oDiTxtnX2FmOFzzDpP5
 qu8gnPIekg/+1EE46nEiq0C59AwC3aCzNxwlYe1Kd41SY3LUFF1eZMzmOnnwyI5K
 3FslAzT6x/sOmGJFTYrKjFA4GKsW67xHVkB/hp/Mu768RqxiQCxV4kgmPsAFLbXb
 D5qbNwr3i0iQ/9AaD7h8HJkxC/KHmszMux00L/mgZ3SGdGMEIBxHg+oP8+nP8V6C
 WfXKSWA94dpdRyULEfWdnKnUnp2860C7kt7ASTkOl8rIgU8HgaRqeu+U/KPM2ovD
 ZJtXPVB5UXCRuVAhZwbvvrLOY8UMZTnv2auAaeLYG8YptcvGeN5Z398/8qdV/z7c
 A9kOsgebs74X+lR3rbVgSDPQaq2AEiuIvtX77SfmrWXBXGmc99i9+PikuFggRprz
 cJm5bCM9DaHu/3b77X9Fwl7vnpReB0zPHiwTdH/p7OPMf5m1uQt7SqegC6btLPHs
 iYgjLd4oW+6uiV/2X1Vx
 =L+mC
 -----END PGP SIGNATURE-----

Merge commit 'v3.10.73' into msm-3.10

This merge brings us up to date with upstream kernel.org tag v3.10.73.
As part of the conflict resolution, changes introduced by commit 72684eae7
("arm64: Fix up /proc/cpuinfo") have been intentionally dropped, as they
conflict with Android changes msm-3.10 kernel to solve the problems
in a different way. Since userspace readers of this file may depend on
the existing msm-3.10 implementation, it's left as-is for now. The
commit may later be introduced if it is found to not impact userspaces
paired with this kernel.

* commit 'v3.10.73' (264 commits):
  Linux 3.10.73
  target: Allow Write Exclusive non-reservation holders to READ
  target: Allow AllRegistrants to re-RESERVE existing reservation
  target: Fix R_HOLDER bit usage for AllRegistrants
  target/pscsi: Fix NULL pointer dereference in get_device_type
  iscsi-target: Avoid early conn_logout_comp for iser connections
  target: Fix reference leak in target_get_sess_cmd() error path
  ARM: at91: pm: fix at91rm9200 standby
  ipvs: rerouting to local clients is not needed anymore
  ipvs: add missing ip_vs_pe_put in sync code
  powerpc/smp: Wait until secondaries are active & online
  x86/vdso: Fix the build on GCC5
  x86/fpu: Drop_fpu() should not assume that tsk equals current
  x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig()
  crypto: aesni - fix memory usage in GCM decryption
  libsas: Fix Kernel Crash in smp_execute_task
  xen-pciback: limit guest control of command register
  nilfs2: fix deadlock of segment constructor during recovery
  regulator: core: Fix enable GPIO reference counting
  regulator: Only enable disabled regulators on resume
  ALSA: hda - Treat stereo-to-mono mix properly
  ALSA: hda - Add workaround for MacBook Air 5,2 built-in mic
  ALSA: hda - Set single_adc_amp flag for CS420x codecs
  ALSA: hda - Don't access stereo amps for mono channel widgets
  ALSA: hda - Fix built-in mic on Compaq Presario CQ60
  ALSA: control: Add sanity checks for user ctl id name string
  spi: pl022: Fix race in giveback() leading to driver lock-up
  tpm/ibmvtpm: Additional LE support for tpm_ibmvtpm_send
  workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
  can: add missing initialisations in CAN related skbuffs
  Change email address for 8250_pci
  virtio_console: init work unconditionally
  fuse: notify: don't move pages
  fuse: set stolen page uptodate
  drm/radeon: drop setting UPLL to sleep mode
  drm/radeon: do a posting read in rs600_set_irq
  drm/radeon: do a posting read in si_set_irq
  drm/radeon: do a posting read in r600_set_irq
  drm/radeon: do a posting read in r100_set_irq
  drm/radeon: do a posting read in evergreen_set_irq
  drm/radeon: fix DRM_IOCTL_RADEON_CS oops
  tcp: make connect() mem charging friendly
  net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour
  tcp: fix tcp fin memory accounting
  Revert "net: cx82310_eth: use common match macro"
  rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()
  caif: fix MSG_OOB test in caif_seqpkt_recvmsg()
  inet_diag: fix possible overflow in inet_diag_dump_one_icsk()
  rds: avoid potential stack overflow
  net: sysctl_net_core: check SNDBUF and RCVBUF for min length
  sparc64: Fix several bugs in memmove().
  sparc: Touch NMI watchdog when walking cpus and calling printk
  sparc: perf: Make counting mode actually work
  sparc: perf: Remove redundant perf_pmu_{en|dis}able calls
  sparc: semtimedop() unreachable due to comparison error
  sparc32: destroy_context() and switch_mm() needs to disable interrupts.
  Linux 3.10.72
  ath5k: fix spontaneus AR5312 freezes
  ACPI / video: Load the module even if ACPI is disabled
  drm/radeon: fix 1 RB harvest config setup for TN/RL
  Drivers: hv: vmbus: incorrect device name is printed when child device is unregistered
  HID: fixup the conflicting keyboard mappings quirk
  HID: input: fix confusion on conflicting mappings
  staging: comedi: cb_pcidas64: fix incorrect AI range code handling
  dm snapshot: fix a possible invalid memory access on unload
  dm: fix a race condition in dm_get_md
  dm io: reject unsupported DISCARD requests with EOPNOTSUPP
  dm mirror: do not degrade the mirror on discard error
  staging: comedi: comedi_compat32.c: fix COMEDI_CMD copy back
  clk: sunxi: Support factor clocks with N factor starting not from 0
  fixed invalid assignment of 64bit mask to host dma_boundary for scatter gather segment boundary limit.
  nilfs2: fix potential memory overrun on inode
  IB/qib: Do not write EEPROM
  sg: fix read() error reporting
  ALSA: hda - Add pin configs for ASUS mobo with IDT 92HD73XX codec
  ALSA: pcm: Don't leave PREPARED state after draining
  tty: fix up atime/mtime mess, take four
  sunrpc: fix braino in ->poll()
  procfs: fix race between symlink removals and traversals
  debugfs: leave freeing a symlink body until inode eviction
  autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation
  USB: serial: fix potential use-after-free after failed probe
  TTY: fix tty_wait_until_sent on 64-bit machines
  USB: serial: fix infinite wait_until_sent timeout
  net: irda: fix wait_until_sent poll timeout
  xhci: fix reporting of 0-sized URBs in control endpoint
  xhci: Allocate correct amount of scratchpad buffers
  usb: ftdi_sio: Add jtag quirk support for Cyber Cortex AV boards
  USB: usbfs: don't leak kernel data in siginfo
  USB: serial: cp210x: Adding Seletek device id's
  KVM: MIPS: Fix trace event to save PC directly
  KVM: emulate: fix CMPXCHG8B on 32-bit hosts
  Btrfs:__add_inode_ref: out of bounds memory read when looking for extended ref.
  Btrfs: fix data loss in the fast fsync path
  btrfs: fix lost return value due to variable shadowing
  iio: imu: adis16400: Fix sign extension
  x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization
  PM / QoS: remove duplicate call to pm_qos_update_target
  target: Check for LBA + sectors wrap-around in sbc_parse_cdb
  mm/memory.c: actually remap enough memory
  mm/compaction: fix wrong order check in compact_finished()
  mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
  mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
  mm/hugetlb: add migration entry check in __unmap_hugepage_range
  team: don't traverse port list using rcu in team_set_mac_address
  udp: only allow UFO for packets from SOCK_DGRAM sockets
  usb: plusb: Add support for National Instruments host-to-host cable
  macvtap: make sure neighbour code can push ethernet header
  net: compat: Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msg
  team: fix possible null pointer dereference in team_handle_frame
  net: reject creation of netdev names with colons
  ematch: Fix auto-loading of ematch modules.
  net: phy: Fix verification of EEE support in phy_init_eee
  ipv4: ip_check_defrag should not assume that skb_network_offset is zero
  ipv4: ip_check_defrag should correctly check return value of skb_copy_bits
  gen_stats.c: Duplicate xstats buffer for later use
  rtnetlink: call ->dellink on failure when ->newlink exists
  ipv6: fix ipv6_cow_metrics for non DST_HOST case
  rtnetlink: ifla_vf_policy: fix misuses of NLA_BINARY
  Linux 3.10.71
  libceph: fix double __remove_osd() problem
  libceph: change from BUG to WARN for __remove_osd() asserts
  libceph: assert both regular and lingering lists in __remove_osd()
  MIPS: Export FP functions used by lose_fpu(1) for KVM
  x86, mm/ASLR: Fix stack randomization on 64-bit systems
  blk-throttle: check stats_cpu before reading it from sysfs
  jffs2: fix handling of corrupted summary length
  md/raid1: fix read balance when a drive is write-mostly.
  md/raid5: Fix livelock when array is both resyncing and degraded.
  metag: Fix KSTK_EIP() and KSTK_ESP() macros
  gpio: tps65912: fix wrong container_of arguments
  arm64: compat Fix siginfo_t -> compat_siginfo_t conversion on big endian
  hx4700: regulator: declare full constraints
  KVM: x86: update masterclock values on TSC writes
  KVM: MIPS: Don't leak FPU/DSP to guest
  ARC: fix page address calculation if PAGE_OFFSET != LINUX_LINK_BASE
  ntp: Fixup adjtimex freq validation on 32-bit systems
  kdb: fix incorrect counts in KDB summary command output
  ARM: pxa: add regulator_has_full_constraints to poodle board file
  ARM: pxa: add regulator_has_full_constraints to corgi board file
  vt: provide notifications on selection changes
  usb: core: buffer: smallest buffer should start at ARCH_DMA_MINALIGN
  USB: fix use-after-free bug in usb_hcd_unlink_urb()
  USB: cp210x: add ID for RUGGEDCOM USB Serial Console
  tty: Prevent untrappable signals from malicious program
  axonram: Fix bug in direct_access
  cfq-iosched: fix incorrect filing of rt async cfqq
  cfq-iosched: handle failure of cfq group allocation
  iscsi-target: Drop problematic active_ts_list usage
  NFSv4.1: Fix a kfree() of uninitialised pointers in decode_cb_sequence_args
  Added Little Endian support to vtpm module
  tpm/tpm_i2c_stm_st33: Fix potential bug in tpm_stm_i2c_send
  tpm: Fix NULL return in tpm_ibmvtpm_get_desired_dma
  tpm_tis: verify interrupt during init
  ARM: 8284/1: sa1100: clear RCSR_SMR on resume
  tracing: Fix unmapping loop in tracing_mark_write
  MIPS: KVM: Deliver guest interrupts after local_irq_disable()
  nfs: don't call blocking operations while !TASK_RUNNING
  mmc: sdhci-pxav3: fix setting of pdata->clk_delay_cycles
  power_supply: 88pm860x: Fix leaked power supply on probe fail
  ALSA: hdspm - Constrain periods to 2 on older cards
  ALSA: off by one bug in snd_riptide_joystick_probe()
  lmedm04: Fix usb_submit_urb BOGUS urb xfer, pipe 1 != type 3 in interrupt urb
  cpufreq: speedstep-smi: enable interrupts when waiting
  PCI: Fix infinite loop with ROM image of size 0
  PCI: Generate uppercase hex for modalias var in uevent
  HID: i2c-hid: Limit reads to wMaxInputLength bytes for input events
  iwlwifi: mvm: always use mac color zero
  iwlwifi: mvm: fix failure path when power_update fails in add_interface
  iwlwifi: mvm: validate tid and sta_id in ba_notif
  iwlwifi: pcie: disable the SCD_BASE_ADDR when we resume from WoWLAN
  fsnotify: fix handling of renames in audit
  xfs: set superblock buffer type correctly
  xfs: inode unlink does not set AGI buffer type
  xfs: ensure buffer types are set correctly
  Bluetooth: ath3k: workaround the compatibility issue with xHCI controller
  Linux 3.10.70
  rbd: drop an unsafe assertion
  media/rc: Send sync space information on the lirc device
  net: sctp: fix passing wrong parameter header to param_type2af in sctp_process_param
  ppp: deflate: never return len larger than output buffer
  ipv4: tcp: get rid of ugly unicast_sock
  tcp: ipv4: initialize unicast_sock sk_pacing_rate
  bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify
  ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too
  ping: Fix race in free in receive path
  udp_diag: Fix socket skipping within chain
  ipv4: try to cache dst_entries which would cause a redirect
  net: sctp: fix slab corruption from use after free on INIT collisions
  netxen: fix netxen_nic_poll() logic
  ipv6: stop sending PTB packets for MTU < 1280
  net: rps: fix cpu unplug
  ip: zero sockaddr returned on error queue
  Linux 3.10.69
  crypto: crc32c - add missing crypto module alias
  x86,kvm,vmx: Preserve CR4 across VM entry
  kvm: vmx: handle invvpid vm exit gracefully
  smpboot: Add missing get_online_cpus() in smpboot_register_percpu_thread()
  ALSA: ak411x: Fix stall in work callback
  ASoC: sgtl5000: add delay before first I2C access
  ASoC: atmel_ssc_dai: fix start event for I2S mode
  lib/checksum.c: fix build for generic csum_tcpudp_nofold
  ext4: prevent bugon on race between write/fcntl
  arm64: Fix up /proc/cpuinfo
  nilfs2: fix deadlock of segment constructor over I_SYNC flag
  lib/checksum.c: fix carry in csum_tcpudp_nofold
  mm: pagewalk: call pte_hole() for VM_PFNMAP during walk_page_range
  MIPS: Fix kernel lockup or crash after CPU offline/online
  MIPS: IRQ: Fix disable_irq on CPU IRQs
  PCI: Add NEC variants to Stratus ftServer PCIe DMI check
  gpio: sysfs: fix memory leak in gpiod_sysfs_set_active_low
  gpio: sysfs: fix memory leak in gpiod_export_link
  Linux 3.10.68
  target: Drop arbitrary maximum I/O size limit
  iser-target: Fix implicit termination of connections
  iser-target: Handle ADDR_CHANGE event for listener cm_id
  iser-target: Fix connected_handler + teardown flow race
  iser-target: Parallelize CM connection establishment
  iser-target: Fix flush + disconnect completion handling
  iscsi,iser-target: Initiate termination only once
  vhost-scsi: Add missing virtio-scsi -> TCM attribute conversion
  tcm_loop: Fix wrong I_T nexus association
  vhost-scsi: Take configfs group dependency during VHOST_SCSI_SET_ENDPOINT
  ib_isert: Add max_send_sge=2 minimum for control PDU responses
  IB/isert: Adjust CQ size to HW limits
  workqueue: fix subtle pool management issue which can stall whole worker_pool
  gpio: squelch a compiler warning
  efi-pstore: Make efi-pstore return a unique id
  pstore/ram: avoid atomic accesses for ioremapped regions
  pstore: Fix NULL pointer fault if get NULL prz in ramoops_get_next_prz
  pstore: skip zero size persistent ram buffer in traverse
  pstore: clarify clearing of _read_cnt in ramoops_context
  pstore: d_alloc_name() doesn't return an ERR_PTR
  pstore: Fail to unlink if a driver has not defined pstore_erase
  ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE
  ARM: 8108/1: mm: Introduce {pte,pmd}_isset and {pte,pmd}_isclear
  ARM: DMA: ensure that old section mappings are flushed from the TLB
  ARM: 7931/1: Correct virt_addr_valid
  ARM: fix asm/memory.h build error
  ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' in atomic_cmpxchg().
  ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.h
  ARM: lpae: fix definition of PTE_HWTABLE_PTRS
  ARM: fix type of PHYS_PFN_OFFSET to unsigned long
  ARM: LPAE: use phys_addr_t in alloc_init_pud()
  ARM: LPAE: use signed arithmetic for mask definitions
  ARM: mm: correct pte_same behaviour for LPAE.
  ARM: 7829/1: Add ".text.unlikely" and ".text.hot" to arm unwind tables
  drivers: net: cpsw: discard dual emac default vlan configuration
  regulator: core: fix race condition in regulator_put()
  spi/pxa2xx: Clear cur_chip pointer before starting next message
  dm cache: fix missing ERR_PTR returns and handling
  dm thin: don't allow messages to be sent to a pool target in READ_ONLY or FAIL mode
  nl80211: fix per-station group key get/del and memory leak
  NFSv4.1: Fix an Oops in nfs41_walk_client_list
  nfs: fix dio deadlock when O_DIRECT flag is flipped
  Input: i8042 - add noloop quirk for Medion Akoya E7225 (MD98857)
  ALSA: seq-dummy: remove deadlock-causing events on close
  powerpc/xmon: Fix another endiannes issue in RTAS call from xmon
  can: kvaser_usb: Fix state handling upon BUS_ERROR events
  can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT
  can: kvaser_usb: Send correct context to URB completion
  can: kvaser_usb: Do not sleep in atomic context
  ASoC: wm8960: Fix capture sample rate from 11250 to 11025
  spi: dw-mid: fix FIFO size

Signed-off-by: Ian Maund <imaund@codeaurora.org>
2015-04-24 18:14:57 -07:00
Eric Dumazet 6bed3166d0 ipv4: tcp: get rid of ugly unicast_sock
[ Upstream commit bdbbb8527b6f6a358dbcb70dac247034d665b8e4 ]

In commit be9f4a44e7 ("ipv4: tcp: remove per net tcp_sock")
I tried to address contention on a socket lock, but the solution
I chose was horrible :

commit 3a7c384ffd ("ipv4: tcp: unicast_sock should not land outside
of TCP stack") addressed a selinux regression.

commit 0980e56e50 ("ipv4: tcp: set unicast_sock uc_ttl to -1")
took care of another regression.

commit b5ec8eeac4 ("ipv4: fix ip_send_skb()") fixed another regression.

commit 811230cd85 ("tcp: ipv4: initialize unicast_sock sk_pacing_rate")
was another shot in the dark.

Really, just use a proper socket per cpu, and remove the skb_orphan()
call, to re-enable flow control.

This solves a serious problem with FQ packet scheduler when used in
hostile environments, as we do not want to allocate a flow structure
for every RST packet sent in response to a spoofed packet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-26 17:48:48 -08:00
Lorenzo Colitti c5f40c905b net: support marking accepting TCP sockets
When using mark-based routing, sockets returned from accept()
may need to be marked differently depending on the incoming
connection request.

This is the case, for example, if different socket marks identify
different networks: a listening socket may want to accept
connections from all networks, but each connection should be
marked with the network that the request came in on, so that
subsequent packets are sent on the correct network.

This patch adds a sysctl to mark TCP sockets based on the fwmark
of the incoming SYN packet. If enabled, and an unmarked socket
receives a SYN, then the SYN packet's fwmark is written to the
connection's inet_request_sock, and later written back to the
accepted socket when the connection is established.  If the
socket already has a nonzero mark, then the behaviour is the same
as it is today, i.e., the listening socket's fwmark is used.

Black-box tested using user-mode linux:

- IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the
  mark of the incoming SYN packet.
- The socket returned by accept() is marked with the mark of the
  incoming SYN packet.
- Tested with syncookies=1 and syncookies=2.

Change-Id: I26bc1eceefd2c588d73b921865ab70e4645ade57
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Git-commit: 6ba3a0e3b112bdb47858e97aa763706ba26ca5ea
Git-repo: https://android.googlesource.com/kernel/common.git
Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-06-23 15:21:22 -07:00
Lorenzo Colitti 4ecfac314b net: add a sysctl to reflect the fwmark on replies
Kernel-originated IP packets that have no user socket associated
with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
are emitted with a mark of zero. Add a sysctl to make them have
the same mark as the packet they are replying to.

This allows an administrator that wishes to do so to use
mark-based routing, firewalling, etc. for these replies by
marking the original packets inbound.

Tested using user-mode linux:
 - ICMP/ICMPv6 echo replies and errors.
 - TCP RST packets (IPv4 and IPv6).

Change-Id: I6873d973196797bcf32e2e91976df647c7e8b85a
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Git-commit: 5a87fa6a43733e241406e8d62fe28fdc0735bf93
Git-repo: https://android.googlesource.com/kernel/common.git
[imaund@codeaurora.org: Resolve trivial merge conflicts]
Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-06-23 15:20:28 -07:00
Gao feng 30e0c6a6be netfilter: nf_log: prepare net namespace support for loggers
This patch adds netns support to nf_log and it prepares netns
support for existing loggers. It is composed of four major
changes.

1) nf_log_register has been split to two functions: nf_log_register
   and nf_log_set. The new nf_log_register is used to globally
   register the nf_logger and nf_log_set is used for enabling
   pernet support from nf_loggers.

   Per netns is not yet complete after this patch, it comes in
   separate follow up patches.

2) Add net as a parameter of nf_log_bind_pf. Per netns is not
   yet complete after this patch, it only allows to bind the
   nf_logger to the protocol family from init_net and it skips
   other cases.

3) Adapt all nf_log_packet callers to pass netns as parameter.
   After this patch, this function only works for init_net.

4) Make the sysctl net/netfilter/nf_log pernet.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:12:54 +02:00
Gao feng f3c1a44a22 netfilter: make /proc/net/netfilter pernet
This patch makes this proc dentry pernet. So far only init_net
had a /proc/net/netfilter directory.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 19:35:02 +02:00
Nicolas Dichtel 63998ac24f ipv6: provide addr and netconf dump consistency info
This patch adds a dev_addr_genid for IPv6. The goal is to use it, combined with
dev_base_seq to check if a change occurs during a netlink dump.
If a change is detected, the flag NLM_F_DUMP_INTR is set in the first message
after the dump was interrupted.

Note that only dump of unicast addresses is checked (multicast and anycast are
not checked).

Reported-by: Junwei Zhang <junwei.zhang@6wind.com>
Reported-by: Hongjun Li <hongjun.li@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:16:29 -04:00
Michal Kubecek 8d068875ca xfrm: make gc_thresh configurable in all namespaces
The xfrm gc threshold can be configured via xfrm{4,6}_gc_thresh
sysctl but currently only in init_net, other namespaces always
use the default value. This can substantially limit the number
of IPsec tunnels that can be effectively used.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-02-06 11:36:29 +01:00
Florian Westphal c539f01717 netfilter: add connlabel conntrack extension
similar to connmarks, except labels are bit-based; i.e.
all labels may be attached to a flow at the same time.

Up to 128 labels are supported.  Supporting more labels
is possible, but requires increasing the ct offset delta
from u8 to u16 type due to increased extension sizes.

Mapping of bit-identifier to label name is done in userspace.

The extension is enabled at run-time once "-m connlabel" netfilter
rules are added.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-18 00:28:15 +01:00
David S. Miller 4b87f92259 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	Documentation/networking/ip-sysctl.txt
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c

Both conflicts were simply overlapping context.

A build fix for qlcnic is in here too, simply removing the added
devinit annotations which no longer exist.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-15 15:05:59 -05:00
Hannes Frederic Sowa 5d134f1c1f tcp: make sysctl_tcp_ecn namespace aware
As per suggestion from Eric Dumazet this patch makes tcp_ecn sysctl
namespace aware.  The reason behind this patch is to ease the testing
of ecn problems on the internet and allows applications to tune their
own use of ecn.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-06 21:09:56 -08:00
Pablo Neira Ayuso 10db9069eb netfilter: xt_CT: recover NOTRACK target support
Florian Westphal reported that the removal of the NOTRACK target
(9655050 netfilter: remove xt_NOTRACK) is breaking some existing
setups.

That removal was scheduled for removal since long time ago as
described in Documentation/feature-removal-schedule.txt

What:  xt_NOTRACK
Files: net/netfilter/xt_NOTRACK.c
When:  April 2011
Why:   Superseded by xt_CT

Still, people may have not notice / may have decided to stick to an
old iptables version. I agree with him in that some more conservative
approach by spotting some printk to warn users for some time is less
agressive.

Current iptables 1.4.16.3 already contains the aliasing support
that makes it point to the CT target, so upgrading would fix it.
Still, the policy so far has been to avoid pushing our users to
upgrade.

As a solution, this patch recovers the NOTRACK target inside the CT
target and it now spots a warning.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-24 12:55:09 +01:00
Pablo Neira Ayuso 252b3e8c1b netfilter: xt_CT: fix crash while destroy ct templates
In (d871bef netfilter: ctnetlink: dump entries from the dying and
unconfirmed lists), we assume that all conntrack objects are
inserted in any of the existing lists. However, template conntrack
objects were not. This results in hitting BUG_ON in the
destroy_conntrack path while removing a rule that uses the CT target.

This patch fixes the situation by adding the template lists, which
is where template conntrack objects reside now.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16 23:44:12 +01:00
Neil Horman 3c68198e75 sctp: Make hmac algorithm selection for cookie generation dynamic
Currently sctp allows for the optional use of md5 of sha1 hmac algorithms to
generate cookie values when establishing new connections via two build time
config options.  Theres no real reason to make this a static selection.  We can
add a sysctl that allows for the dynamic selection of these algorithms at run
time, with the default value determined by the corresponding crypto library
availability.
This comes in handy when, for example running a system in FIPS mode, where use
of md5 is disallowed, but SHA1 is permitted.

Note: This new sysctl has no corresponding socket option to select the cookie
hmac algorithm.  I chose not to implement that intentionally, as RFC 6458
contains no option for this value, and I opted not to pollute the socket option
namespace.

Change notes:
v2)
	* Updated subject to have the proper sctp prefix as per Dave M.
	* Replaced deafult selection options with new options that allow
	  developers to explicitly select available hmac algs at build time
	  as per suggestion by Vlad Y.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26 02:22:18 -04:00
David S. Miller 6a06e5e1bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/team/team.c
	drivers/net/usb/qmi_wwan.c
	net/batman-adv/bat_iv_ogm.c
	net/ipv4/fib_frontend.c
	net/ipv4/route.c
	net/l2tp/l2tp_netlink.c

The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.

qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

With help from Antonio Quartulli.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-28 14:40:49 -04:00
Amerigo Wang c038a767cd ipv6: add a new namespace for nf_conntrack_reasm
As pointed by Michal, it is necessary to add a new
namespace for nf_conntrack_reasm code, this prepares
for the second patch.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 17:23:28 -04:00
Nicolas Dichtel b42664f898 netns: move net->ipv4.rt_genid to net->rt_genid
This commit prepares the use of rt_genid by both IPv4 and IPv6.
Initialization is left in IPv4 part.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 15:57:03 -04:00
Pablo Neira Ayuso ace1fe1231 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
This merges (3f509c6 netfilter: nf_nat_sip: fix incorrect handling
of EBUSY for RTCP expectation) to Patrick McHardy's IPv6 NAT changes.
2012-09-03 15:34:51 +02:00
Patrick McHardy 58a317f106 netfilter: ipv6: add IPv6 NAT support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:17 +02:00
Patrick McHardy c7232c9979 netfilter: add protocol independent NAT core
Convert the IPv4 NAT implementation to a protocol independent core and
address family specific modules.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:14 +02:00
David S. Miller e6acb38480 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 18:54:37 -04:00
Rami Rosen f63c45e0e6 packet: fix broken build.
This patch fixes a broken build due to a missing header:
...
  CC      net/ipv4/proc.o
In file included from include/net/net_namespace.h:15,
                 from net/ipv4/proc.c:35:
include/net/netns/packet.h:11: error: field 'sklist_lock' has incomplete type
...

The lock of netns_packet has been replaced by a recent patch to be a mutex instead of a spinlock,
but we need to replace the header file to be linux/mutex.h instead of linux/spinlock.h as well.

See commit 0fa7fa98dbcc2789409ed24e885485e645803d7f:
packet: Protect packet sk list with mutex (v2) patch,

Signed-off-by: Rami Rosen <rosenr@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23 09:29:45 -07:00
Pavel Emelyanov 0fa7fa98db packet: Protect packet sk list with mutex (v2)
Change since v1:

* Fixed inuse counters access spotted by Eric

In patch eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in packet diag module -- the
socket list is traversed under rcu_read_lock() while performed under it sk
mclist access requires rtnl lock (i.e. -- mutex) to be taken.

[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581]  #0:  (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613]  #1:  (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644]  #2:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693]  #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af

Similar thing was then re-introduced by further packet diag patches (fanount
mutex and pgvec mutex for rings) :(

Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:

* sklist
* prot inuse counters

The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
Eric W. Biederman e1fc3b14f9 sctp: Make sysctl tunables per net
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:32:16 -07:00
Eric W. Biederman ebb7e95d93 sctp: Add infrastructure for per net sysctls
Start with an empty sctp_net_table that will be populated as the various
tunable sysctls are made per net.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman b01a24078f sctp: Make the mib per network namespace
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:36 -07:00
Eric W. Biederman 13d782f6b4 sctp: Make the proc files per network namespace.
- Convert all of the files under /proc/net/sctp to be per
  network namespace.

- Don't print anything for /proc/net/sctp/snmp except in
  the initial network namespaces as the snmp counters still
  have to be converted to be per network namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:29:53 -07:00
Eric W. Biederman 2ce9550350 sctp: Make the ctl_sock per network namespace
- Kill sctp_get_ctl_sock, it is useless now.
- Pass struct net where needed so net->sctp.ctl_sock is accessible.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:17:26 -07:00
Eric W. Biederman 4db67e8086 sctp: Make the address lists per network namespace
- Move the address lists into struct net
- Add per network namespace initialization and cleanup
- Pass around struct net so it is everywhere I need it.
- Rename all of the global variable references into references
  to the variables moved into struct net

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:12:17 -07:00
Eric W. Biederman 7064d16e16 userns: Use kgids for sysctl_ping_group_range
- Store sysctl_ping_group_range as a paire of kgid_t values
  instead of a pair of gid_t values.
- Move the kgid conversion work from ping_init_sock into ipv4_ping_group_range
- For invalid cases reset to the default disabled state.

With the kgid_t conversion made part of the original value sanitation
from userspace understand how the code will react becomes clearer
and it becomes possible to set the sysctl ping group range from
something other than the initial user namespace.

Cc: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:10 -07:00
Eric Dumazet 0c7462a235 ipv4: remove rt_cache_rebuild_count
After IP route cache removal, rt_cache_rebuild_count is no longer
used.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-30 14:53:22 -07:00
Eric Dumazet 5815d5e7aa tcp: use hash_32() in tcp_metrics
Fix a missing roundup_pow_of_two(), since tcpmhash_entries is not
guaranteed to be a power of two.

Uses hash_32() instead of custom hash.

tcpmhash_entries should be an unsigned int.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20 10:59:41 -07:00
Eric Dumazet be9f4a44e7 ipv4: tcp: remove per net tcp_sock
tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
per network namespace.

This leads to bad behavior on multiqueue NICS, because many cpus
contend for the socket lock and once socket lock is acquired, extra
false sharing on various socket fields slow down the operations.

To better resist to attacks, we use a percpu socket. Each cpu can
run without contention, using appropriate memory (local node)

Additional features :

1) We also mirror the queue_mapping of the incoming skb, so that
answers use the same queue if possible.

2) Setting SOCK_USE_WRITE_QUEUE socket flag speedup sock_wfree()

3) We now limit the number of in-flight RST/ACK [1] packets
per cpu, instead of per namespace, and we honor the sysctl_wmem_default
limit dynamically. (Prior to this patch, sysctl_wmem_default value was
copied at boot time, so any further change would not affect tcp_sock
limit)

[1] These packets are only generated when no socket was matched for
the incoming packet.

Reported-by: Bill Sommerfeld <wsommerfeld@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:35:30 -07:00
David S. Miller 51c5d0c4b1 tcp: Maintain dynamic metrics in local cache.
Maintain a local hash table of TCP dynamic metrics blobs.

Computed TCP metrics are no longer maintained in the route metrics.

The table uses RCU and an extremely simple hash so that it has low
latency and low overhead.  A simple hash is legitimate because we only
make metrics blobs for fully established connections.

Some tweaking of the default hash table sizes, metric timeouts, and
the hash chain length limit certainly could use some tweaking.  But
the basic design seems sound.

With help from Eric Dumazet and Joe Perches.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 22:39:57 -07:00
David S. Miller f4530fa574 ipv4: Avoid overhead when no custom FIB rules are installed.
If the user hasn't actually installed any custom rules, or fiddled
with the default ones, don't go through the whole FIB rules layer.

It's just pure overhead.

Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check
the individual tables by hand, one by one.

Also, move fib_num_tclassid_users into the ipv4 network namespace.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 22:13:13 -07:00
David S. Miller 67da255210 Merge branch 'master' of git://1984.lsi.us.es/net-next 2012-06-11 12:56:14 -07:00
Gao feng c8a627ed06 inetpeer: add namespace support for inetpeer
now inetpeer doesn't support namespace,the information will
be leaking across namespace.

this patch move the global vars v4_peers and v6_peers to
netns_ipv4 and netns_ipv6 as a field peers.

add struct pernet_operations inetpeer_ops to initial pernet
inetpeer data.

and change family_to_base and inet_getpeer to support namespace.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08 14:27:23 -07:00
Gao feng 7080ba0955 netfilter: nf_ct_icmp: add namespace support
This patch adds namespace support for ICMPv6 protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:40 +02:00
Gao feng 4b626b9c5d netfilter: nf_ct_icmp: add namespace support
This patch adds namespace support for ICMP protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:40 +02:00
Gao feng 0ce490ad43 netfilter: nf_ct_udp: add namespace support
This patch adds namespace support for UDP protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:40 +02:00
Gao feng d2ba1fde42 netfilter: nf_ct_tcp: add namespace support
This patch adds namespace support for TCP protocol tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:39 +02:00
Gao feng 15f585bd76 netfilter: nf_ct_generic: add namespace support
This patch adds namespace support for the generic layer 4 protocol
tracker.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:39 +02:00
Gao feng 524a53e5ad netfilter: nf_conntrack: prepare namespace support for l3 protocol trackers
This patch prepares the namespace support for layer 3 protocol trackers.
Basically, this modifies the following interfaces:

* nf_ct_l3proto_[un]register_sysctl.
* nf_conntrack_l3proto_[un]register.

We add a new nf_ct_l3proto_net is used to get the pernet data of l3proto.

This adds rhe new struct nf_ip_net that is used to store the sysctl header
and l3proto_ipv4,l4proto_tcp(6),l4proto_udp(6),l4proto_icmp(v6) because the
protos such tcp and tcp6 use the same data,so making nf_ip_net as a field
of netns_ct is the easiest way to manager it.

This patch also adds init_net to struct nf_conntrack_l3proto to initial
the layer 3 protocol pernet data.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:39 +02:00
Gao feng 2c352f444c netfilter: nf_conntrack: prepare namespace support for l4 protocol trackers
This patch prepares the namespace support for layer 4 protocol trackers.
Basically, this modifies the following interfaces:

* nf_ct_[un]register_sysctl
* nf_conntrack_l4proto_[un]register

to include the namespace parameter. We still use init_net in this patch
to prepare the ground for follow-up patches for each layer 4 protocol
tracker.

We add a new net_id field to struct nf_conntrack_l4proto that is used
to store the pernet_operations id for each layer 4 protocol tracker.

Note that AF_INET6's protocols do not need to do sysctl compat. Thus,
we only register compat sysctl when l4proto.l3proto != AF_INET6.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-06-07 14:58:39 +02:00
Eric Leblond a900689264 netfilter: nf_ct_helper: allow to disable automatic helper assignment
This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, eg.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper

[ Note: flows that already got a helper will keep using it even
  if automatic helper assignment has been disabled ]

Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helper to flows.

There are good reasons to stop supporting automatic helper
assignment, for further information, please read:

http://www.netfilter.org/news.html#2012-04-03

This patch also adds one message to inform that automatic helper
assignment is deprecated and it will be removed soon (this is
spotted only once, with the first flow that gets a helper attached
to make it as less annoying as possible).

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:35:18 +02:00
Eric W. Biederman 6dceb03687 net ipv6: Don't use sysctl tables with .child entries.
The sysctl core no longer natively understands sysctl tables
with .child entries.

Split the ipv6_table to remove the .child entries.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:29 -04:00