commit 8742dc86d0c7a9628117a989c11f04a9b6b898f3 upstream.
We currently don't reload pointers pointing into the skb header
after doing pskb_may_pull() in _decode_session4(). So in case
pskb_may_pull() changed the pointers, we read from random
memory. Fix this by putting all the needed information on the
stack, so that we don't need to access the header pointers
after doing pskb_may_pull().
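The pattern can be illustrated outside the kernel. The following is only a
userspace sketch, not the code from this patch: struct toy_buf,
toy_may_pull() and toy_decode() are hypothetical stand-ins, and
toy_may_pull() always moves the data so that a stale pointer would be
reliably wrong.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: a heap buffer that may be moved. */
struct toy_buf {
	uint8_t *data;
	size_t len;
};

/*
 * Hypothetical stand-in for pskb_may_pull(). It always moves the data to a
 * fresh allocation, so any pointer taken before the call becomes invalid.
 * Returns 1 on success, 0 on failure.
 */
static int toy_may_pull(struct toy_buf *b, size_t need)
{
	uint8_t *fresh;

	if (need > b->len)
		return 0;
	fresh = malloc(b->len);
	if (!fresh)
		return 0;
	memcpy(fresh, b->data, b->len);
	free(b->data);
	b->data = fresh;
	return 1;
}

/*
 * Copies the fields it needs (protocol byte and header length of an
 * IPv4-style header) to the stack before pulling. After the pull it uses
 * only those copies and b->data, which is read again.
 * Returns the first byte after the header, or -1 on failure.
 */
static int toy_decode(struct toy_buf *b, uint8_t *proto_out)
{
	uint8_t proto, ihl;

	if (b->len < 20)
		return -1;
	proto = b->data[9];              /* copied to the stack */
	ihl = (b->data[0] & 0x0f) * 4;   /* copied to the stack */

	if (!toy_may_pull(b, (size_t)ihl + 1))
		return -1;

	*proto_out = proto;              /* no header pointer is touched here */
	return b->data[ihl];             /* b->data is re-read after the pull */
}
```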
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Change-Id: I8da27fd751dfe161d39054a332b5c4cc898eaf52
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ea673a4d3a337184f3c314dcc6300bf02f39e077 upstream.
A call to pskb_may_pull may change the pointers into the packet,
so reload the pointers after the call.
Change-Id: Ic4fdcc11666f1157f1c95cc3144719113ba54f6b
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1a14f1e5550a341f76e5c8f596e9b5f8a886dfbc upstream.
For some protocols, we skip the header information if the data
pointer already points behind the header in question. This is
because in this case we call pskb_may_pull() with a negative value
that gets converted to unsigned int. Skipping the header
information can lead to incorrect policy lookups, so fix it by
checking the data pointer position before we call pskb_may_pull().
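The arithmetic behind this can be sketched in userspace. This is an
illustration under assumptions, not the patched kernel code:
toy_may_pull() is a hypothetical stand-in for pskb_may_pull(), and offsets
replace the real pointers.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for pskb_may_pull(): succeeds when at least
 * `need` bytes are available.
 */
static int toy_may_pull(unsigned int avail, unsigned int need)
{
	return need <= avail;
}

/*
 * Old behaviour: if the data pointer is already behind the header, the
 * difference is negative and becomes a huge unsigned value, so the pull
 * fails and the header is skipped.
 */
static int header_readable_old(ptrdiff_t hdr_end_off, ptrdiff_t data_off,
			       unsigned int avail)
{
	return toy_may_pull(avail, (unsigned int)(hdr_end_off - data_off));
}

/*
 * Checked variant: a header that lies entirely before the data pointer is
 * treated as readable, and the pull is only attempted otherwise.
 */
static int header_readable_checked(ptrdiff_t hdr_end_off, ptrdiff_t data_off,
				   unsigned int avail)
{
	if (hdr_end_off < data_off)
		return 1;
	return toy_may_pull(avail, (unsigned int)(hdr_end_off - data_off));
}
```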
Change-Id: I1d6f36aad29087ed8ccbf2425f8d2a7cae2b0344
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
On systems that use mark-based routing it may be necessary for
routing lookups to use marks in order for packets to be routed
correctly. An example of such a system is Android, which uses
socket marks to route packets via different networks.
Currently, routing lookups in tunnel mode always use a mark of
zero, making routing incorrect on such systems.
This patch adds a new output_mark element to the xfrm state and
a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
mark differs from the existing xfrm mark in two ways:
1. The xfrm mark is used to match xfrm policies and states, while
the xfrm output mark is used to set the mark (and influence
the routing) of the packets emitted by those states.
2. The existing mark is constrained to be a subset of the bits of
the originating socket or transformed packet, but the output
mark is arbitrary and depends only on the state.
The use of a separate mark provides additional flexibility. For
example:
- A packet subject to two transforms (e.g., transport mode inside
tunnel mode) can have two different output marks applied to it,
one for the transport mode SA and one for the tunnel mode SA.
- On a system where socket marks determine routing, the packets
emitted by an IPsec tunnel can be routed based on a mark that
is determined by the tunnel, not by the marks of the
unencrypted packets.
- Support for setting the output marks can be introduced without
breaking any existing setups that employ both mark-based
routing and xfrm tunnel mode. Simply changing the code to use
the xfrm mark for routing output packets could change
behaviour in a way that breaks these setups.
If the output mark is unspecified or set to zero, the mark is not
set or changed.
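The zero-means-unchanged rule can be sketched as follows. The struct and
function names are hypothetical and heavily simplified; only the
output_mark element itself comes from this patch.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for an xfrm state and a packet. */
struct toy_state {
	uint32_t output_mark;
};

struct toy_packet {
	uint32_t mark;
};

/*
 * Apply the state's output mark to a packet it emits. A zero output mark
 * means "unspecified", so the packet's existing mark is left alone.
 */
static void toy_apply_output_mark(struct toy_packet *pkt,
				  const struct toy_state *x)
{
	if (x->output_mark)
		pkt->mark = x->output_mark;
}
```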
[backport of upstream 077fbac405bfc6d41419ad6c1725804ad4e9887c]
Bug: 63589535
Test: https://android-review.googlesource.com/452776/ passes
Tested: make allyesconfig; make -j64
Tested: https://android-review.googlesource.com/452776
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Change-Id: I76120fba036e21780ced31ad390faf491ea81e52
Merge upstream tag 'v3.10.84' into LA.BR.1.3.3
This merge brings us up-to-date as of upstream tag v3.10.84
* tag 'v3.10.84' (317 commits):
Linux 3.10.84
fs: Fix S_NOSEC handling
KVM: x86: make vapics_in_nmi_mode atomic
MIPS: Fix KVM guest fixmap address
x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A
powerpc/perf: Fix book3s kernel to userspace backtraces
arm: KVM: force execution of HCPTR access on VM exit
Revert "crypto: talitos - convert to use be16_add_cpu()"
crypto: talitos - avoid memleak in talitos_alg_alloc()
sctp: Fix race between OOTB responce and route removal
packet: avoid out of bounds read in round robin fanout
packet: read num_members once in packet_rcv_fanout()
bridge: fix br_stp_set_bridge_priority race conditions
bridge: fix multicast router rlist endless loop
sparc: Use GFP_ATOMIC in ldc_alloc_exp_dring() as it can be called in softirq context
Linux 3.10.83
bus: mvebu: pass the coherency availability information at init time
KVM: nSVM: Check for NRIPS support before updating control field
ARM: clk-imx6q: refine sata's parent
d_walk() might skip too much
ipv6: update ip6_rt_last_gc every time GC is run
ipv6: prevent fib6_run_gc() contention
xfrm: Increase the garbage collector threshold
Btrfs: make xattr replace operations atomic
x86/microcode/intel: Guard against stack overflow in the loader
fs: take i_mutex during prepare_binprm for set[ug]id executables
hpsa: add missing pci_set_master in kdump path
hpsa: refine the pci enable/disable handling
sb_edac: Fix erroneous bytes->gigabytes conversion
ACPICA: Utilities: Cleanup to remove useless ACPI_PRINTF/FORMAT_xxx helpers.
ACPICA: Utilities: Cleanup to convert physical address printing formats.
__ptrace_may_access() should not deny sub-threads
include/linux/sched.h: don't use task->pid/tgid in same_thread_group/has_group_leader_pid
netfilter: Zero the tuple in nfnl_cthelper_parse_tuple()
netfilter: nfnetlink_cthelper: Remove 'const' and '&' to avoid warnings
config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected
get rid of s_files and files_lock
fput: turn "list_head delayed_fput_list" into llist_head
Linux 3.10.82
lpfc: Add iotag memory barrier
pipe: iovec: Fix memory corruption when retrying atomic copy as non-atomic
drm/mgag200: Reject non-character-cell-aligned mode widths
tracing: Have filter check for balanced ops
crypto: caam - fix RNG buffer cache alignment
Linux 3.10.81
btrfs: cleanup orphans while looking up default subvolume
btrfs: incorrect handling for fiemap_fill_next_extent return
cfg80211: wext: clear sinfo struct before calling driver
mm/memory_hotplug.c: set zone->wait_table to null after freeing it
drm/i915: Fix DDC probe for passive adapters
pata_octeon_cf: fix broken build
ozwpan: unchecked signed subtraction leads to DoS
ozwpan: divide-by-zero leading to panic
ozwpan: Use proper check to prevent heap overflow
MIPS: Fix enabling of DEBUG_STACKOVERFLOW
ring-buffer-benchmark: Fix the wrong sched_priority of producer
USB: serial: ftdi_sio: Add support for a Motion Tracker Development Board
USB: cp210x: add ID for HubZ dual ZigBee and Z-Wave dongle
block: fix ext_dev_lock lockdep report
Input: elantech - fix detection of touchpads where the revision matches a known rate
ALSA: usb-audio: add MAYA44 USB+ mixer control names
ALSA: usb-audio: Add mic volume fix quirk for Logitech Quickcam Fusion
ALSA: hda/realtek - Add a fixup for another Acer Aspire 9420
iio: adis16400: Compute the scan mask from channel indices
iio: adis16400: Use != channel indices for the two voltage channels
iio: adis16400: Report pressure channel scale
xen: netback: read hotplug script once at start of day.
udp: fix behavior of wrong checksums
net_sched: invoke ->attach() after setting dev->qdisc
unix/caif: sk_socket can disappear when state is unlocked
net: dp83640: fix broken calibration routine.
bridge: fix parsing of MLDv2 reports
ipv4: Avoid crashing in ip_error
net: phy: Allow EEE for all RGMII variants
Linux 3.10.80
fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappings
vfs: read file_handle only once in handle_to_path
ACPI / init: Fix the ordering of acpi_reserve_resources()
Input: elantech - fix semi-mt protocol for v3 HW
rtlwifi: rtl8192cu: Fix kernel deadlock
md/raid5: don't record new size if resize_stripes fails.
svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures
ARM: fix missing syscall trace exit
ARM: dts: imx27: only map 4 Kbyte for fec registers
crypto: s390/ghash - Fix incorrect ghash icv buffer handling.
rt2x00: add new rt2800usb device DWA 130
libata: Ignore spurious PHY event on LPM policy change
libata: Add helper to determine when PHY events should be ignored
ext4: check for zero length extent explicitly
ext4: convert write_begin methods to stable_page_writes semantics
mmc: atmel-mci: fix bad variable type for clkdiv
powerpc: Align TOC to 256 bytes
usb: gadget: configfs: Fix interfaces array NULL-termination
usb-storage: Add NO_WP_DETECT quirk for Lacie 059f:0651 devices
USB: cp210x: add ID for KCF Technologies PRN device
USB: pl2303: Remove support for Samsung I330
USB: visor: Match I330 phone more precisely
xhci: gracefully handle xhci_irq dead device
xhci: Solve full event ring by increasing TRBS_PER_SEGMENT to 256
xhci: fix isoc endpoint dequeue from advancing too far on transaction error
target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST
ASoC: wm8994: correct BCLK DIV 348 to 384
ASoC: wm8960: fix "RINPUT3" audio route error
ASoC: mc13783: Fix wrong mask value used in mc13xxx_reg_rmw() calls
ALSA: hda - Add headphone quirk for Lifebook E752
ALSA: hda - Add Conexant codecs CX20721, CX20722, CX20723 and CX20724
d_walk() might skip too much
lib: Fix strnlen_user() to not touch memory after specified maximum
hwmon: (ntc_thermistor) Ensure iio channel is of type IIO_VOLTAGE
libceph: request a new osdmap if lingering request maps to no osd
lguest: fix out-by-one error in address checking.
fs, omfs: add NULL terminator in the end up the token list
KVM: MMU: fix CR4.SMEP=1, CR0.WP=0 with shadow pages
net: socket: Fix the wrong returns for recvmsg and sendmsg
kernel: use the gnu89 standard explicitly
staging, rtl8192e, LLVMLinux: Remove unused inline prototype
staging: rtl8712, rtl8712: avoid lots of build warnings
staging, rtl8192e, LLVMLinux: Change extern inline to static inline
drm/i915: Fix declaration of intel_gmbus_{is_forced_bit/is_port_falid}
staging: wlags49_h2: fix extern inline functions
Linux 3.10.79
ACPICA: Utilities: Cleanup to enforce ACPI_PHYSADDR_TO_PTR()/ACPI_PTR_TO_PHYSADDR().
ACPICA: Tables: Change acpi_find_root_pointer() to use acpi_physical_address.
revert "softirq: Add support for triggering softirq work on softirqs"
sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND)
mmc: card: Don't access RPMB partitions for normal read/write
pinctrl: Don't just pretend to protect pinctrl_maps, do it for real
drm/i915: Add missing MacBook Pro models with dual channel LVDS
ARM: mvebu: armada-xp-openblocks-ax3-4: Disable internal RTC
ARM: dts: imx23-olinuxino: Fix dr_mode of usb0
ARM: dts: imx28: Fix AUART4 TX-DMA interrupt name
ARM: dts: imx25: Add #pwm-cells to pwm4
gpio: sysfs: fix memory leaks and device hotplug
gpio: unregister gpiochip device before removing it
xen/console: Update console event channel on resume
mm/memory-failure: call shake_page() when error hits thp tail page
nilfs2: fix sanity check of btree level in nilfs_btree_root_broken()
ocfs2: dlm: fix race between purge and get lock resource
Linux 3.10.78
ARC: signal handling robustify
UBI: fix soft lockup in ubi_check_volume()
Drivers: hv: vmbus: Don't wait after requesting offers
ARM: dts: dove: Fix uart[23] reg property
staging: panel: fix lcd type
usb: gadget: printer: enqueue printer's response for setup request
usb: host: oxu210hp: use new USB_RESUME_TIMEOUT
3w-sas: fix command completion race
3w-9xxx: fix command completion race
3w-xxxx: fix command completion race
ext4: fix data corruption caused by unwritten and delayed extents
rbd: end I/O the entire obj_request on error
serial: of-serial: Remove device_type = "serial" registration
ALSA: hda - Fix mute-LED fixed mode
ALSA: emu10k1: Emu10k2 32 bit DMA mode
ALSA: emu10k1: Fix card shortname string buffer overflow
ALSA: emux: Fix mutex deadlock in OSS emulation
ALSA: emux: Fix mutex deadlock at unloading
ipv4: Missing sk_nulls_node_init() in ping_unhash().
Linux 3.10.77
s390: Fix build error
nosave: consolidate __nosave_{begin,end} in <asm/sections.h>
memstick: mspro_block: add missing curly braces
C6x: time: Ensure consistency in __init
wl18xx: show rx_frames_per_rates as an array as it really is
lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR
e1000: add dummy allocator to fix race condition between mtu change and netpoll
ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
RCU pathwalk breakage when running into a symlink overmounting something
drm/i915: cope with large i2c transfers
drm/radeon: fix doublescan modes (v2)
i2c: core: Export bus recovery functions
IB/mlx4: Fix WQE LSO segment calculation
IB/core: don't disallow registering region starting at 0x0
IB/core: disallow registering 0-sized memory region
stk1160: Make sure current buffer is released
mvsas: fix panic on expander attached SATA devices
Drivers: hv: vmbus: Fix a bug in the error path in vmbus_open()
xtensa: provide __NR_sync_file_range2 instead of __NR_sync_file_range
xtensa: xtfpga: fix hardware lockup caused by LCD driver
ACPICA: Utilities: split IO address types from data type models.
drivers: parport: Kconfig: exclude arm64 for PARPORT_PC
scsi: storvsc: Fix a bug in copy_from_bounce_buffer()
UBI: fix check for "too many bytes"
UBI: initialize LEB number variable
UBI: fix out of bounds write
UBI: account for bitflips in both the VID header and data
tools/power turbostat: Use $(CURDIR) instead of $(PWD) and add support for O= option in Makefile
powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH
ext4: make fsync to sync parent dir in no-journal for real this time
arm64: kernel: compiling issue, need delete read_current_timer()
video: vgacon: Don't build on arm64
console: Disable VGA text console support on cris
drivers: parport: Kconfig: exclude h8300 for PARPORT_PC
parport: disable PC-style parallel port support on cris
rtlwifi: rtl8192cu: Add new device ID
rtlwifi: rtl8192cu: Add new USB ID
ptrace: fix race between ptrace_resume() and wait_task_stopped()
fs/binfmt_elf.c: fix bug in loading of PIE binaries
Input: elantech - fix absolute mode setting on some ASUS laptops
ALSA: emu10k1: don't deadlock in proc-functions
usb: core: hub: use new USB_RESUME_TIMEOUT
usb: host: sl811: use new USB_RESUME_TIMEOUT
usb: host: xhci: use new USB_RESUME_TIMEOUT
usb: host: isp116x: use new USB_RESUME_TIMEOUT
usb: host: r8a66597: use new USB_RESUME_TIMEOUT
usb: define a generic USB_RESUME_TIMEOUT macro
usb: phy: Find the right match in devm_usb_phy_match
ARM: S3C64XX: Use fixed IRQ bases to avoid conflicts on Cragganmore
ARM: 8320/1: fix integer overflow in ELF_ET_DYN_BASE
power_supply: lp8788-charger: Fix leaked power supply on probe fail
ring-buffer: Replace this_cpu_*() with __this_cpu_*()
spi: spidev: fix possible arithmetic overflow for multi-transfer message
cdc-wdm: fix endianness bug in debug statements
MIPS: Hibernate: flush TLB entries earlier
KVM: use slowpath for cross page cached accesses
s390/hibernate: fix save and restore of kernel text section
KVM: s390: Zero out current VMDB of STSI before including level3 data.
usb: gadget: composite: enable BESL support
Btrfs: fix inode eviction infinite loop after cloning into it
Btrfs: fix log tree corruption when fs mounted with -o discard
tcp: avoid looping in tcp_send_fin()
tcp: fix possible deadlock in tcp_send_fin()
ip_forward: Drop frames with attached skb->sk
Linux 3.10.76
dcache: Fix locking bugs in backported "deal with deadlock in d_walk()"
arc: mm: Fix build failure
sb_edac: avoid INTERNAL ERROR message in EDAC with unspecified channel
x86: mm: move mmap_sem unlock from mm_fault_error() to caller
vm: make stack guard page errors return VM_FAULT_SIGSEGV rather than SIGBUS
vm: add VM_FAULT_SIGSEGV handling support
deal with deadlock in d_walk()
move d_rcu from overlapping d_child to overlapping d_alias
kconfig: Fix warning "‘jump’ may be used uninitialized"
KVM: x86: SYSENTER emulation is broken
netfilter: conntrack: disable generic tracking for known protocols
Bluetooth: Ignore isochronous endpoints for Intel USB bootloader
Bluetooth: Add support for Intel bootloader devices
Bluetooth: btusb: Add IMC Networks (Broadcom based)
Bluetooth: Add firmware update for Atheros 0cf3:311f
Bluetooth: Enable Atheros 0cf3:311e for firmware upload
mm: Fix NULL pointer dereference in madvise(MADV_WILLNEED) support
splice: Apply generic position and size checks to each write
jfs: fix readdir regression
serial: 8250_dw: Fix deadlock in LCR workaround
benet: Call dev_kfree_skby_any instead of kfree_skb.
ixgb: Call dev_kfree_skby_any instead of dev_kfree_skb.
tg3: Call dev_kfree_skby_any instead of dev_kfree_skb.
bnx2: Call dev_kfree_skby_any instead of dev_kfree_skb.
r8169: Call dev_kfree_skby_any instead of dev_kfree_skb.
8139too: Call dev_kfree_skby_any instead of dev_kfree_skb.
8139cp: Call dev_kfree_skby_any instead of kfree_skb.
tcp: tcp_make_synack() should clear skb->tstamp
tcp: fix FRTO undo on cumulative ACK of SACKed range
ipv6: Don't reduce hop limit for an interface
tcp: prevent fetching dst twice in early demux code
remove extra definitions of U32_MAX
conditionally define U32_MAX
Linux 3.10.75
pagemap: do not leak physical addresses to non-privileged userspace
console: Fix console name size mismatch
IB/mlx4: Saturate RoCE port PMA counters in case of overflow
kernel.h: define u8, s8, u32, etc. limits
net: llc: use correct size for sysctl timeout entries
net: rds: use correct size for max unacked packets and bytes
ipc: fix compat msgrcv with negative msgtyp
core, nfqueue, openvswitch: fix compilation warning
media: s5p-mfc: fix mmap support for 64bit arch
iscsi target: fix oops when adding reject pdu
ocfs2: _really_ sync the right range
be2iscsi: Fix kernel panic when device initialization fails
cifs: fix use-after-free bug in find_writable_file
usb: xhci: apply XHCI_AVOID_BEI quirk to all Intel xHCI controllers
cpuidle: ACPI: do not overwrite name and description of C0
dmaengine: omap-dma: Fix memory leak when terminating running transfer
iio: imu: Use iio_trigger_get for indio_dev->trig assignment
iio: inv_mpu6050: Clear timestamps fifo while resetting hardware fifo
Defer processing of REQ_PREEMPT requests for blocked devices
USB: ftdi_sio: Use jtag quirk for SNAP Connect E10
USB: ftdi_sio: Added custom PID for Synapse Wireless product
radeon: Do not directly dereference pointers to BIOS area.
writeback: fix possible underflow in write bandwidth calculation
writeback: add missing INITIAL_JIFFIES init in global_update_bandwidth()
mm/memory hotplug: postpone the reset of obsolete pgdat
nbd: fix possible memory leak
iwlwifi: dvm: run INIT firmware again upon .start()
IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic
IB/core: Avoid leakage from kernel to user space
tcp: Fix crash in TCP Fast Open
selinux: fix sel_write_enforce broken return value
ALSA: hda - Fix headphone pin config for Lifebook T731
ALSA: usb - Creative USB X-Fi Pro SB1095 volume knob support
ALSA: hda - Add one more node in the EAPD supporting candidate list
Linux 3.10.74
net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5}
powerpc/mpc85xx: Add ranges to etsec2 nodes
hfsplus: fix B-tree corruption after insertion at position 0
dm: hold suspend_lock while suspending device during device deletion
vt6655: RFbSetPower fix missing rate RATE_12M
perf: Fix irq_work 'tail' recursion
Revert "iwlwifi: mvm: fix failure path when power_update fails in add_interface"
mac80211: drop unencrypted frames in mesh fwding
mac80211: disable u-APSD queues by default
nl80211: ignore HT/VHT capabilities without QoS/WMM
tcm_qla2xxx: Fix incorrect use of __transport_register_session
tcm_fc: missing curly braces in ft_invl_hw_context()
ASoC: wm8955: Fix wrong value references for boolean kctl
ASoC: adav80x: Fix wrong value references for boolean kctl
ASoC: ak4641: Fix wrong value references for boolean kctl
ASoC: wm8904: Fix wrong value references for boolean kctl
ASoC: wm8903: Fix wrong value references for boolean kctl
ASoC: wm2000: Fix wrong value references for boolean kctl
ASoC: wm8731: Fix wrong value references for boolean kctl
ASoC: tas5086: Fix wrong value references for boolean kctl
ASoC: wm8960: Fix wrong value references for boolean kctl
ASoC: cs4271: Fix wrong value references for boolean kctl
ASoC: sgtl5000: remove useless register write clearing CHRGPUMP_POWERUP
Change-Id: Ib7976ee2c7224e39074157e28db4158db40b00db
Signed-off-by: Kaushal Kumar <kaushalk@codeaurora.org>
commit eeb1b73378b560e00ff1da2ef09fed9254f4e128 upstream.
With the removal of the routing cache, we lost the
option to tweak the garbage collector threshold
along with the maximum routing cache size. So git
commit 703fb94ec ("xfrm: Fix the gc threshold value
for ipv4") moved back to a static threshold.
It turned out that the current threshold before we
start garbage collecting is much too small for some
workloads, so increase it from 1024 to 32768. This
means that we start the garbage collector if we have
more than 32768 dst entries in the system and refuse
new allocations if we are above 65536.
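The two limits can be sketched as a small decision function. This is a
simplified model, not the kernel's allocation path: the names are
hypothetical, the refusal limit of twice the threshold is inferred from
the numbers above (65536 = 2 * 32768), and the sketch assumes a collector
run frees nothing.

```c
/* Hypothetical outcome of trying to allocate one more dst entry. */
enum toy_alloc_result {
	TOY_ALLOC_OK,          /* below the threshold, no collection needed */
	TOY_ALLOC_OK_AFTER_GC, /* collector ran, allocation still allowed */
	TOY_ALLOC_REFUSED      /* too many entries, allocation refused */
};

/*
 * Decide what happens for a given number of existing entries.
 * Assumption: the collector frees nothing, so `entries` is unchanged.
 */
static enum toy_alloc_result toy_dst_alloc_check(unsigned int entries,
						 unsigned int gc_thresh)
{
	if (entries <= gc_thresh)
		return TOY_ALLOC_OK;
	if (entries > 2 * gc_thresh)
		return TOY_ALLOC_REFUSED;
	return TOY_ALLOC_OK_AFTER_GC;
}
```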
Reported-by: Wolfgang Walter <linux@stwm.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Stephen Hemminger <shemming@brocade.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This contains the following commits:
1. 0149763 net: core: Add a UID range to fib rules.
2. 1650474 net: core: Use the socket UID in routing lookups.
3. 0b16771 net: ipv4: Add the UID to the route cache.
4. ee058f1 net: core: Add a RTA_UID attribute to routes.
This is so that userspace can do per-UID route lookups.
Bug: 15413527
Change-Id: I1285474c6734614d3bda6f61d88dfe89a4af7892
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Git-commit: 0b428749ce5969bc06c73855e360141b4e7126e8
Git-repo: https://android.googlesource.com/kernel/common.git
[imaund@codeaurora.org: Resolved conflicts related to removal
of oif and mark, as well as refactoring of files.]
Signed-off-by: Ian Maund <imaund@codeaurora.org>
The xfrm gc threshold can be configured via xfrm{4,6}_gc_thresh
sysctl but currently only in init_net, other namespaces always
use the default value. This can substantially limit the number
of IPsec tunnels that can be effectively used.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Function xfrm4_policy_fini() is unused since xfrm4_fini() was
removed in 2.6.11.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The xfrm gc threshold value depends on ip_rt_max_size. This
value was set to INT_MAX with the routing cache removal patch,
so we only start garbage collecting once we have INT_MAX/2
IPsec routes cached. Fix this by going back to the static
threshold of 1024 routes.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Add a new flag to remember when a route is via a gateway.
We will use it to allow rt_gateway to contain the address of a
directly connected host for the cases when DST_NOCACHE is
used or when the NH exception caches a per-destination route
without the DST_NOCACHE flag, i.e. when routes are not used for
other destinations. This way we force neighbour resolution to
work with the routed destination while still allowing a
different address in the packet, a feature needed for IPVS-DR,
where the original packet for the virtual IP is routed via a
route to the real IP.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a device is unregistered, we have to purge all of the
references to it that may exist in the entire system.
If a route is uncached, we currently have no way of accomplishing
this.
So create a global list that is scanned when a network device goes
down. This mirrors the logic in net/core/dst.c's dst_ifdown().
Signed-off-by: David S. Miller <davem@davemloft.net>
That is this value's only use, as a boolean to indicate whether
a route is an input route or not.
So implement it that way, using a u16 gap present in the struct
already.
Signed-off-by: David S. Miller <davem@davemloft.net>
Never actually used.
It was being set on output routes to the original OIF specified in the
flow key used for the lookup.
Adjust the only user, ipmr_rt_fib_lookup(), for greater correctness of
the flowi4_oif and flowi4_iif values, thanks to feedback from Julian
Anastasov.
Signed-off-by: David S. Miller <davem@davemloft.net>
They are always used in contexts where they can be reconstituted,
or where the finally resolved rt->rt_{src,dst} is semantically
equivalent.
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be used so that we can compose a full flow key.
Even though we have a route in this context, we need more. In the
future the routes will be without destination address, source address,
etc. keying. One ipv4 route will cover entire subnets, etc.
In this environment we have to have a way to possess persistent storage
for redirects and PMTU information. This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here. Using that flow key will do a fib_lookup()
and create/update the persistent entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
We encode the pointer(s) into an unsigned long with one state bit.
The state bit is used so we can store the inetpeer tree root to use
when resolving the peer later.
Later the peer roots will be per-FIB table, and this change works to
facilitate that.
Signed-off-by: David S. Miller <davem@davemloft.net>
This results in code with less boiler plate that is a bit easier
to read.
Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix checkpatch errors of the following type:
* ERROR: "foo * bar" should be "foo *bar"
* ERROR: "(foo*)" should be "(foo *)"
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a bug in commit 5e2b61f ("ipv4: Remove flowi from struct rtable").
It makes xfrm4_fill_dst() modify the wrong data structure.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Reported-by: Kim Phillips <kim.phillips@freescale.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are enough instances of this:
iph->frag_off & htons(IP_MF | IP_OFFSET)
that a helper function is probably warranted.
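A userspace sketch of such a helper is shown below. The toy_ names are
hypothetical; the constants carry the values of the IPv4 "more fragments"
flag and the 13-bit fragment offset mask.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* IPv4 fragment bits, host byte order (toy_ names are hypothetical). */
#define TOY_IP_MF	0x2000	/* "more fragments" flag */
#define TOY_IP_OFFSET	0x1fff	/* 13-bit fragment offset mask */

/* Minimal stand-in for the one iphdr field the test needs. */
struct toy_iphdr {
	uint16_t frag_off;	/* network byte order */
};

/* Returns non-zero if the packet is a fragment (first, middle or last). */
static int toy_ip_is_fragment(const struct toy_iphdr *iph)
{
	return (iph->frag_off & htons(TOY_IP_MF | TOY_IP_OFFSET)) != 0;
}
```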
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rearrange xfrm4_dst_lookup() so that it works by calling a helper
function __xfrm_dst_lookup() that takes an explicit flow key storage
area as an argument.
Use this new helper in xfrm4_get_saddr() so we can fetch the selected
source address from the flow instead of from rt->rt_src
Signed-off-by: David S. Miller <davem@davemloft.net>
To more accurately reflect that it is purely a routing
cache lookup key and is used in no other context.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 1018b5c016 ("Set rt->rt_iif more
sanely on output routes.") breaks rt_is_{output,input}_route.
This became the cause to return "IP_PKTINFO's ->ipi_ifindex == 0".
To fix it, this does:
1) Add "int rt_route_iif;" to struct rtable
2) For input routes, always set rt_route_iif to same value as rt_iif
3) For output routes, always set rt_route_iif to zero. Set rt_iif
as it is done currently.
4) Change rt_is_{output,input}_route() to test rt_route_iif
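The four steps can be sketched with a stripped-down struct. This is an
illustration only: struct toy_rtable and the toy_ helpers are hypothetical
and keep just the two fields named above.

```c
/* Hypothetical, stripped-down rtable with only the two fields in question. */
struct toy_rtable {
	int rt_iif;		/* interface index reported to users */
	int rt_route_iif;	/* non-zero only for input routes */
};

/* Input routes: both fields carry the incoming interface index. */
static void toy_init_input_route(struct toy_rtable *rt, int iif)
{
	rt->rt_iif = iif;
	rt->rt_route_iif = iif;
}

/* Output routes: rt_iif is set as before, rt_route_iif is always zero. */
static void toy_init_output_route(struct toy_rtable *rt, int ifindex)
{
	rt->rt_iif = ifindex;
	rt->rt_route_iif = 0;
}

/* The direction tests look only at rt_route_iif. */
static int toy_rt_is_input_route(const struct toy_rtable *rt)
{
	return rt->rt_route_iif != 0;
}

static int toy_rt_is_output_route(const struct toy_rtable *rt)
{
	return rt->rt_route_iif == 0;
}
```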
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*
This will let us create AF optimal flow instances.
It will work because every context in which we access the ports,
we have to be fully aware of which AF the flowi is anyways.
Signed-off-by: David S. Miller <davem@davemloft.net>
I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.
This is the first step to move in that direction.
Signed-off-by: David S. Miller <davem@davemloft.net>
The only necessary parts are the src/dst addresses, the
interface indexes, the TOS, and the mark.
The rest is unnecessary bloat, which amounts to nearly
50 bytes on 64-bit.
Signed-off-by: David S. Miller <davem@davemloft.net>
Routing metrics are now copy-on-write.
Initially a route entry points its metrics at a read-only location.
If a routing table entry exists, it will point there. Else it will
point at the all zero metric place-holder called 'dst_default_metrics'.
The writeability state of the metrics is stored in the low bits of the
metrics pointer, we have two bits left to spare if we want to store
more states.
For the initial implementation, COW is implemented simply via kmalloc.
However future enhancements will change this to place the writable
metrics somewhere else, in order to increase sharing. Very likely
this "somewhere else" will be the inetpeer cache.
Note also that this means that metrics updates may transiently fail
if we cannot COW the metrics successfully.
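The scheme can be sketched in userspace as a tagged pointer with
copy-on-write. This is a minimal illustration, not the dst_entry code:
all toy_ names are hypothetical, only one low bit is used, and malloc()
stands in for kmalloc().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Low bit of the stored pointer marks the metrics as read-only. */
#define TOY_METRICS_READ_ONLY ((uintptr_t)0x1)

/* Pack a (suitably aligned) metrics pointer and its read-only state. */
static uintptr_t toy_metrics_pack(const uint32_t *metrics, int read_only)
{
	return (uintptr_t)metrics | (read_only ? TOY_METRICS_READ_ONLY : 0);
}

/* Recover the plain pointer by masking off the state bit. */
static const uint32_t *toy_metrics_ptr(uintptr_t slot)
{
	return (const uint32_t *)(slot & ~TOY_METRICS_READ_ONLY);
}

static int toy_metrics_read_only(uintptr_t slot)
{
	return (slot & TOY_METRICS_READ_ONLY) != 0;
}

/*
 * Return a writable metrics array, copying the shared read-only one first
 * if necessary. Returns NULL if the copy cannot be allocated, which is the
 * "update transiently fails" case.
 */
static uint32_t *toy_metrics_write_ptr(uintptr_t *slot, size_t count)
{
	if (toy_metrics_read_only(*slot)) {
		uint32_t *copy = malloc(count * sizeof(*copy));

		if (!copy)
			return NULL;
		memcpy(copy, toy_metrics_ptr(*slot), count * sizeof(*copy));
		*slot = toy_metrics_pack(copy, 0);
	}
	return (uint32_t *)toy_metrics_ptr(*slot);
}
```

Readers that never update metrics keep sharing the read-only array; only
the first write pays for the copy.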
But even by itself, this patch should decrease memory usage and
increase cache locality especially for routing workloads. In those
cases the read-only metric copies stay in place and never get written
to.
TCP workloads where metrics get updated, and those rare cases where
PMTU triggers occur, will take a very slight performance hit. But
that hit will be alleviated when the long-term writable metrics
move to a more sharable location.
Since the metrics storage went from a u32 array of RTAX_MAX entries to
what is essentially a pointer, some retooling of the dst_entry layout
was necessary.
Most importantly, we need to preserve the alignment of the reference
count so that it doesn't share cache lines with the read-mostly state,
as per Eric Dumazet's alignment assertion checks.
The only non-trivial bit here is the move of the 'flags' member into
the writeable cacheline. This is OK since we are always accessing the
flags around the same moment when we made a modification to the
reference count.
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GRE Key field is intended to be used for identifying an individual
traffic flow within a tunnel. It is useful to be able to have XFRM
policy selector matches to have different policies for different
GRE tunnels.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems the idev field in struct rtable has no special purpose other
than adding extra atomic ops.
We hold refcounts on the device itself (using percpu data, so pretty
cheap in current kernel).
The InfiniBand case is solved using dst.dev instead of idev->dev.
Removal of this field means routing without route cache is now using
shared data, percpu data, and only potential contention is a pair of
atomic ops on struct neighbour per forwarded packet.
About 5% speedup on routing test.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct dst_ops tracks number of allocated dst in an atomic_t field,
subject to high cache line contention in stress workload.
Switch to a percpu_counter, to reduce the number of times we need to dirty a
central location. Place it on a separate cache line to avoid dirtying
read only fields.
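The idea can be sketched with a single-threaded model. This is not the
kernel's percpu_counter API: the struct, the function names, the fixed
CPU count and the batch parameter are all assumptions made for the
illustration, and central_writes exists only to make the effect visible.

```c
#define TOY_NR_CPUS 4

/* Hypothetical model of a batched per-CPU counter. */
struct toy_percpu_counter {
	long count;			/* central value, shared by all CPUs */
	long local[TOY_NR_CPUS];	/* per-CPU deltas not yet folded in */
	unsigned long central_writes;	/* how often `count` was dirtied */
};

/*
 * Add `delta` on behalf of `cpu` (must be < TOY_NR_CPUS). The central
 * value is only touched once the pending per-CPU delta reaches `batch`.
 */
static void toy_counter_add(struct toy_percpu_counter *c, unsigned int cpu,
			    long delta, long batch)
{
	long pending = c->local[cpu] + delta;

	if (pending >= batch || pending <= -batch) {
		c->count += pending;
		c->central_writes++;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = pending;
	}
}

/* Cheap, approximate read: central value only. */
static long toy_counter_read_fast(const struct toy_percpu_counter *c)
{
	return c->count;
}

/* Exact read: central value plus every pending per-CPU delta. */
static long toy_counter_read_slow(const struct toy_percpu_counter *c)
{
	long sum = c->count;
	unsigned int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum;
}
```

With a batch of 32, one hundred single increments on one CPU dirty the
central value only three times instead of one hundred.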
Stress test :
(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE, SLUB/NUMA)
Before:
real 0m51.179s
user 0m15.329s
sys 10m15.942s
After:
real 0m45.570s
user 0m15.525s
sys 9m56.669s
With a small reordering of struct neighbour fields, subject of a
following patch, (to separate refcnt from other read mostly fields)
real 0m41.841s
user 0m15.261s
sys 8m45.949s
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>