Commit Graph

231 Commits

Author SHA1 Message Date
Shawn Bohrer 3542138c1c udp: ipv4: Add udp early demux
The removal of the routing cache introduced a performance regression for
some UDP workloads since a dst lookup must be done for each packet.
This change caches the dst per socket in a similar manner to what we do
for TCP by implementing early_demux.

For UDP multicast we can only cache the dst if there is only one
receiving socket on the host.  Since caching only works when there is
one receiving socket we do the multicast socket lookup using RCU.

For UDP unicast we only demux sockets with an exact match in order to
not break forwarding setups.  Additionally since the hash chains may be
long we only check the first socket to see if it is a match and not
waste extra time searching the whole chain when we might not find an
exact match.

Benchmark results from a netperf UDP_RR test:
Before 87961.22 transactions/s
After  89789.68 transactions/s

Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.97us RTT
After  12.63us RTT

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-27 22:07:48 +02:00
Chenbo Feng 6219dd5a9a ANDROID: Add untag hacks to inet_release function
To prevent protential risk of memory leak caused by closing socket with
out untag it from qtaguid module, the qtaguid module now do not hold any
socket file reference count. Instead, it will increase the sk_refcnt of
the sk struct to prevent a reuse of the socket pointer.  And when a socket
is released. It will delete the tag if the socket is previously tagged so
no more resources is held by xt_qtaguid moudle. A flag is added to the untag
process to prevent possible kernel crash caused by fail to delete
corresponding socket_tag_entry list.
Bug: 36374484
Test: compile and run test under system/extra/test/iptables,
      run cts -m CtsNetTestCases -t android.net.cts.SocketRefCntTest

Signed-off-by: Chenbo Feng <fengc@google.com>
Change-Id: Iea7c3bf0c59b9774a5114af905b2405f6bc9ee52
Git-commit: 7be18b3a94f61e327b054be5f75bd6d3c975fbe7
Git-repo: https://android.googlesource.com/kernel/common.git
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2019-07-27 21:50:41 +02:00
Eric Dumazet f30cc4da00 net: ping: do not abuse udp_poll()
commit 77d4b1d36926a9b8387c6b53eeba42bcaaffcea3 upstream.

Alexander reported various KASAN messages triggered in recent kernels

The problem is that ping sockets should not use udp_poll() in the first
place, and recent changes in UDP stack finally exposed this old bug.

Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Solar Designer <solar@openwall.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Acked-By: Lorenzo Colitti <lorenzo@google.com>
Tested-By: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[wt: removed the parts related to ping6 as 6d0bfe226116 is not in 3.10]

Signed-off-by: Willy Tarreau <w@1wt.eu>
2019-07-27 21:44:33 +02:00
Luca Stefani 08d8c47ec4 This is the 3.10.95 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJWowK8AAoJEDjbvchgkmk+sFIP/3HvyY47jKTX7ykzRa78wJZK
 0ihPIOzV1OjgjvfRQZ4d6olGDMDuP5YbSAc0gHlIy71FO/cP7uPYSKZI9IrJAwSB
 ZEovaAS05nhbA1UuJFZo9V7JVYSc4IXNH/QoMvzJS+Zrpr0v0tlnxQSvP3kaeQpL
 Z5dbSd27XyzPp7gYM87Bn+OMkI1tPl+addyhqe7YwJ3MM7OUluLsZYxf30exoPjH
 bdckbaXVi1U+WUzA1OI7XboOuKQZh6NT+ZixheB7EQPvbN5kxZRDQKtNJWjnk24d
 ycU0KfGC1VntMULWhwJnn+elTxrQf0aVWkJcZM6xBri+g0BmGIli1DAD1WyYj3c7
 NSPDlTiNFcm95SUgDpB2PvT7Bue6T/0kRadpZJNgpjZgLtVMXo0r62Lo9Y11Y9Oa
 jRqSf7f7BsUJ+X3SDylcXXL60uiz5DOLpAyMp8TmI9JBh1hTymUhiHcEHR9iSUz+
 0QOw6P/XKfIXVe0qhzSeWXaRCKIFZIwWrNMztfj2U/SZtAmsoQ76Lpx2jCf/nqGz
 3IFAQ/dVhcfLRvOrcYPKFsMDWiLKMJNVTeKe2a9ywh8WCWajROfZvozm856dY42F
 gUTUn2MsAnm2T+wNnYcFZo0y2i8EaA4FfjEYfoUeEgyIDqc3w8+YjvgCFwDldLr4
 oMm63KBsozCC09L5rRpU
 =8AjQ
 -----END PGP SIGNATURE-----

Merge tag 'v3.10.95' into HEAD

This is the 3.10.95 stable release

Change-Id: I3b35d689a2e42fb39ddf132b7ba5414b0ff2fc4b
2017-04-18 17:14:54 +02:00
LuK1337 fc9499e55a Import latest Samsung release
* Package version: T713XXU2BQCO

Change-Id: I293d9e7f2df458c512d59b7a06f8ca6add610c99
2017-04-18 03:43:52 +02:00
Hannes Frederic Sowa 0b15bc2925 net: add validation for the socket syscall protocol argument
[ Upstream commit 79462ad02e861803b3840cc782248c7359451cd9 ]

郭永刚 reported that one could simply crash the kernel as root by
using a simple program:

	int socket_fd;
	struct sockaddr_in addr;
	addr.sin_port = 0;
	addr.sin_addr.s_addr = INADDR_ANY;
	addr.sin_family = 10;

	socket_fd = socket(10,3,0x40000000);
	connect(socket_fd , &addr,16);

AF_INET, AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field thus is sized accordingly,
thus larger protocol identifiers simply cut off the higher bits and
store a zero in the protocol fields.

This could lead to e.g. NULL function pointer because as a result of
the cut off inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.

kernel: Call Trace:
kernel:  [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70
kernel:  [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80
kernel:  [<ffffffff81645069>] SYSC_connect+0xd9/0x110
kernel:  [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80
kernel:  [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200
kernel:  [<ffffffff81645e0e>] SyS_connect+0xe/0x10
kernel:  [<ffffffff81779515>] tracesys_phase2+0x84/0x89

I found no particular commit which introduced this problem.

CVE: CVE-2015-8543
Cc: Cong Wang <cwang@twopensource.com>
Reported-by: 郭永刚 <guoyonggang@360.cn>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-22 19:47:55 -08:00
Subash Abhinov Kasiviswanathan 547ea416c3 net: Fail explicit bind to local reserved ports
Reserved ports may have some special use cases which are not suitable
for use by general userspace applications. Currently, ports specified
in ip_local_reserved_ports will not be returned only in case of
automatic port assignment.

Add a boolean sysctl flag 'reserved_port_bind'. Default value is 1
which preserves the existing behavior. Setting the value to 0 will
prevent userspace applications from binding to these ports even when
they are explicitly requested.

BUG=20663075
Change-Id: Ib1071ca5bd437cd3c4f71b56147e4858f3b9ebec
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Mekala Natarajan <mekalan@codeaurora.org>
2015-09-09 15:26:20 -07:00
Ian Maund f1b32d4e47 Merge upstream linux-stable v3.10.28 into msm-3.10
The following commits have been reverted from this merge, as they are
known to introduce new bugs and are currently incompatible with our
audio implementation. Investigation of these commits is ongoing, and
they are expected to be brought in at a later time:

86e6de7 ALSA: compress: fix drain calls blocking other compress functions (v6)
16442d4 ALSA: compress: fix drain calls blocking other compress functions

This merge commit also includes a change in block, necessary for
compilation. Upstream has modified elevator_init_fn to prevent race
conditions, requring updates to row_init_queue and test_init_queue.

* commit 'v3.10.28': (1964 commits)
  Linux 3.10.28
  ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling
  drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()
  serial: amba-pl011: use port lock to guard control register access
  mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUAL
  md/raid5: Fix possible confusion when multiple write errors occur.
  md/raid10: fix two bugs in handling of known-bad-blocks.
  md/raid10: fix bug when raid10 recovery fails to recover a block.
  md: fix problem when adding device to read-only array with bitmap.
  drm/i915: fix DDI PLLs HW state readout code
  nilfs2: fix segctor bug that causes file system corruption
  thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only
  ftrace/x86: Load ftrace_ops in parameter not the variable holding it
  SELinux: Fix possible NULL pointer dereference in selinux_inode_permission()
  writeback: Fix data corruption on NFS
  hwmon: (coretemp) Fix truncated name of alarm attributes
  vfs: In d_path don't call d_dname on a mount point
  staging: comedi: adl_pci9111: fix incorrect irq passed to request_irq()
  staging: comedi: addi_apci_1032: fix subdevice type/flags bug
  mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
  GFS2: Increase i_writecount during gfs2_setattr_chown
  perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h
  perf scripting perl: Fix build error on Fedora 12
  ARM: 7815/1: kexec: offline non panic CPUs on Kdump panic
  Linux 3.10.27
  sched: Guarantee new group-entities always have weight
  sched: Fix hrtimer_cancel()/rq->lock deadlock
  sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
  sched: Fix race on toggling cfs_bandwidth_used
  x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround
  netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper
  SCSI: sd: Reduce buffer size for vpd request
  intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters.
  mac80211: move "bufferable MMPDU" check to fix AP mode scan
  ACPI / Battery: Add a _BIX quirk for NEC LZ750/LS
  ACPI / TPM: fix memory leak when walking ACPI namespace
  mfd: rtsx_pcr: Disable interrupts before cancelling delayed works
  clk: exynos5250: fix sysmmu_mfc{l,r} gate clocks
  clk: samsung: exynos5250: Add CLK_IGNORE_UNUSED flag for the sysreg clock
  clk: samsung: exynos4: Correct SRC_MFC register
  clk: clk-divider: fix divisor > 255 bug
  ahci: add PCI ID for Marvell 88SE9170 SATA controller
  parisc: Ensure full cache coherency for kmap/kunmap
  drm/nouveau/bios: make jump conditional
  ARM: shmobile: mackerel: Fix coherent DMA mask
  ARM: shmobile: armadillo: Fix coherent DMA mask
  ARM: shmobile: kzm9g: Fix coherent DMA mask
  ARM: dts: exynos5250: Fix MDMA0 clock number
  ARM: fix "bad mode in ... handler" message for undefined instructions
  ARM: fix footbridge clockevent device
  net: Loosen constraints for recalculating checksum in skb_segment()
  bridge: use spin_lock_bh() in br_multicast_set_hash_max
  netpoll: Fix missing TXQ unlock and and OOPS.
  net: llc: fix use after free in llc_ui_recvmsg
  virtio-net: fix refill races during restore
  virtio_net: don't leak memory or block when too many frags
  virtio-net: make all RX paths handle errors consistently
  virtio_net: fix error handling for mergeable buffers
  vlan: Fix header ops passthru when doing TX VLAN offload.
  net: rose: restore old recvmsg behavior
  rds: prevent dereference of a NULL device
  ipv6: always set the new created dst's from in ip6_rt_copy
  net: fec: fix potential use after free
  hamradio/yam: fix info leak in ioctl
  drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl()
  net: inet_diag: zero out uninitialized idiag_{src,dst} fields
  ip_gre: fix msg_name parsing for recvfrom/recvmsg
  net: unix: allow bind to fail on mutex lock
  ipv6: fix illegal mac_header comparison on 32bit
  netvsc: don't flush peers notifying work during setting mtu
  tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0
  net: unix: allow set_peek_off to fail
  net: drop_monitor: fix the value of maxattr
  ipv6: don't count addrconf generated routes against gc limit
  packet: fix send path when running with proto == 0
  virtio: delete napi structures from netdev before releasing memory
  macvtap: signal truncated packets
  tun: update file current position
  macvtap: update file current position
  macvtap: Do not double-count received packets
  rds: prevent BUG_ON triggered on congestion update to loopback
  net: do not pretend FRAGLIST support
  IPv6: Fixed support for blackhole and prohibit routes
  HID: Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""
  gpio-rcar: R-Car GPIO IRQ share interrupt
  clocksource: em_sti: Set cpu_possible_mask to fix SMP broadcast
  irqchip: renesas-irqc: Fix irqc_probe error handling
  Linux 3.10.26
  sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
  ext4: fix bigalloc regression
  arm64: Use Normal NonCacheable memory for writecombine
  arm64: Do not flush the D-cache for anonymous pages
  arm64: Avoid cache flushing in flush_dcache_page()
  ARM: KVM: arch_timers: zero CNTVOFF upon return to host
  ARM: hyp: initialize CNTVOFF to zero
  clocksource: arch_timer: use virtual counters
  arm64: Remove unused cpu_name ascii in arch/arm64/mm/proc.S
  arm64: dts: Reserve the memory used for secondary CPU release address
  arm64: check for number of arguments in syscall_get/set_arguments()
  arm64: fix possible invalid FPSIMD initialization state
  ...

Change-Id: Ia0e5d71b536ab49ec3a1179d59238c05bdd03106
Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-03-24 14:28:34 -07:00
Eric Dumazet bdf831a681 net: net_secret should not depend on TCP
[ Upstream commit 9a3bab6b05383f1e4c3716b3615500c51285959e ]

A host might need net_secret[] and never open a single socket.

Problem added in commit aebda156a5
("net: defer net_secret[] initialization")

Based on prior patch from Hannes Frederic Sowa.

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@strressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13 16:08:30 -07:00
Syed Rameez Mustafa 3994d4e010 net: change error print messages to generate warnings
Given the numerous clients of sockets, it is difficult to find the
offending client for the error that is being reported. Change the
kernel print messages to generate a warning instead so that we get
a complete call stack.

Change-Id: I4bfce3e0a5aecd88c6fa4a1f900482449a4b868d
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2013-10-04 19:08:13 -07:00
Chia-chi Yeh efa0d7175f net: Replace AID_NET_RAW checks with capable(CAP_NET_RAW).
Signed-off-by: Chia-chi Yeh <chiachi@android.com>
2013-07-01 13:40:27 -07:00
Robert Love 4b0158841f net: socket ioctl to reset connections matching local address
Introduce a new socket ioctl, SIOCKILLADDR, that nukes all sockets
bound to the same local address. This is useful in situations with
dynamic IPs, to kill stuck connections.

Signed-off-by: Brian Swetland <swetland@google.com>

net: fix tcp_v4_nuke_addr

Signed-off-by: Dima Zavin <dima@android.com>

net: ipv4: Fix a spinlock recursion bug in tcp_v4_nuke.

We can't hold the lock while calling to tcp_done(), so we drop
it before calling. We then have to start at the top of the chain again.

Signed-off-by: Dima Zavin <dima@android.com>

net: ipv4: Fix race in tcp_v4_nuke_addr().

To fix a recursive deadlock in 2.6.29, we stopped holding the hash table lock
across tcp_done() calls. This fixed the deadlock, but introduced a race where
the socket could die or change state.

Fix: Before unlocking the hash table, we grab a reference to the socket. We
can then unlock the hash table without risk of the socket going away. We then
lock the socket, which is safe because it is pinned. We can then call
tcp_done() without recursive deadlock and without race. Upon return, we unlock
the socket and then unpin it, killing it.

Change-Id: Idcdae072b48238b01bdbc8823b60310f1976e045
Signed-off-by: Robert Love <rlove@google.com>
Acked-by: Dima Zavin <dima@android.com>

ipv4: disable bottom halves around call to tcp_done().

Signed-off-by: Robert Love <rlove@google.com>
Signed-off-by: Colin Cross <ccross@android.com>

ipv4: Move sk_error_report inside bh_lock_sock in tcp_v4_nuke_addr

When sk_error_report is called, it wakes up the user-space thread, which then
calls tcp_close.  When the tcp_close is interrupted by the tcp_v4_nuke_addr
ioctl thread running tcp_done, it leaks 392 bytes and triggers a WARN_ON.

This patch moves the call to sk_error_report inside the bh_lock_sock, which
matches the locking used in tcp_v4_err.

Signed-off-by: Colin Cross <ccross@android.com>
2013-07-01 13:40:20 -07:00
Robert Love fd64bbf28e Paranoid network.
With CONFIG_ANDROID_PARANOID_NETWORK, require specific uids/gids to instantiate
network sockets.

Signed-off-by: Robert Love <rlove@google.com>

paranoid networking: Use in_egroup_p() to check group membership

The previous group_search() caused trouble for partners with module builds.
in_egroup_p() is also cleaner.

Signed-off-by: Nick Pelly <npelly@google.com>

Fix 2.6.29 build.

Signed-off-by: Arve Hjønnevåg <arve@android.com>

net: Fix compilation of the IPv6 module

Fix compilation of the IPv6 module -- current->euid does not exist anymore,
current_euid() is what needs to be used.

Signed-off-by: Steinar H. Gunderson <sesse@google.com>
2013-07-01 13:40:20 -07:00
Pravin B Shelar 9b3eb5edf3 gre: Fix GREv4 TCPv6 segmentation.
For ipv6 traffic, GRE can generate packet with strange GSO
bits, e.g. ipv4 packet with SKB_GSO_TCPV6 flag set.  Therefore
following patch relaxes check in inet gso handler to allow
such packet for segmentation.
This patch also fixes wrong skb->protocol set that was done in
gre_gso_segment() handler.

Reported-by: Steinar H. Gunderson <sesse@google.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-03 16:08:58 -04:00
Eric Dumazet aebda156a5 net: defer net_secret[] initialization
Instead of feeding net_secret[] at boot time, defer the init
at the point first socket is created.

This permits some platforms to use better entropy sources than
the ones available at boot time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-29 15:14:02 -04:00
Pravin B Shelar c544193214 GRE: Refactor GRE tunneling code.
Following patch refactors GRE code into ip tunneling code and GRE
specific code. Common tunneling code is moved to ip_tunnel module.
ip_tunnel module is written as generic library which can be used
by different tunneling implementations.

ip_tunnel module contains following components:
 - packet xmit and rcv generic code. xmit flow looks like
   (gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out.
 - hash table of all devices.
 - lookup for tunnel devices.
 - control plane operations like device create, destroy, ioctl, netlink
   operations code.
 - registration for tunneling modules, like gre, ipip etc.
 - define single pcpu_tstats dev->tstats.
 - struct tnl_ptk_info added to pass parsed tunnel packet parameters.

ipip.h header is renamed to ip_tunnel.h

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:27:18 -04:00
Pravin B Shelar 25c7704d8b ipv4: Fix ip-header identification for gso packets.
ip-header id needs to be incremented even if IP_DF flag is set.
This behaviour was changed in commit 490ab08127
(IP_GRE: Fix IP-Identification).

Following patch fixes it so that identification is always
incremented.

Reported-by: Cong Wang <amwang@redhat.com>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2013-03-25 12:30:25 -04:00
Pravin B Shelar 7313626745 tunneling: Add generic Tunnel segmentation.
Adds generic tunneling offloading support for IPv4-UDP based
tunnels.
GSO type is added to request this offload for a skb.
netdev feature NETIF_F_UDP_TUNNEL is added for hardware offloaded
udp-tunnel support. Currently no device supports this feature,
software offload is used.

This can be used by tunneling protocols like VXLAN.

CC: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-09 16:09:17 -05:00
Pravin B Shelar ec5f061564 net: Kill link between CSUM and SG features.
Earlier SG was unset if CSUM was not available for given device to
force skb copy to avoid sending inconsistent csum.
Commit c9af6db4c1 (net: Fix possible wrong checksum generation)
added explicit flag to force copy to fix this issue.  Therefore
there is no need to link SG and CSUM, following patch kills this
link between there two features.

This patch is also required following patch in series.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-09 16:08:57 -05:00
Pravin B Shelar 490ab08127 IP_GRE: Fix IP-Identification.
GRE-GSO generates ip fragments with id 0,2,3,4... for every
GSO packet, which is not correct. Following patch fixes it
by setting ip-header id unique id of fragments are allowed.
As Eric Dumazet suggested it is optimized by using inner ip-header
whenever inner packet is ipv4.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-25 15:47:41 -05:00
Li Wei 5b0520425e ipv4: fix error handling in icmp_protocol.
Now we handle icmp errors in each transport protocol's err_handler,
for icmp protocols, that is ping_err. Since this handler only care
of those icmp errors triggered by echo request, errors triggered
by echo reply(which sent by kernel) are sliently ignored.

So wrap ping_err() with icmp_err() to deal with those icmp errors.

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-22 15:10:18 -05:00
Eric Dumazet 08dcdbf6a7 ipv6: use a stronger hash for tcp
It looks like its possible to open thousands of TCP IPv6
sessions on a server, all landing in a single slot of TCP hash
table. Incoming packets have to lookup sockets in a very
long list.

We should hash all bits from foreign IPv6 addresses, using
a salt and hash mix, not a simple XOR.

inet6_ehashfn() can also separately use the ports, instead
of xoring them.

Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-21 18:15:58 -05:00
Pravin B Shelar 68c3316311 v4 GRE: Add TCP segmentation offload for GRE
Following patch adds GRE protocol offload handler so that
skb_gso_segment() can segment GRE packets.
SKB GSO CB is added to keep track of total header length so that
skb_segment can push entire header. e.g. in case of GRE, skb_segment
need to push inner and outer headers to every segment.
New NETIF_F_GRE_GSO feature is added for devices which support HW
GRE TSO offload. Currently none of devices support it therefore GRE GSO
always fall backs to software GSO.

[ Compute pkt_len before ip_local_out() invocation. -DaveM ]

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-15 15:17:11 -05:00
Pravin B Shelar c9af6db4c1 net: Fix possible wrong checksum generation.
Patch cef401de7b (net: fix possible wrong checksum
generation) fixed wrong checksum calculation but it broke TSO by
defining new GSO type but not a netdev feature for that type.
net_gso_ok() would not allow hardware checksum/segmentation
offload of such packets without the feature.

Following patch fixes TSO and wrong checksum. This patch uses
same logic that Eric Dumazet used. Patch introduces new flag
SKBTX_SHARED_FRAG if at least one frag can be modified by
the user. but SKBTX_SHARED_FRAG flag is kept in skb shared
info tx_flags rather than gso_type.

tx_flags is better compared to gso_type since we can have skb with
shared frag without gso packet. It does not link SHARED_FRAG to
GSO, So there is no need to define netdev feature for this.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-13 13:30:10 -05:00
David S. Miller 547472b8e1 ipv4: Disallow non-namespace aware protocols to register.
All in-tree ipv4 protocol implementations are now namespace
aware.  Therefore all the run-time checks are superfluous.

Reject registry of any non-namespace aware ipv4 protocol.
Eventually we'll remove prot->netns_ok and this registry
time check as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-05 14:42:23 -05:00
Eric Dumazet cef401de7b net: fix possible wrong checksum generation
Pravin Shelar mentioned that GSO could potentially generate
wrong TX checksum if skb has fragments that are overwritten
by the user between the checksum computation and transmit.

He suggested to linearize skbs but this extra copy can be
avoided for normal tcp skbs cooked by tcp_sendmsg().

This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
in skb_shinfo(skb)->gso_type if at least one frag can be
modified by the user.

Typical sources of such possible overwrites are {vm}splice(),
sendfile(), and macvtap/tun/virtio_net drivers.

Tested:

$ netperf -H 7.7.8.84
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
7.7.8.84 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    3959.52

$ netperf -H 7.7.8.84 -t TCP_SENDFILE
TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    3216.80

Performance of the SENDFILE is impacted by the extra allocation and
copy, and because we use order-0 pages, while the TCP_STREAM uses
bigger pages.

Reported-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28 00:27:15 -05:00
YOSHIFUJI Hideaki / 吉藤英明 50c3a487d5 ipv4: Use IS_ERR_OR_NULL().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-22 14:28:28 -05:00
YOSHIFUJI Hideaki / 吉藤英明 95c7e0e4d4 ipv4: Use FIELD_SIZEOF() in inet_init().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09 23:38:23 -08:00
Eric W. Biederman 3594698a1f net: Make CAP_NET_BIND_SERVICE per user namespace
Allow privileged users in any user namespace to bind to
privileged sockets in network namespaces they control.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:33:37 -05:00
Eric W. Biederman 52e804c6df net: Allow userns root to control ipv4
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed
while resource control is left unchanged.

Allow creating raw sockets.
Allow the SIOCSARP ioctl to control the arp cache.
Allow the SIOCSIFFLAG ioctl to allow setting network device flags.
Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting gre tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipip tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipsec virtual tunnel interfaces.

Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
sockets.

Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
arbitrary ip options.

Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
Allow setting the IP_TRANSPARENT ipv4 socket option.
Allow setting the TCP_REPAIR socket option.
Allow setting the TCP_CONGESTION socket option.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:45 -05:00
Vlad Yasevich f191a1d17f net: Remove code duplication between offload structures
Move the offload callbacks into its own structure.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:39:51 -05:00
Vlad Yasevich 808a8f8845 ipv4: Pull GSO registration out of inet_init()
Since GSO/GRO support is now separated, make IPv4 GSO a
stand-alone init call and not part of inet_init().

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:39:23 -05:00
Vlad Yasevich bca49f843e ipv4: Switch to using the new offload infrastructure.
Switch IPv4 code base to using the new GRO/GSO calls and data.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:36:17 -05:00
Vlad Yasevich de27d001d1 net: Add net protocol offload registration infrustructure
Create a new data structure for IPv4 protocols that holds GRO/GSO
callbacks and a new array to track the protocols that register GRO/GSO.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:36:17 -05:00
Vlad Yasevich 22061d8014 net: Switch to using the new packet offload infrustructure
Convert to using the new GSO/GRO registration mechanism and new
packet offload structure.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 17:36:17 -05:00
Christoph Paasch bb68b64724 ipv4: Don't add TCP-code in inet_sock_destruct
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: H.K. Jerry Chu <hkchu@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 17:12:27 -04:00
Eric Dumazet 7ab4551f3b tcp: fix TFO regression
Fengguang Wu reported various panics and bisected to commit
8336886f78 (tcp: TCP Fast Open Server - support TFO listeners)

Fix this by making sure socket is a TCP socket before accessing TFO data
structures.

[  233.046014] kfree_debugcheck: out of range ptr ea6000000bb8h.
[  233.047399] ------------[ cut here ]------------
[  233.048393] kernel BUG at /c/kernel-tests/src/stable/mm/slab.c:3074!
[  233.048393] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[  233.048393] Modules linked in:
[  233.048393] CPU 0
[  233.048393] Pid: 3929, comm: trinity-watchdo Not tainted 3.6.0-rc3+
#4192 Bochs Bochs
[  233.048393] RIP: 0010:[<ffffffff81169653>]  [<ffffffff81169653>]
kfree_debugcheck+0x27/0x2d
[  233.048393] RSP: 0018:ffff88000facbca8  EFLAGS: 00010092
[  233.048393] RAX: 0000000000000031 RBX: 0000ea6000000bb8 RCX:
00000000a189a188
[  233.048393] RDX: 000000000000a189 RSI: ffffffff8108ad32 RDI:
ffffffff810d30f9
[  233.048393] RBP: ffff88000facbcb8 R08: 0000000000000002 R09:
ffffffff843846f0
[  233.048393] R10: ffffffff810ae37c R11: 0000000000000908 R12:
0000000000000202
[  233.048393] R13: ffffffff823dbd5a R14: ffff88000ec5bea8 R15:
ffffffff8363c780
[  233.048393] FS:  00007faa6899c700(0000) GS:ffff88001f200000(0000)
knlGS:0000000000000000
[  233.048393] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  233.048393] CR2: 00007faa6841019c CR3: 0000000012c82000 CR4:
00000000000006f0
[  233.048393] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  233.048393] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  233.048393] Process trinity-watchdo (pid: 3929, threadinfo
ffff88000faca000, task ffff88000faec600)
[  233.048393] Stack:
[  233.048393]  0000000000000000 0000ea6000000bb8 ffff88000facbce8
ffffffff8116ad81
[  233.048393]  ffff88000ff588a0 ffff88000ff58850 ffff88000ff588a0
0000000000000000
[  233.048393]  ffff88000facbd08 ffffffff823dbd5a ffffffff823dbcb0
ffff88000ff58850
[  233.048393] Call Trace:
[  233.048393]  [<ffffffff8116ad81>] kfree+0x5f/0xca
[  233.048393]  [<ffffffff823dbd5a>] inet_sock_destruct+0xaa/0x13c
[  233.048393]  [<ffffffff823dbcb0>] ? inet_sk_rebuild_header
+0x319/0x319
[  233.048393]  [<ffffffff8231c307>] __sk_free+0x21/0x14b
[  233.048393]  [<ffffffff8231c4bd>] sk_free+0x26/0x2a
[  233.048393]  [<ffffffff825372db>] sctp_close+0x215/0x224
[  233.048393]  [<ffffffff810d6835>] ? lock_release+0x16f/0x1b9
[  233.048393]  [<ffffffff823daf12>] inet_release+0x7e/0x85
[  233.048393]  [<ffffffff82317d15>] sock_release+0x1f/0x77
[  233.048393]  [<ffffffff82317d94>] sock_close+0x27/0x2b
[  233.048393]  [<ffffffff81173bbe>] __fput+0x101/0x20a
[  233.048393]  [<ffffffff81173cd5>] ____fput+0xe/0x10
[  233.048393]  [<ffffffff810a3794>] task_work_run+0x5d/0x75
[  233.048393]  [<ffffffff8108da70>] do_exit+0x290/0x7f5
[  233.048393]  [<ffffffff82707415>] ? retint_swapgs+0x13/0x1b
[  233.048393]  [<ffffffff8108e23f>] do_group_exit+0x7b/0xba
[  233.048393]  [<ffffffff8108e295>] sys_exit_group+0x17/0x17
[  233.048393]  [<ffffffff8270de10>] tracesys+0xdd/0xe2
[  233.048393] Code: 59 01 5d c3 55 48 89 e5 53 41 50 0f 1f 44 00 00 48
89 fb e8 d4 b0 f0 ff 84 c0 75 11 48 89 de 48 c7 c7 fc fa f7 82 e8 0d 0f
57 01 <0f> 0b 5f 5b 5d c3 55 48 89 e5 0f 1f 44 00 00 48 63 87 d8 00 00
[  233.048393] RIP  [<ffffffff81169653>] kfree_debugcheck+0x27/0x2d
[  233.048393]  RSP <ffff88000facbca8>

Reported-by: Fengguang Wu <wfg@linux.intel.com>
Tested-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: "H.K. Jerry Chu" <hkchu@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-06 14:21:10 -04:00
Jerry Chu 8336886f78 tcp: TCP Fast Open Server - support TFO listeners
This patch builds on top of the previous patch to add the support
for TFO listeners. This includes -

1. allocating, properly initializing, and managing the per listener
fastopen_queue structure when TFO is enabled

2. changes to the inet_csk_accept code to support TFO. E.g., the
request_sock can no longer be freed upon accept(), not until 3WHS
finishes

3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
if it's a TFO socket

4. properly closing a TFO listener, and a TFO socket before 3WHS
finishes

5. supporting TCP_FASTOPEN socket option

6. modifying tcp_check_req() to use to check a TFO socket as well
as request_sock

7. supporting TCP's TFO cookie option

8. adding a new SYN-ACK retransmit handler to use the timer directly
off the TFO socket rather than the listener socket. Note that TFO
server side will not retransmit anything other than SYN-ACK until
the 3WHS is completed.

The patch also contains an important function
"reqsk_fastopen_remove()" to manage the somewhat complex relation
between a listener, its request_sock, and the corresponding child
socket. See the comment above the function for the detail.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 20:02:19 -04:00
Eric Dumazet a9e050f4e7 net: tcp: GRO should be ECN friendly
While doing TCP ECN tests, I discovered GRO was reordering packets if it
receives one packet with CE set, while previous packets in same NAPI run
have ECT(0) for the same flow :

09:25:25.857620 IP (tos 0x2,ECT(0), ttl 64, id 27893, offset 0, flags
[DF], proto TCP (6), length 4396)
    172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq
233801:238145, ack 1, win 115, options [nop,nop,TS val 3397779 ecr
1990627], length 4344

09:25:25.857626 IP (tos 0x3,CE, ttl 64, id 27892, offset 0, flags [DF],
proto TCP (6), length 1500)
    172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq
232353:233801, ack 1, win 115, options [nop,nop,TS val 3397779 ecr
1990627], length 1448

09:25:25.857638 IP (tos 0x0, ttl 64, id 34581, offset 0, flags [DF],
proto TCP (6), length 64)
    172.30.42.13.44139 > 172.30.42.19.54550: Flags [.], cksum 0xac8f
(incorrect -> 0xca69), ack 232353, win 1271, options [nop,nop,TS val
1990627 ecr 3397779,nop,nop,sack 1 {233801:238145}], length 0

We have two problems here :

1) GRO reorders packets

  If NIC gave packet1, then packet2, which happen to be from "different
flows"  GRO feeds stack with packet2, then packet1. I have yet to
understand how to solve this problem.

2) GRO is not ECN friendly

Delivering packets out of order makes TCP stack not as fast as it could
be.

In this patch I suggest we make the tos test not part of the 'same_flow'
determination, but part of the 'should flush' logic

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-06 13:40:47 -07:00
Yuchung Cheng cf60af03ca net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)
sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2)
and write(2). The application should replace connect() with it to
send data in the opening SYN packet.

For blocking socket, sendmsg() blocks until all the data are buffered
locally and the handshake is completed like connect() call. It
returns similar errno like connect() if the TCP handshake fails.

For non-blocking socket, it returns the number of bytes queued (and
transmitted in the SYN-data packet) if cookie is available. If cookie
is not available, it transmits a data-less SYN packet with Fast Open
cookie request option and returns -EINPROGRESS like connect().

Using MSG_FASTOPEN on connecting or connected socket will result in
simlar errno like repeating connect() calls. Therefore the application
should only use this flag on new sockets.

The buffer size of sendmsg() is independent of the MSS of the connection.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
Yuchung Cheng 783237e8da net-tcp: Fast Open client - sending SYN-data
This patch implements sending SYN-data in tcp_connect(). The data is
from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch).

The length of the cookie in tcp_fastopen_req, init'd to 0, controls the
type of the SYN. If the cookie is not cached (len==0), the host sends
data-less SYN with Fast Open cookie request option to solicit a cookie
from the remote. If cookie is not available (len > 0), the host sends
a SYN-data with Fast Open cookie option. If cookie length is negative,
  the SYN will not include any Fast Open option (for fall back operations).

To deal with middleboxes that may drop SYN with data or experimental TCP
option, the SYN-data is only sent once. SYN retransmits do not include
data or Fast Open options. The connection will fall back to regular TCP
handshake.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 11:02:03 -07:00
David S. Miller 41063e9dd1 ipv4: Early TCP socket demux.
Input packet processing for local sockets involves two major demuxes.
One for the route and one for the socket.

But we can optimize this down to one demux for certain kinds of local
sockets.

Currently we only do this for established TCP sockets, but it could
at least in theory be expanded to other kinds of connections.

If a TCP socket is established then it's identity is fully specified.

This means that whatever input route was used during the three-way
handshake must work equally well for the rest of the connection since
the keys will not change.

Once we move to established state, we cache the receive packet's input
route to use later.

Like the existing cached route in sk->sk_dst_cache used for output
packets, we have to check for route invalidations using dst->obsolete
and dst->ops->check().

Early demux occurs outside of a socket locked section, so when a route
invalidation occurs we defer the fixup of sk->sk_rx_dst until we are
actually inside of established state packet processing and thus have
the socket locked.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-19 21:22:05 -07:00
David S. Miller f9242b6b28 inet: Sanitize inet{,6} protocol demux.
Don't pretend that inet_protos[] and inet6_protos[] are hashes, thay
are just a straight arrays.  Remove all unnecessary hash masking.

Document MAX_INET_PROTOS.

Use RAW_HTABLE_SIZE when appropriate.

Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-19 18:56:21 -07:00
Joe Perches e3192690a3 net: Remove casts to same type
Adding casts of objects to the same type is unnecessary
and confusing for a human reader.

For example, this cast:

	int y;
	int *p = (int *)&y;

I used the coccinelle script below to find and remove these
unnecessary casts.  I manually removed the conversions this
script produces of casts with __force and __user.

@@
type T;
T *p;
@@

-	(T *)p
+	p

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-04 11:45:11 -04:00
Pavel Emelyanov 4a17fd5229 sock: Introduce named constants for sk_reuse
Name them in a "backward compatible" manner, i.e. reuse or not
are still 1 and 0 respectively. The reuse value of 2 means that
the socket with it will forcibly reuse everyone else's port.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Daniel Baluta 5e73ea1a31 ipv4: fix checkpatch errors
Fix checkpatch errors of the following type:
	* ERROR: "foo * bar" should be "foo *bar"
	* ERROR: "(foo*)" should be "(foo *)"

Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:37:19 -04:00
David Howells 9ffc93f203 Remove all #inclusions of asm/system.h
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it.  Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-28 18:30:03 +01:00
Joe Perches afd465030a net: ipv4: Standardize prefixes for message logging
Add #define pr_fmt(fmt) as appropriate.

Add "IPv4: ", "TCP: ", and "IPsec: " to appropriate files.
Standardize on "UDPLite: " for appropriate uses.
Some prefixes were previously "UDPLITE: " and "UDP-Lite: ".

Add KBUILD_MODNAME ": " to icmp and gre.
Remove embedded prefixes as appropriate.

Add missing "\n" to pr_info in gre.c.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-12 17:05:21 -07:00
Joe Perches 058bd4d2a4 net: Convert printks to pr_<level>
Use a more current kernel messaging style.

Convert a printk block to print_hex_dump.
Coalesce formats, align arguments.
Use %s, __func__ instead of embedding function names.

Some messages that were prefixed with <foo>_close are
now prefixed with <foo>_fini.  Some ah4 and esp messages
are now not prefixed with "ip ".

The intent of this patch is to later add something like
  #define pr_fmt(fmt) "IPv4: " fmt.
to standardize the output messages.

Text size is trivially reduced. (x86-32 allyesconfig)

$ size net/ipv4/built-in.o*
   text	   data	    bss	    dec	    hex	filename
 887888	  31558	 249696	1169142	 11d6f6	net/ipv4/built-in.o.new
 887934	  31558	 249800	1169292	 11d78c	net/ipv4/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-11 23:42:51 -07:00
Jiri Benc 4c507d2897 net: implement IP_RECVTOS for IP_PKTOPTIONS
Currently, it is not easily possible to get TOS/DSCP value of packets from
an incoming TCP stream. The mechanism is there, IP_PKTOPTIONS getsockopt
with IP_RECVTOS set, the same way as incoming TTL can be queried. This is
not actually implemented for TOS, though.

This patch adds this functionality, both for IPv4 (IP_PKTOPTIONS) and IPv6
(IPV6_2292PKTOPTIONS). For IPv4, like in the IP_RECVTTL case, the value of
the TOS field is stored from the other party's ACK.

This is needed for proxies which require DSCP transparency. One such example
is at http://zph.bratcheda.org/.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-13 00:46:41 -05:00