Highlights include:
- Fix NFSv4 infinite loops on open(O_TRUNC)
- Fix an Oops and an infinite loop in the NFSv4 flock code
- Don't register the PipeFS filesystem until it has been set up
- Fix an Oops in nfs_try_to_update_request
- Don't reuse NFSv4 open owners: fixes a bad sequence id storm.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJPlbzwAAoJEGcL54qWCgDy24oQALZE67vBft7M2j0BiWhVbV15
YLbCf6x/h+0BJAkKWdrBaw7N6GX6OYBOX2SsmrBkzYf5mgHeju5+dH0CmRAR5xib
5d+Lwxif1l+rABfdzzJf8gY1L1THyJCnfmarKKyYEJ5OC1pJyulKLanXSPzPfzlm
APV5Jf6NM2WRgkCqzP6zf61NG0HbDSR7C//HQ3k21Sdt9XDLf5qLHBSuPIQ+BlZY
EvpbERTtJgp7rPJsLQv1F2dgasDUQNg8G+tmZatGcqEiNxVyQ2YqwshaldOVqftv
3Kocs6OW5C1ESj1dFJZmeMZ/+GSHjRJx8fpqHJjmCsh4kPGgFviQDdYwu4FDhhPI
FZslC5nVi8JMTPNJAFmfvbwPQId/TSRPCWYO5PtW1LSfRT/+25b6M5duro1eGIbJ
/FDoOCYQmepNOfobU9Q3roDWyNSLYFaUaMJUrccRcAuS3S2NEXisTAT49kmqa1Vm
ZArOJBnXTgmGi30nKhqqLJ43P61ekhX0AQ6PycZAXkjeRlkQs7AAQbMJZMB2X0r9
KtRCDPiH2NuR0FwxNMkMP4BXdsaY7Sz/xiSZXLOUf1SeWBiBtYoDdrQ3z67SGOeG
qxI3qXXl0KC2+l2jnezcWhBf4CDpxftGIBi+rKWJt8stoYzbemB/M1lkoTCwrVzq
8Gwyy0QTVzE9VkY77oVW
=hQAK
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Fix NFSv4 infinite loops on open(O_TRUNC)
- Fix an Oops and an infinite loop in the NFSv4 flock code
- Don't register the PipeFS filesystem until it has been set up
- Fix an Oops in nfs_try_to_update_request
- Don't reuse NFSv4 open owners: fixes a bad sequence id storm.
* tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Keep dropped state owners on the LRU list for a while
NFSv4: Ensure that we don't drop a state owner more than once
NFSv4: Ensure we do not reuse open owner names
nfs: Enclose hostname in brackets when needed in nfs_do_root_mount
NFS: put open context on error in nfs_flush_multi
NFS: put open context on error in nfs_pagein_multi
NFSv4: Fix open(O_TRUNC) and ftruncate() error handling
NFSv4: Ensure that we check lock exclusive/shared type against open modes
NFSv4: Ensure that the LOCK code sets exception->inode
NFS: check for req==NULL in nfs_try_to_update_request cleanup
SUNRPC: register PipeFS file system after pernet sybsystem
Commit f5fff5d forgot to fix TCP_MAXSEG behavior IPv6 sockets, so IPv6
TCP server sockets that used TCP_MAXSEG would find that the advmss of
child sockets would be incorrect. This commit mirrors the advmss logic
from tcp_v4_syn_recv_sock in tcp_v6_syn_recv_sock. Eventually this
logic should probably be shared between IPv4 and IPv6, but this at
least fixes this issue.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems there is a logic error in trace_drop_common(), since we store
only 64 drops, even if they are from same location.
This fix is a one liner, but we probably need more work to avoid useless
atomic dec/inc
Now I can watch 1 Mpps drops through dropwatch...
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While reviewing the sysctl code in ax25 I spotted races in ax25_exit
where it is possible to receive notifications and packets after already
freeing up some of the data structures needed to process those
notifications and updates.
Call unregister_netdevice_notifier early so that the rest of the cleanup
code does not need to deal with network devices. This takes advantage
of my recent enhancement to unregister_netdevice_notifier to send
unregister notifications of all network devices that are current
registered.
Move the unregistration for packet types, socket types and protocol
types before we cleanup any of the ax25 data structures to remove the
possibilities of other races.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Beregalov reported skb_over_panic errors and provided stack
trace.
I occurs commit a21d45726a (tcp: avoid order-1 allocations on wifi and
tx path) added a regression, when a retransmit is done after a partial
ACK.
tcp_retransmit_skb() tries to aggregate several frames if the first one
has enough available room to hold the following ones payload. This is
controlled by /proc/sys/net/ipv4/tcp_retrans_collapse tunable (default :
enabled)
Problem is we must make sure _pskb_trim_head() doesnt fool
skb_availroom() when pulling some bytes from skb (this pull is done when
receiver ACK part of the frame).
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Marc MERLIN <marc@merlins.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When updating the stats for a given uid it would incorrectly assume
IPV4 and pick up the wrong protocol when IPV6.
Change-Id: Iea4a635012b4123bf7aa93809011b7b2040bb3d5
Signed-off-by: JP Abgrall <jpa@google.com>
There was a case that might have seemed like new_tag_stat was not
initialized and actually used.
Added comment explaining why it was impossible, and a BUG()
in case the logic gets changed.
Change-Id: I1eddd1b6f754c08a3bf89f7e9427e5dce1dfb081
Signed-off-by: JP Abgrall <jpa@google.com>
From John:
Another batch of fixes intended for 3.4...
First up, we have a minor signedness fix for libertas from Amitkumar
Karwar. Next, Arend gives us a brcm80211 fix for correctly enabling
Tx FIFOs on channels 12 and 13. Bing Zhao gives us some register
address corrections for mwifiex. Felix give us a trio of fixes --
one for ath9k to wake the hardware properly from full sleep, one for
mac80211 to properly handle packets in cooked monitor mode, and one
for ensuring that the proper HT mode selection is honored.
Hauke gives us a bcma fix for handling the lack of an sprom. Jonathon
Bither gives us an ath5k fix for a missing THIS_MODULE build issue,
and another ath5k fix for an io mapping leak. Lukasz Kucharczyk
fixes a bitwise check in cfg80211, and Sujith gives us an ath9k fix
for assigning sequence numbers for fragmented frames. Finally, we
have a MAINTAINERS change from Wey-Yi Guy -- congrats to Johannes
Berg for taking the lead on iwlwifi. :-)
Signed-off-by: David S. Miller <davem@davemloft.net>
PipeFS superblock creation routine relays on SUNRPC pernet data presense, which
is created on register_pernet_subsys() call in SUNRPC module init function.
Registering of PipeFS filesystem prior to registering of per-net subsystem
leads to races (mount of PipeFS can dereference uninitialized data).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ops_init should free the net_generic data on
init failure and __register_pernet_operations should not
call ops_free when NET_NS is not enabled.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_grow_window() has to grow rcv_ssthresh up to window_clamp, allowing
sender to increase its window.
tcp_grow_window() still assumes a tcp frame is under MSS, but its no
longer true with LRO/GRO.
This patch fixes one of the performance issue we noticed with GRO on.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The broken check leads to rate control attempting to use HT40 while
the driver is configured for HT20. This leads to interesting hardware
issues.
HT40 can only be used if the channel type is either HT40- or HT40+
and if the channel type of the cell matches the local type.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@vger.kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cooked monitor rx was recently changed to use ieee80211_add_rx_radiotap_header
instead of generating only limited radiotap information.
ieee80211_add_rx_radiotap_header assumes that FCS info is still present if
the hardware supports receiving it, however when cooked monitor rx packets
are processed, FCS info has already been stripped.
Fix this by adding an extra flag indicating FCS presence.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
A parameter set exists for WRED mode, called wred_set, to hold the same
values for qavg and qidlestart across all VQs. The WRED mode values had
been previously held in the VQ for the default DP. After these values
were moved to wred_set, the VQ for the default DP was no longer created
automatically (so that it could be omitted on purpose, to have packets
in the default DP enqueued directly to the device without using RED).
However, gred_dump() was overlooked during that change; in WRED mode it
still reads qavg/qidlestart from the VQ for the default DP, which might
not even exist. As a result, this command sequence will cause an oops:
tc qdisc add dev $DEV handle $HANDLE parent $PARENT gred setup \
DPs 3 default 2 grio
tc qdisc change dev $DEV handle $HANDLE gred DP 0 prio 8 $RED_OPTIONS
tc qdisc change dev $DEV handle $HANDLE gred DP 1 prio 8 $RED_OPTIONS
This fixes gred_dump() in WRED mode to use the values held in wred_set.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the ipv6 dst cache which copy from the dst generated by ICMPV6 RA packet.
this dst cache will not check expire because it has no RTF_EXPIRES flag.
So this dst cache will always be used until the dst gc run.
Change the struct dst_entry,add a union contains new pointer from and expires.
When rt6_info.rt6i_flags has no RTF_EXPIRES flag,the dst.expires has no use.
we can use this field to point to where the dst cache copy from.
The dst.from is only used in IPV6.
rt6_check_expired check if rt6_info.dst.from is expired.
ip6_rt_copy only set dst.from when the ort has flag RTF_ADDRCONF
and RTF_DEFAULT.then hold the ort.
ip6_dst_destroy release the ort.
Add some functions to operate the RTF_EXPIRES flag and expires(from) together.
and change the code to use these new adding functions.
Changes from v5:
modify ip6_route_add and ndisc_router_discovery to use new adding functions.
Only set dst.from when the ort has flag RTF_ADDRCONF
and RTF_DEFAULT.then hold the ort.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added kfree_skb() calls in the chnk_net.c file on
the error paths.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Applications using L2TP/IP sockets want to be able to bind() an L2TP/IP
socket to set the local tunnel id while leaving the auto-assigned source
address alone. So if no source address is supplied, don't overwrite
the address already stored in the socket.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The l2tp_ip socket close handler does not update the module refcount
correctly which prevents module unload after the first bind() call on
an L2TPv3 IP encapulation socket.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the point of this error-handling code, alloc_skb has succeded, so free
the resulting skb by jumping to the err label.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently an oops was reported in phonet if there was a failure during
network namespace creation.
[ 163.733755] ------------[ cut here ]------------
[ 163.734501] kernel BUG at include/net/netns/generic.h:45!
[ 163.734501] invalid opcode: 0000 [#1] PREEMPT SMP
[ 163.734501] CPU 2
[ 163.734501] Pid: 19145, comm: trinity Tainted: G W 3.4.0-rc1-next-20120405-sasha-dirty #57
[ 163.734501] RIP: 0010:[<ffffffff824d6062>] [<ffffffff824d6062>] phonet_pernet+0x182/0x1a0
[ 163.734501] RSP: 0018:ffff8800674d5ca8 EFLAGS: 00010246
[ 163.734501] RAX: 000000003fffffff RBX: 0000000000000000 RCX: ffff8800678c88d8
[ 163.734501] RDX: 00000000003f4000 RSI: ffff8800678c8910 RDI: 0000000000000282
[ 163.734501] RBP: ffff8800674d5cc8 R08: 0000000000000000 R09: 0000000000000000
[ 163.734501] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880068bec920
[ 163.734501] R13: ffffffff836b90c0 R14: 0000000000000000 R15: 0000000000000000
[ 163.734501] FS: 00007f055e8de700(0000) GS:ffff88007d000000(0000) knlGS:0000000000000000
[ 163.734501] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 163.734501] CR2: 00007f055e6bb518 CR3: 0000000070c16000 CR4: 00000000000406e0
[ 163.734501] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 163.734501] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 163.734501] Process trinity (pid: 19145, threadinfo ffff8800674d4000, task ffff8800678c8000)
[ 163.734501] Stack:
[ 163.734501] ffffffff824d5f00 ffffffff810e2ec1 ffff880067ae0000 00000000ffffffd4
[ 163.734501] ffff8800674d5cf8 ffffffff824d667a ffff880067ae0000 00000000ffffffd4
[ 163.734501] ffffffff836b90c0 0000000000000000 ffff8800674d5d18 ffffffff824d707d
[ 163.734501] Call Trace:
[ 163.734501] [<ffffffff824d5f00>] ? phonet_pernet+0x20/0x1a0
[ 163.734501] [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50
[ 163.734501] [<ffffffff824d667a>] phonet_device_destroy+0x1a/0x100
[ 163.734501] [<ffffffff824d707d>] phonet_device_notify+0x3d/0x50
[ 163.734501] [<ffffffff810dd96e>] notifier_call_chain+0xee/0x130
[ 163.734501] [<ffffffff810dd9d1>] raw_notifier_call_chain+0x11/0x20
[ 163.734501] [<ffffffff821cce12>] call_netdevice_notifiers+0x52/0x60
[ 163.734501] [<ffffffff821cd235>] rollback_registered_many+0x185/0x270
[ 163.734501] [<ffffffff821cd334>] unregister_netdevice_many+0x14/0x60
[ 163.734501] [<ffffffff823123e3>] ipip_exit_net+0x1b3/0x1d0
[ 163.734501] [<ffffffff82312230>] ? ipip_rcv+0x420/0x420
[ 163.734501] [<ffffffff821c8515>] ops_exit_list+0x35/0x70
[ 163.734501] [<ffffffff821c911b>] setup_net+0xab/0xe0
[ 163.734501] [<ffffffff821c9416>] copy_net_ns+0x76/0x100
[ 163.734501] [<ffffffff810dc92b>] create_new_namespaces+0xfb/0x190
[ 163.734501] [<ffffffff810dca21>] unshare_nsproxy_namespaces+0x61/0x80
[ 163.734501] [<ffffffff810afd1f>] sys_unshare+0xff/0x290
[ 163.734501] [<ffffffff8187622e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 163.734501] [<ffffffff82665539>] system_call_fastpath+0x16/0x1b
[ 163.734501] Code: e0 c3 fe 66 0f 1f 44 00 00 48 c7 c2 40 60 4d 82 be 01 00 00 00 48 c7 c7 80 d1 23 83 e8 48 2a c4 fe e8 73 06 c8 fe 48 85 db 75 0e <0f> 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 48 83 c4 10 48 89 d8
[ 163.734501] RIP [<ffffffff824d6062>] phonet_pernet+0x182/0x1a0
[ 163.734501] RSP <ffff8800674d5ca8>
[ 163.861289] ---[ end trace fb5615826c548066 ]---
After investigation it turns out there were two issues.
1) Phonet was not implementing network devices but was using register_pernet_device
instead of register_pernet_subsys.
This was allowing there to be cases when phonenet was not initialized and
the phonet net_generic was not set for a network namespace when network
device events were being reported on the netdevice_notifier for a network
namespace leading to the oops above.
2) phonet_exit_net was implementing a confusing and special case of handling all
network devices from going away that it was hard to see was correct, and would
only occur when the phonet module was removed.
Now that unregister_netdevice_notifier has been modified to synthesize unregistration
events for the network devices that are extant when called this confusing special
case in phonet_exit_net is no longer needed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We already synthesize events in register_netdevice_notifier and synthesizing
events in unregister_netdevice_notifier allows to us remove the need for
special case cleanup code.
This change should be safe as it adds no new cases for existing callers
of unregiser_netdevice_notifier to handle.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix bluetooth userland regression reported by Keith Packard, from
Gustavo Padovan.
2) Revert ath9k PS idle change, from Sujith Manoharan.
3) Correct default TCP memory limits (again), from Eric Dumazet.
4) Fix tcp_rcv_rtt_update() accidental use of unscaled RTT, from Neal
Cardwell.
5) We made a facility for layers like wireless to say how much tailroom
they need in the SKB for link layer stuff such as wireless
encryption etc., but TCP works hard to fill every SKB out to the end
defeating this specification.
This leads to every TCP packet getting reallocated by the wireless
code in order to have the right amount of tailroom available.
Fix TCP to only fill SKBs out to the real amount of data area it
asked for during the allocation, this way it won't eat into the
slack added for the device's tailroom needs.
Reported by Marc Merlin and fixed by Eric Dumazet.
6) Leaks, endian bugs, and new device IDs in bluetooth from Santosh
Nayak, João Paulo Rechi Vita, Cho, Yu-Chen, Andrei Emeltchenko,
AceLan Kao, and Andrei Emeltchenko.
7) OOPS on tty_close fix in bluetooth's hci_ldisc from Johan Hovold.
8) netfilter erroneously scales TCP window twice, fix from Changli Gao.
9) Memleak fix in wext-core from Julia Lawall.
10) Consistently handle invalid TCP packets in ipv4 vs. ipv6 conntrack,
from Jozsef Kadlecsik.
11) Validate IP header length properly in netfilter conntrack's
ipv4_get_l4proto().
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (39 commits)
NFC: Fix the LLCP Tx fragmentation loop
rtlwifi: Add missing DMA buffer unmapping for PCI drivers
rtlwifi: Preallocate USB read buffers and eliminate kalloc in read routine
tcp: avoid order-1 allocations on wifi and tx path
net: allow pskb_expand_head() to get maximum tailroom
bridge: Do not send queries on multicast group leaves
MAINTAINERS: Mark NATSEMI driver as orphan'd.
tcp: fix tcp_rcv_rtt_update() use of an unscaled RTT sample
tcp: restore correct limit
Revert "ath9k: fix going to full-sleep on PS idle"
rt2x00: Fix rfkill_polling register function.
bcma: fix build error on MIPS; implicit pcibios_enable_device
netfilter: nf_conntrack: fix incorrect logic in nf_conntrack_init_net
netfilter: nf_ct_ipv4: packets with wrong ihl are invalid
netfilter: nf_ct_ipv4: handle invalid IPv4 and IPv6 packets consistently
net/wireless/wext-core.c: add missing kfree
rtlwifi: Fix oops on rate-control failure
mac80211: Convert WARN_ON to WARN_ON_ONCE
rtlwifi: rtl8192de: Fix firmware initialization
nl80211: ensure interface is up in various APIs
...
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Marc Merlin reported many order-1 allocations failures in TX path on its
wireless setup, that dont make any sense with MTU=1500 network, and non
SG capable hardware.
After investigation, it turns out TCP uses sk_stream_alloc_skb() and
used as a convention skb_tailroom(skb) to know how many bytes of data
payload could be put in this skb (for non SG capable devices)
Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER +
sizeof(struct skb_shared_info) being above 2048)
Later, mac80211 layer need to add some bytes at the tail of skb
(IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is
available has to call pskb_expand_head() and request order-1
allocations.
This patch changes sk_stream_alloc_skb() so that only
sk->sk_prot->max_header bytes of headroom are reserved, and use a new
skb field, avail_size to hold the data payload limit.
This way, order-0 allocations done by TCP stack can leave more than 2 KB
of tailroom and no more allocation is performed in mac80211 layer (or
any layer needing some tailroom)
avail_size is unioned with mark/dropcount, since mark will be set later
in IP stack for output packets. Therefore, skb size is unchanged.
Reported-by: Marc MERLIN <marc@merlins.org>
Tested-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Merlin reported many order-1 allocations failures in TX path on its
wireless setup, that dont make any sense with MTU=1500 network, and non
SG capable hardware.
Turns out part of the problem comes from pskb_expand_head() not using
ksize() to get exact head size given by kmalloc(). Doing the same thing
than __alloc_skb() allows more tailroom in skb and can prevent future
reallocations.
As a bonus, struct skb_shared_info becomes cache line aligned.
Reported-by: Marc MERLIN <marc@merlins.org>
Tested-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it stands the bridge IGMP snooping system will respond to
group leave messages with queries for remaining membership.
This is both unnecessary and undesirable. First of all any
multicast routers present should be doing this rather than us.
What's more the queries that we send may end up upsetting other
multicast snooping swithces in the system that are buggy.
In fact, we can simply remove the code that send these queries
because the existing membership expiry mechanism doesn't rely
on them anyway.
So this patch simply removes all code associated with group
queries in response to group leave messages.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
1/ regression fix for Xen as it now trips over a broken assumption
about the dma address size on 32-bit builds
2/ new quirk for netdma to ignore dma channels that cannot meet
netdma alignment requirements
3/ fixes for two long standing issues in ioatdma (ring size overflow)
and iop-adma (potential stack corruption)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJPhIfhAAoJEB7SkWpmfYgCguIQAL4qF+RC9/JggSHIjfOrYiPd
yboV80GqqQHHBwy8hfZVUrIEPMebvD/xUIk6iUQNXR+6EA8Ln0jukvQMpWNnI+Cc
TXgA5Ok70an4PD1MqnCsWyCJjsyPyhprbRHurxBcesf+y96POJxhING0rcKvft50
mvYnbtrkYe9M9x3b8TBGc0JaTVeL29Ck3FtkTz4uUktbkhRNfCcfEd28NRQpf8MB
vkjbjRGBQmGsnKxYCaEhlF1GPJyTlYjg4BBWtseJgb2R9s7tvJrkotFea/NmSfjq
XCuVKjpiFp3YyJuxJERWdwqRWvyAZFfcYyZX440nG0b7GBgSn+T7A9XhUs8vMboi
tLwoDfBbJDlKMaFpHex7Z6RtZZmVl3gWDNZTqpG44n4pabd4RPip04f0k7Wfs+cp
tzU9hGAOvgsZ8w4/JgxH8YJOZbIGzbDGOA1IhWcbxIbmFTblMiFnV3TC7qfhoRbR
8qtScIE7bUck2MYVlMMn9utd9tvKFa6HNgo41+f78/4+U7zQ/VrsbA/DWQct40R5
5k+EEvyYFUzIXn79E0GVN5h4NHH5gfAs3MZ7jIgwgHedBp4Ki68XYKNu+pIV3YwG
CFTPn1mVOXnCdt+fsjG5tL9Jecx1Mij6w3nWU93ZU6cHmC77YmU+DLxPIGuyR1a2
EmpObwfq5peXzkgQpEsB
=F3IR
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine
Pull dmaengine fixes from Dan Williams:
1/ regression fix for Xen as it now trips over a broken assumption
about the dma address size on 32-bit builds
2/ new quirk for netdma to ignore dma channels that cannot meet
netdma alignment requirements
3/ fixes for two long standing issues in ioatdma (ring size overflow)
and iop-adma (potential stack corruption)
* tag 'dmaengine-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
netdma: adding alignment check for NETDMA ops
ioatdma: DMA copy alignment needed to address IOAT DMA silicon errata
ioat: ring size variables need to be 32bit to avoid overflow
iop-adma: Corrected array overflow in RAID6 Xscale(R) test.
ioat: fix size of 'completion' for Xen
Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and
unscaled RTT samples.
The intent in the code was to only use the 'm' measurement if it was a
new minimum. However, since 'm' had not yet been shifted left 3 bits
but 'new_sample' had, this comparison would nearly always succeed,
leading us to erroneously set our receive-side RTT estimate to the 'm'
sample when that sample could be nearly 8x too high to use.
The overall effect is to often cause the receive-side RTT estimate to
be significantly too large (up to 40% too large for brief periods in
my tests).
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit c43b874d5d (tcp: properly initialize tcp memory limits) tried
to fix a regression added in commits 4acb4190 & 3dc43e3,
but still get it wrong.
Result is machines with low amount of memory have too small tcp_rmem[2]
value and slow tcp receives : Per socket limit being 1/1024 of memory
instead of 1/128 in old kernels, so rcv window is capped to small
values.
Fix this to match comment and previous behavior.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
in function nf_conntrack_init_net,when nf_conntrack_timeout_init falied,
we should call nf_conntrack_ecache_fini to do rollback.
but the current code calls nf_conntrack_timeout_fini.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It was reported that the Linux kernel sometimes logs:
klogd: [2629147.402413] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 447!
klogd: [1072212.887368] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 392
ipv4_get_l4proto() in nf_conntrack_l3proto_ipv4.c and tcp_error() in
nf_conntrack_proto_tcp.c should catch malformed packets, so the errors
at the indicated lines - TCP options parsing - should not happen.
However, tcp_error() relies on the "dataoff" offset to the TCP header,
calculated by ipv4_get_l4proto(). But ipv4_get_l4proto() does not check
bogus ihl values in IPv4 packets, which then can slip through tcp_error()
and get caught at the TCP options parsing routines.
The patch fixes ipv4_get_l4proto() by invalidating packets with bogus
ihl value.
The patch closes netfilter bugzilla id 771.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
IPv6 conntrack marked invalid packets as INVALID and let the user
drop those by an explicit rule, while IPv4 conntrack dropped such
packets itself.
IPv4 conntrack is changed so that it marks INVALID packets and let
the user to drop them.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since rx_bytes accounting does not include Ethernet Headers in
br_input.c, excluding ETH_HLEN on the transmit path for consistent
measurement of packet length on both the Tx and Rx chains.
The clean way would be for Rx to include the eth header, but the
skb len has already been adjusted by the time the br code sees the skb.
This is only a temporary workaround until we can completely ignore or
cleanly fix the skb->len handling.
Change-Id: I910de95a4686b2119da7f1f326e2154ef31f9972
Signed-off-by: Ashish Sharma <ashishsharma@google.com>
When calling:
ipv6_find_hdr(skb, &thoff, -1, NULL)
on a fragmented packet, thoff would be left with a random
value causing callers to read random memory offsets with:
skb_header_pointer(skb, thoff, ...)
Now we force ipv6_find_hdr() to return a failure in this case.
Calling:
ipv6_find_hdr(skb, &thoff, -1, &fragoff)
will set fragoff as expected, and not return a failure.
Change-Id: Ib474e8a4267dd2b300feca325811330329684a88
Signed-off-by: JP Abgrall <jpa@google.com>
The xt_quota2 came from
http://sourceforge.net/projects/xtables-addons/develop
It needed tweaking for it to compile within the kernel tree.
Fixed kmalloc() and create_proc_entry() invocations within
a non-interruptible context.
Removed useless copying of current quota back to the iptable's
struct matchinfo:
- those are per CPU: they will change randomly based on which
cpu gets to update the value.
- they prevent matching a rule: e.g.
-A chain -m quota2 --name q1 --quota 123
can't be followed by
-D chain -m quota2 --name q1 --quota 123
as the 123 will be compared to the struct matchinfo's quota member.
Use the NETLINK NETLINK_NFLOG family to log a single message
when the quota limit is reached.
It uses the same packet type as ipt_ULOG, but
- never copies skb data,
- uses 112 as the event number (ULOG's +1)
It doesn't log if the module param "event_num" is 0.
Change-Id: I021d3b743db3b22158cc49acb5c94d905b501492
Signed-off-by: JP Abgrall <jpa@google.com>
The original xt_quota in the kernel is plain broken:
- counts quota at a per CPU level
(was written back when ubiquitous SMP was just a dream)
- provides no way to count across IPV4/IPV6.
This patch is the original unaltered code from:
http://sourceforge.net/projects/xtables-addons
at commit e84391ce665cef046967f796dd91026851d6bbf3
Change-Id: I19d49858840effee9ecf6cff03c23b45a97efdeb
Signed-off-by: JP Abgrall <jpa@google.com>
Allow the REJECT --reject-with icmp*blabla to also set the matching error
locally on the socket affected by the reject.
This allows the process to see an error almost as if it received it
via ICMP.
It avoids the local process who's ingress packet is rejected to have to
wait for a pseudo-eternity until some timeout kicks in.
Ideally, this should be enabled with a new iptables flag similar to
--reject-with-sock-err
For now it is enabled with CONFIG_IP*_NF_TARGET_REJECT_SKERR option.
Change-Id: I649a4fd5940029ec0b3233e5abb205da6984891e
Signed-off-by: JP Abgrall <jpa@google.com>
This module allows tracking stats at the socket level for given UIDs.
It replaces xt_owner.
If the --uid-owner is not specified, it will just count stats based on
who the skb belongs to. This will even happen on incoming skbs as it
looks into the skb via xt_socket magic to see who owns it.
If an skb is lost, it will be assigned to uid=0.
To control what sockets of what UIDs are tagged by what, one uses:
echo t $sock_fd $accounting_tag $the_billed_uid \
> /proc/net/xt_qtaguid/ctrl
So whenever an skb belongs to a sock_fd, it will be accounted against
$the_billed_uid
and matching stats will show up under the uid with the given
$accounting_tag.
Because the number of allocations for the stats structs is not that big:
~500 apps * 32 per app
we'll just do it atomic. This avoids walking lists many times, and
the fancy worker thread handling. Slabs will grow when needed later.
It use netdevice and inetaddr notifications instead of hooks in the core dev
code to track when a device comes and goes. This removes the need for
exposed iface_stat.h.
Put procfs dirs in /proc/net/xt_qtaguid/
ctrl
stats
iface_stat/<iface>/...
The uid stats are obtainable in ./stats.
Change-Id: I01af4fd91c8de651668d3decb76d9bdc1e343919
Signed-off-by: JP Abgrall <jpa@google.com>
The socket matching function has some nifty logic to get the struct sock
from the skb or from the connection tracker.
We export this so other xt_* can use it, similarly to ho how
xt_socket uses nf_tproxy_get_sock.
Change-Id: I11c58f59087e7f7ae09e4abd4b937cd3370fa2fd
Signed-off-by: JP Abgrall <jpa@google.com>