Commit Graph

446661 Commits

Author SHA1 Message Date
Davidlohr Bueso b6d5307265 mm,vmacache: add debug data
Introduce a CONFIG_DEBUG_VM_VMACACHE option to enable counting the cache
hit rate -- exported in /proc/vmstat.

Any update to the caching scheme needs this kind of data, so having it in
place saves the work of re-implementing the counting each time.
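A minimal sketch of the config-gated counting this adds; the wrapper and
event names below are illustrative rather than the exact symbols in the
patch:

    /* Only pay for the bookkeeping when CONFIG_DEBUG_VM_VMACACHE is set. */
    #ifdef CONFIG_DEBUG_VM_VMACACHE
    #define count_vm_vmacache_event(x) count_vm_event(x)  /* shows up in /proc/vmstat */
    #else
    #define count_vm_vmacache_event(x) do {} while (0)    /* compiled out */
    #endif

The find path then counts one event per lookup and one per hit, so the hit
rate can be read straight out of /proc/vmstat.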

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-27 22:08:07 +02:00
Linus Torvalds 3dbd0d29e6 mm: don't pointlessly use BUG_ON() for sanity check
BUG_ON() is a big hammer, and should be used _only_ if there is some
major corruption that you cannot possibly recover from, making it
imperative that the current process (and possibly the whole machine) be
terminated with extreme prejudice.

The trivial sanity check in the vmacache code is *not* such a fatal
error.  Recovering from it is absolutely trivial, and using BUG_ON()
just makes it harder to debug for no actual advantage.

To make matters worse, the placement of the BUG_ON() (only if the range
check matched) actually makes it harder to hit the sanity check to begin
with, so _if_ there is a bug (and we just got a report from Srivatsa
Bhat that this can indeed trigger), it is harder to debug not just
because the machine is possibly dead, but because we don't have better
coverage.

BUG_ON() must *die*.  Maybe we should add a checkpatch warning for it,
because it is simply just about the worst thing you can ever do if you
hit some "this cannot happen" situation.
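For illustration only (this is the pattern being argued for, not the exact
patch): a recoverable sanity check treats the impossible state as a cache
miss and lets the caller fall back to the slow path, e.g.

    /* Hypothetical lookup: warn once and recover instead of killing the box. */
    if (WARN_ON_ONCE(vma->vm_mm != mm))
        return NULL;    /* ignore the bogus cache entry, walk the rbtree */

whereas a BUG_ON() in the same spot takes down the task (or machine) for a
condition that is trivially survivable.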

Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-27 22:08:06 +02:00
Davidlohr Bueso 7c1a95e0ae mm: per-thread vma caching
This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
so further comparison with other approaches was needed.  There are
two things to consider when dealing with this, the cache hit rate and
the latency of find_vma().  Improving the hit rate does not necessarily
translate into finding the vma any faster, as the overhead of a fancy
caching scheme can be too high to be worthwhile.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question (a minimal
sketch of the scheme follows the results below).  Concretely, the
following results are seen on an 80-core, 8-socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme improves on the ~50% baseline hit rate simply by adding a few
   more slots to the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while the baseline's are
   just about non-existent.  The cycle counts fluctuate anywhere from ~60 to
   ~116 billion for the baseline scheme, but this approach reduces them
   considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+
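A minimal user-space sketch of the scheme described above, purely for
illustration; the slot count, type names and helpers are made up here and
are not the kernel's actual vmacache API:

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SHIFT      12
    #define VMACACHE_BITS   2
    #define VMACACHE_SIZE   (1u << VMACACHE_BITS)  /* a handful of slots per thread */

    struct vma { unsigned long start, end; };

    struct mm {                      /* shared address space */
        uint32_t seqnum;             /* bumped on every vma change */
    };

    struct thread_vmacache {         /* lives in each task, no locking needed */
        uint32_t seqnum;             /* snapshot of mm->seqnum at last fill */
        struct vma *slot[VMACACHE_SIZE];
    };

    /* Replacement/lookup policy: index by the page number of the address. */
    static unsigned int vmacache_hash(unsigned long addr)
    {
        return (addr >> PAGE_SHIFT) & (VMACACHE_SIZE - 1);
    }

    /* Invalidation is just a sequence bump; stale per-thread copies stop
     * matching.  On the rare 32-bit overflow, every cache sharing the mm
     * would be flushed. */
    static void vmacache_invalidate(struct mm *mm)
    {
        mm->seqnum++;
    }

    static struct vma *vmacache_find(struct mm *mm, struct thread_vmacache *tc,
                                     unsigned long addr)
    {
        struct vma *vma;

        if (tc->seqnum != mm->seqnum)    /* cache is stale: treat as a miss */
            return NULL;

        vma = tc->slot[vmacache_hash(addr)];
        if (vma && vma->start <= addr && addr < vma->end)
            return vma;
        return NULL;                     /* caller falls back to the rbtree walk */
    }

    static void vmacache_update(struct mm *mm, struct thread_vmacache *tc,
                                unsigned long addr, struct vma *vma)
    {
        if (tc->seqnum != mm->seqnum) {  /* resync before refilling */
            for (size_t i = 0; i < VMACACHE_SIZE; i++)
                tc->slot[i] = NULL;
            tc->seqnum = mm->seqnum;
        }
        tc->slot[vmacache_hash(addr)] = vma;
    }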

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-27 22:08:06 +02:00
Sean Tranchetti 0d2f1604f9 af_key: unconditionally clone on broadcast
Attempting to avoid cloning the skb when broadcasting by inflating
the refcount with sock_hold/sock_put while under RCU lock is dangerous
and violates RCU principles. It leads to subtle race conditions when
attempting to free the SKB, as we may reference sockets that have
already been freed by the stack.

Unable to handle kernel paging request at virtual address 6b6b6b6b6b6c4b
[006b6b6b6b6b6c4b] address between user and kernel address ranges
Internal error: Oops: 96000004 [#1] PREEMPT SMP
task: fffffff78f65b380 task.stack: ffffff8049a88000
pc : sock_rfree+0x38/0x6c
lr : skb_release_head_state+0x6c/0xcc
Process repro (pid: 7117, stack limit = 0xffffff8049a88000)
Call trace:
	sock_rfree+0x38/0x6c
	skb_release_head_state+0x6c/0xcc
	skb_release_all+0x1c/0x38
	__kfree_skb+0x1c/0x30
	kfree_skb+0xd0/0xf4
	pfkey_broadcast+0x14c/0x18c
	pfkey_sendmsg+0x1d8/0x408
	sock_sendmsg+0x44/0x60
	___sys_sendmsg+0x1d0/0x2a8
	__sys_sendmsg+0x64/0xb4
	SyS_sendmsg+0x34/0x4c
	el0_svc_naked+0x34/0x38
Kernel panic - not syncing: Fatal exception
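A rough sketch of the safer pattern the message describes: clone the skb for
each listener while walking the pfkey socket list under RCU, instead of
taking extra references tied to the original skb (illustrative only, not the
actual af_key code):

    /* Deliver an independent copy to one listener; the original skb and its
     * owning socket are never shared across the RCU walk. */
    static int pfkey_broadcast_one(struct sk_buff *skb, gfp_t allocation,
                                   struct sock *sk)
    {
        struct sk_buff *copy = skb_clone(skb, allocation);

        if (!copy)
            return -ENOMEM;
        if (sock_queue_rcv_skb(sk, copy)) {
            kfree_skb(copy);    /* listener's queue is full, drop the copy */
            return -ENOBUFS;
        }
        return 0;
    }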

CRs-Fixed: 2251019
Change-Id: Ib3b01f941a34a7df61fe9445f746b7df33f4656a
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
2019-07-27 22:08:06 +02:00
syphyr f5cf52ee67 qcacld-2.0: Zero out context buffer in HDD
This is a backport of:
"qcacld-2.0: Dump Snapshot of the driver for LL"
2019-07-27 22:08:05 +02:00
syphyr cc2c36e5a0 qcacld-2.0: core: Replace remaining instances of unadorned %p
Replace instances of unadorned %p in CORE.
2019-07-27 22:08:05 +02:00
Jeff Johnson 5ce25b0d78 qcacld-2.0: hdd: Replace instances of unadorned %p
Replace instances of unadorned %p in CORE/HDD.

Change-Id: I32b89aaf6a8b1ca3177e0c1cb5cec5fbc5f5294a
CRs-Fixed: 2111273
2019-07-27 22:08:04 +02:00
Nirav Shah 52b90c2dd2 qcacld-2.0: Add debug logs
Add debug logs in WLANTL_RegisterSTAClient and
WLANTL_ClearSTAClient.

CRs-Fixed: 1036774
Change-Id: I70f19731e576c65432919588348c19ccbf7bca61
2019-07-27 22:08:04 +02:00
Govind Singh 74ddae80eb qcacld-2.0: Restore 802.11 header pointer for PMF case
In the PMF case, the CCMP header and trailer of rx management frames
are stripped out.  After stripping the security headers we were still
using the old 802.11 header pointer, which resulted in an invalid
dereference of the 802.11 header fields.
Restore the 802.11 header pointer after the security headers are
stripped out in the PMF case.

Change-Id: I6a26dbb0707b7981ea091526d1e49dc5bf8c9e91
CRs-Fixed: 1024097
2019-07-27 22:08:04 +02:00
Padma, Santhosh Kumar 2841d21310 qcacld-2.0: Send ESE beacon report if request is valid
prima to qcacld-2.0 propagation

Currently, if the connection is ESE and an RRM beacon request is received,
eseProcessBeaconReportXmit is invoked as part of sending the report, which
results in an error because there is no ESE request.  Add a check to invoke
eseProcessBeaconReportXmit only if the measurement request is valid.

Change-Id: I3fe6101b888c70670a371a1eb45b47d756511b1d
CRs-Fixed: 1002305
2019-07-27 22:08:03 +02:00
Himanshu Agarwal b410c951a4 qcacld-2.0: Fix static code analysis error
Fix static code analysis error in TLSHIM layer.

Change-Id: I81e5b7d5910919573b69faf7cfa3210eace9d6d4
CRs-Fixed: 1008197
2019-07-27 22:08:03 +02:00
Krishna Kumaar Natarajan 5d6242a3b5 qcacld-2.0: Fix layering violation while handling management frames
qcacld-3.0 to qcacld-2.0 propagation

Fix a layering violation while handling management frames.  Currently,
LIM data structures are accessed before dropping Assoc, Disassoc and
Deauth packets to avoid DoS attacks.  Since the LIM data structures
are accessed in a different thread context, the data present in them is
out of sync, resulting in a crash.

Fix the layering violation by doing the appropriate check in WMA instead
of in LIM.

Change-Id: I8876a4d4b99948cd9ab3ccec403cf5e4050b1cff
CRs-Fixed: 977773
2019-07-27 22:08:03 +02:00
zhangq cd3ae2df46 qcacld-2.0: Resolve memory leakage in OCB
The sme_utc buffer is not freed if posting the message to WDA/WMA fails.

Change-Id: Id91003198c2c06e45ec970cb9a23f4e8279220d4
CRs-Fixed: 1002063
2019-07-27 22:08:02 +02:00
Gao Wu cb7fb582c1 qcacld-2.0: update payload length of MGMT frame
There are some access points that do not include the capability field
in the RSN IE even though the RSN IE length indicates that the field is
present.  A workaround for this issue adds two default bytes as the RSN
capability, but it does not update the payload length.  This causes the
supplicant to get the wrong RSN capability and then a security mismatch
in the host driver when connecting to the AP.

Change-Id: I03ea3e293df8cbe545a70af03b1038b6fad5a261
CRs-Fixed: 993795
2019-07-27 22:08:02 +02:00
Himanshu Agarwal 3a6de60eca qcacld-2.0: Refactor intra bss forwarded packets count
Initially, when a packet is forwarded from the txrx layer, it is counted
only once, although the count should increase by 2: there is one rx
packet and one tx packet, and the tx packet is not being considered in
the HDD packet count.

Add code to ensure that when a packet is forwarded from the lower layers,
it is accounted for accurately in the packet count.

Change-Id: I47bc1e0ecfa2e831438534cf34d37086a306b4e9
CRs-Fixed: 996735
2019-07-27 22:08:01 +02:00
Himanshu Agarwal d9f3dc97b1 qcacld-2.0: Remove error print from kmsg
Remove error print from kmsg as this print is unnecessary and
may flood the kmsg.

Change-Id: I0978f88af6677cb0c1e1db5eae7e5d6a69bd4b70
CRs-Fixed: 997243
2019-07-27 22:08:01 +02:00
Himanshu Agarwal 6b9518faf6 qcacld-2.0: Add intra bss forwarded packets count
In lpm qos voting, the number of packets or bytes sent or received in a
given amount of time is recorded, and the decision to disable or enable
lpm is based on that.  These packets are recorded in the HDD layer.  When
packets are forwarded at the tx level only, they do not come up to the
HDD layer, so in the intra-bss forwarding case lpm qos voting is not
being performed appropriately.

Add code to count the intra-bss forwarded packets in the txrx layer and
include them when calculating the lpm qos vote.

Change-Id: I805663688cb300c8735b3e2f9680818a7b50bc9f
CRs-Fixed: 990868
2019-07-27 22:08:01 +02:00
c_zding b74a1497d5 qcacld-2.0: Fix other variable type used as boolean and initialized variable
In the current logic, the member "cbMode" of structure "tSirSmeStartBssReq"
and the member "secondarySubBand" of structure "tLimChannelSwitchInfo" are
used as booleans, although "cbMode" is defined as "tANI_U8" and
"secondarySubBand" is defined as an "enum".
Initialize the variable "getAssocSTAsReq" before it is used.
Move the "ATH_DFSEVENTQ_UNLOCK" call to optimize efficiency.

Change-Id: Ic5fec6c00b4bbfed53ebb9b5f965930f26171a11
CRs-Fixed: 969139
2019-07-27 22:08:00 +02:00
Selvaraj, Sridhar 7217c4e77e qcacld-2.0: Trigger Auth req(OPEN) when SHARED times out
When OPEN/SHARED WEP is configured, the current implementation starts
the Auth request with SHARED and, if that fails, triggers an Auth
request with OPEN.  This change triggers the OPEN Auth request when the
timeout happens (no Auth response received for the previous SHARED Auth
attempt).  Some APs do not respond to Shared Auth if they support only
Open.  To interoperate with these kinds of APs, try Open Auth if the
Auth timeout happens with Shared Auth.

Change-Id: I28b9186b9dc238640fd7655c9ac73e8aa89aec54
CRs-Fixed: 984341
2019-07-27 22:08:00 +02:00
Samuel Ahn 29164f263f qcacld-2.0: Add support for default TX params in OCB mode
When OCB mode is configured, default TX parameters can be provided.
These default TX parameters are used if a packet is sent without
a TX control header.

Change-Id: I72b3799cb0a9e00a60548facf25e57be241d82d7
CRs-Fixed: 964279
2019-07-27 22:07:59 +02:00
Eric Dumazet a6cf2de288 tcp: tcp_v4_err() should be more careful
[ Upstream commit 2c4cc9712364c051b1de2d175d5fbea6be948ebf ]

ICMP handlers are not stressed very often, so we should
make them more resilient to bugs that might surface in
the future.

If there is no packet in the retransmit queue, we should
avoid a NULL deref.
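Roughly, the defensive check described here looks like this inside the ICMP
handler (sketch; the queue-head accessor name varies by kernel version):

    /* Bail out if the retransmit queue is empty rather than dereferencing
     * a NULL skb further down. */
    skb = tcp_write_queue_head(sk);
    if (WARN_ON_ONCE(!skb))
        break;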

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: soukjin bae <soukjin.bae@samsung.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 22:07:59 +02:00
Yingying Tang 460b6f3aeb qcacld-2.0: Add null pointer check while processing GID action frame
In __limProcessGidManagementActionFrame(), a pointer is used
without a NULL check.  Add a fix to avoid the risk.

Change-Id: I9ed2ee6d85726c53ebfc036d320b28775d4e5d32
CRs-Fixed: 979671
2019-07-27 22:07:59 +02:00
Padma, Santhosh Kumar 2ebfca5060 qcacld-2.0: Validate pHashTable
prima to qcacld-2.0 propagation

When a deauth/disassoc is received from the peer at the same time that
cleanup is in progress because of a disconnect from the supplicant,
there is a chance that pHashTable can be NULL.  The memory pointed to by
pHashTable is freed during peDeleteSession, which is called during
cleanup.  In dphLookupHashEntry, pHashTable is referenced without
any NULL check, which can lead to a crash.  Fix this by validating
pHashTable against NULL.

Add a NULL check in _limProcessOperatingModeActionFrame before
referencing the sta context to resolve a potential KW issue.

Change-Id: I74d5c739cade19941320ee02eddc09e4fc74b105
CRs-Fixed: 898375
2019-07-27 22:07:58 +02:00
Yingying Tang a9bfe7022d qcacld-2.0: Add null pointer check while processing Operation Mode action frame
In __limProcessOperatingModeActionFrame(), a pointer is used
without a NULL check.  Add a fix to avoid the risk.

Change-Id: I5d5a26b53781272406a0f1d46a90b5ef138ce552
CRs-Fixed: 979671
2019-07-27 22:07:58 +02:00
Jingxiang Ge 0c3702f9bf qcacld-2.0: Fix buffer overwrite problem in GETIBSSPEERINFO
If (length + 1) is greater than priv_data.total_len, then copy_to_user
results in writing more data than the buffer can hold.

Fix this by writing the minimum of (length + 1) and priv_data.total_len.
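A sketch of the bounded copy described above; priv_data and length mirror
the commit message, while the .buf member and out_buf are illustrative
names:

    /* Never write past the user buffer: cap the copy at total_len. */
    size_t copy_len = min_t(size_t, length + 1, priv_data.total_len);

    if (copy_to_user(priv_data.buf, out_buf, copy_len))
        return -EFAULT;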

Change-Id: If0c74b3c6c76ee3ca296fd8e0e844b9c53c30498
CRs-Fixed: 2344325
2019-07-27 22:07:58 +02:00
Guisen Yang 8259595fc6 qcacld-2.0: Fix kw issues: check NULL, initialize data and LOCRET
The abnormal NULL-check form cannot be detected by KW.  Data used
before a NULL check, and a local address returned by a function,
should be fixed.

Change the NULL-check form.  Fix the use before the NULL check and the
use of data before initialization.  Fix the LOCRET issue.

Change-Id: Ic1756f0e45de0f407ec9e4193fbbaec885f05f67
CRs-Fixed: 2209931
2019-07-27 22:07:57 +02:00
Rongjing Liao 3cfcc97265 qcacld-2.0: add NULL pointer condition check for fixing KW issues
Add a NULL pointer condition check to fix KW issues.

Change-Id: I38b3b087fa67909c59f3d01e0b3051e4f8f56464
Signed-off-by: Rongjing Liao <liaor@codeaurora.org>
2019-07-27 22:07:57 +02:00
tinlin 440a9abf2d qcacld-2.0: Check for minimum frameLen for action frames
Propagation from cld3.0 to cld2.0.

In limProcessActionFrame and limProcessActionFrameNoSession,
the Rx frame pointer is directly cast to the action frame header
to find the action frame category and action ID, without validating
the minimum length of the frame.  If the frame length is less than the
action frame header length, an OOB read would occur.

Check whether frame_len is less than the size of the action frame header
and return if so.
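The check amounts to something like this (sketch; action_hdr stands in for
the driver's action frame header type):

    /* Too short to even contain the category and action ID fields: drop it
     * before casting the frame body to the action header. */
    if (frame_len < sizeof(*action_hdr))
        return;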

Change-ID: Idf8ca7eeacdf57171d2850fe6317784911830aac
CRs-Fixed: 2333070
2019-07-27 22:07:56 +02:00
Abhishek Singh 6a79179539 qcacld-2.0: Add check for robust action frame while sending action frames
prima to qcacld-2.0 propagation

Currently, if PMF is enabled, only SA query action frames
received from the supplicant are sent protected.  No other
action frame category is sent protected.

Add a check for robust action frames to decide whether protection is
needed for the action frame category received from the supplicant.

Change-Id: Ib1eb589c530ef99b7e2fedfcd106e0f646d78d93
CRs-Fixed: 960298
2019-07-27 22:07:56 +02:00
Sriram, Madhvapathi 7e1bd968da qcacld-2.0: Fix check for adapter device_mode while changing back from IBSS
Presently, if the current device_mode is STA/P2P/AP, the interface mode
change takes effect.  If the current device_mode is IBSS, the interface
mode change is rejected in __wlan_hdd_cfg80211_change_iface.  This causes
user applications like wpa_supplicant, which start the interface in STA
mode, to fail to reconfigure the interface after termination.
CRs-Fixed: 1005808
Change-Id: I67bcdb7453e8232dc711499ee66793877697582b
2019-07-27 22:07:56 +02:00
Sachin Ahuja f3366fc9f6 qcacld-2.0: Pass the correct userData in wpalTimerCback
prima to qcacld-2.0 propagation

During reinit, the driver sends the FW download request, and if the
request times out, the timer callback is executed in the WD thread.
Currently, userdata is passed as NULL when the timer callback is
executed in the WD thread.

Update the code to pass the correct userdata to the timer callback
when it is called in WD thread context.

Change-Id: I10a9cf8c53ded7d9db4bff0761f7b86a9021011a
CRs-Fixed: 1020713
2019-07-27 22:07:55 +02:00
Agrawal Ashish f5da811e12 qcacld-2.0: Register Callback for fullPower before posting message
prima to qcacld-2.0 propagation

In pmcRequestFullPower, the driver posts the message to enter
full power from the wpa_supplicant thread.  After posting the message,
the context switches to the MC thread.  The MC thread starts processing
the IMPS RESPONSE even before the supplicant thread can add the callback
entry to requestFullPowerList, so in effect the IMPS response handler
does not invoke any callbacks, and the command sitting in the roam
pending list does not get processed.
Fix this by registering the callback before posting the message to enter
full power.  If the request to enter full power fails, remove the entry.

Change-Id: If3d32d6998bf7f65171a8d501db69e72a6ee2865
CRs-Fixed: 903963
2019-07-27 22:07:55 +02:00
Jason A. Donenfeld e62a2e3be5 net_dbg_ratelimited: turn into no-op when !DEBUG
commit d92cff89a0c80e7e49796366e441d97f07b5d321 upstream.

The pr_debug family of functions turns into a no-op when -DDEBUG is not
specified, opting instead to call "no_printk", which gets compiled to a
no-op (but retains gcc's nice warnings about printf-style arguments).

The problem with net_dbg_ratelimited is that it is defined to be a
variant of net_ratelimited_function, which expands to essentially:

    if (net_ratelimit())
        pr_debug(fmt, ...);

When DEBUG is not defined, then this becomes,

    if (net_ratelimit())
        ;

This seems benign, except it isn't.  Firstly, there's the obvious
overhead of calling net_ratelimit needlessly, which does quite a bit of
bookkeeping for the rate limiting.  Given that the pr_debug and
net_dbg_ratelimited family of functions are sprinkled liberally through
performance-critical code, with developers assuming they'll be compiled
out to a no-op most of the time, we certainly do not want this needless
bookkeeping.  Secondly, and most visibly, even though no debug message
is printed when DEBUG is not defined, if there is a flood of
invocations, dmesg winds up peppered with messages such as
"net_ratelimit: 320 callbacks suppressed". This is because our
aforementioned net_ratelimit() function actually prints this text in
some circumstances. It's especially odd to see this when there isn't any
other accompanying debug message.

So, in sum, it doesn't make sense to have this function's current
behavior, and instead it should match what every other debug family of
functions in the kernel does with !DEBUG -- nothing.

This patch replaces calls to net_dbg_ratelimited when !DEBUG with
no_printk, keeping with the idiom of all the other debug print helpers.

Also, though not strictly necessary, it guards the call with an if (0)
so that all evaluation of any arguments is sure to be compiled out.
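Based on that description, the resulting definition looks roughly like this:

    #if defined(DEBUG)
    #define net_dbg_ratelimited(fmt, ...) \
        net_ratelimited_function(pr_debug, fmt, ##__VA_ARGS__)
    #else
    /* No rate-limit bookkeeping, no "callbacks suppressed" noise; the
     * arguments are still type-checked but never evaluated. */
    #define net_dbg_ratelimited(fmt, ...) \
        do { if (0) no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); } while (0)
    #endif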

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 22:07:54 +02:00
Sebastian Andrzej Siewior a1e14bfdfc locking/rtmutex: Avoid a NULL pointer dereference on deadlock
commit 8d1e5a1a1ccf5ae9d8a5a0ee7960202ccb0c5429 upstream.

With task_blocks_on_rt_mutex() returning early -EDEADLK we never
add the waiter to the waitqueue. Later, we try to remove it via
remove_waiter() and go boom in rt_mutex_top_waiter() because
rb_entry() gives a NULL pointer.

( Tested on v3.18-RT where rtmutex is used for regular mutex and I
  tried to get one twice in a row. )

Not sure when this started but I guess 397335f004f4 ("rtmutex: Fix
deadlock detector for real") or commit 3d5c9340d194 ("rtmutex:
Handle deadlock detection smarter").

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1424187823-19600-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
2019-07-27 22:07:54 +02:00
Thomas Gleixner 7c4a7bdb1f locking/rtmutex: Prevent dequeue vs. unlock race
commit dbb26055defd03d59f678cb5f2c992abe05b064a upstream.

David reported a futex/rtmutex state corruption. It's caused by the
following problem:

CPU0		CPU1		CPU2

l->owner=T1
		rt_mutex_lock(l)
		lock(l->wait_lock)
		l->owner = T1 | HAS_WAITERS;
		enqueue(T2)
		boost()
		  unlock(l->wait_lock)
		schedule()

				rt_mutex_lock(l)
				lock(l->wait_lock)
				l->owner = T1 | HAS_WAITERS;
				enqueue(T3)
				boost()
				  unlock(l->wait_lock)
				schedule()
		signal(->T2)	signal(->T3)
		lock(l->wait_lock)
		dequeue(T2)
		deboost()
		  unlock(l->wait_lock)
				lock(l->wait_lock)
				dequeue(T3)
				  ===> wait list is now empty
				deboost()
				 unlock(l->wait_lock)
		lock(l->wait_lock)
		fixup_rt_mutex_waiters()
		  if (wait_list_empty(l)) {
		    owner = l->owner & ~HAS_WAITERS;
		    l->owner = owner
		     ==> l->owner = T1
		  }

				lock(l->wait_lock)
rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
				  if (wait_list_empty(l)) {
				    owner = l->owner & ~HAS_WAITERS;
cmpxchg(l->owner, T1, NULL)
 ===> Success (l->owner = NULL)
				    l->owner = owner
				     ==> l->owner = T1
				  }

That means the problem is caused by fixup_rt_mutex_waiters(), which does the
RMW to clear the waiters bit unconditionally when there are no waiters in
the rtmutex's rbtree.

This can be fatal: a concurrent unlock can release the rtmutex in the
fastpath because the waiters bit is not set.  If the cmpxchg() gets in the
middle of the RMW operation, then the previous owner, which just unlocked
the rtmutex, is set as the owner again when the write takes place after the
successful cmpxchg().

The solution is rather trivial: verify that the owner member of the rtmutex
has the waiters bit set before clearing it. This does not require a
cmpxchg() or other atomic operations because the waiters bit can only be
set and cleared with the rtmutex wait_lock held. It's also safe against the
fast path unlock attempt. The unlock attempt via cmpxchg() will either see
the bit set and take the slowpath or see the bit cleared and release it
atomically in the fastpath.
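The fix boils down to something like this (sketch in the spirit of the
backport, which uses ACCESS_ONCE(); the upstream version carries more
commentary):

    static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
    {
        unsigned long owner, *p = (unsigned long *) &lock->owner;

        if (rt_mutex_has_waiters(lock))
            return;

        /* Only clear the waiters bit if it is actually set; wait_lock is
         * held here, so plain loads and stores are sufficient. */
        owner = ACCESS_ONCE(*p);
        if (owner & RT_MUTEX_HAS_WAITERS)
            ACCESS_ONCE(*p) = owner & ~RT_MUTEX_HAS_WAITERS;
    }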

It's remarkable that the test program provided by David triggers on ARM64
and MIPS64 really quickly, but it refuses to reproduce on x86-64, even
though the problem exists there as well.  That refusal might explain why
this was not discovered earlier despite the bug existing since day one of
the rtmutex implementation more than 10 years ago.

Thanks to David for meticulously instrumenting the code and providing the
information that allowed this subtle problem to be decoded.

Reported-by: David Daney <ddaney@caviumnetworks.com>
Tested-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 23f78d4a03 ("[PATCH] pi-futex: rt mutex core")
Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.16: use ACCESS_ONCE() instead of {READ,WRITE}_ONCE()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 22:07:54 +02:00
Guillaume Nault 59a177e5b6 pppoe: fix reference counting in PPPoE proxy
commit 29e73269aa4d36f92b35610c25f8b01c789b0dc8 upstream.

Drop reference on the relay_po socket when __pppoe_xmit() succeeds.
This is already handled correctly in the error path.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
2019-07-27 22:07:53 +02:00
Manuel Schölling f99add3d5b dns_resolver: Do not accept domain names longer than 255 chars
According to RFC1035 "[...] the total length of a domain name (i.e.,
label octets and label length octets) is restricted to 255 octets or
less."

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-27 22:07:53 +02:00
Lorenzo Bianconi d86fb058ce net: ipv4: use a dedicated counter for icmp_v4 redirect packets
[ Upstream commit c09551c6ff7fe16a79a42133bcecba5fc2fc3291 ]

According to the algorithm described in the comment block at the
beginning of ip_rt_send_redirect, the host should try to send
'ip_rt_redirect_number' ICMP redirect packets with an exponential
backoff and then stop sending them altogether, assuming that the destination
ignores redirects.
If the device has previously sent some ICMP error packets that are
rate-limited (e.g TTL expired) and continues to receive traffic,
the redirect packets will never be transmitted. This happens since
peer->rate_tokens will be typically greater than 'ip_rt_redirect_number'
and so it will never be reset even if the redirect silence timeout
(ip_rt_redirect_silence) has elapsed without receiving any packet
requiring redirects.

Fix it by using a dedicated counter for the number of ICMP redirect
packets that have been sent by the host.
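Conceptually, ip_rt_send_redirect then keys its rate limiting off the
dedicated counter instead of the shared rate_tokens (sketch only; the
n_redirects field name is illustrative):

    /* Stop once ip_rt_redirect_number redirects have been sent to this peer,
     * independent of how many other ICMP errors were rate-limited. */
    if (peer->n_redirects >= ip_rt_redirect_number) {
        peer->rate_last = jiffies;
        goto out_put_peer;
    }

    if (time_after(jiffies, peer->rate_last +
                   (ip_rt_redirect_load << peer->n_redirects))) {
        icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, gw);
        peer->rate_last = jiffies;
        ++peer->n_redirects;
    }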

I have not been able to identify a given commit that introduced the
issue, since ip_rt_send_redirect implements the same rate-limiting
algorithm as commit 1da177e4c3 ("Linux-2.6.12-rc2").

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 22:07:53 +02:00
Eric Dumazet 27760dcdf1 tcp: clear icsk_backoff in tcp_write_queue_purge()
[ Upstream commit 04c03114be82194d4a4858d41dba8e286ad1787c ]

soukjin bae reported a crash in tcp_v4_err() handling
ICMP_DEST_UNREACH after tcp_write_queue_head(sk)
returned a NULL pointer.

Current logic should have prevented this:

  if (seq != tp->snd_una  || !icsk->icsk_retransmits ||
      !icsk->icsk_backoff || fastopen)
      break;

The problem is that the write queue might have been purged
while icsk_backoff has not been cleared.
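The fix is to reset the backoff whenever the write queue is purged, so the
check above stays meaningful (sketch; the surrounding purge logic is
elided):

    static inline void tcp_write_queue_purge(struct sock *sk)
    {
        /* ... existing code that dequeues and frees every queued skb ... */
        tcp_clear_all_retrans_hints(tcp_sk(sk));
        inet_csk(sk)->icsk_backoff = 0;   /* keep tcp_v4_err()'s test consistent */
    }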

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: soukjin bae <soukjin.bae@samsung.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-27 22:07:52 +02:00
Paolo Abeni 08da77e269 udp: perform source validation for mcast early demux
commit bc044e8db7962e727a75b591b9851ff2ac5cf846 upstream.

The UDP early demux can leverage the rx dst cache even for
unconnected multicast sockets.

In such a scenario the ipv4 source address is validated only on
the first packet in the given flow.  After that, when we fetch
the dst entry from the socket rx cache, we stop enforcing
the rp_filter and we even start accepting any kind of martian
addresses.

Disabling the dst cache for unconnected multicast sockets would
cause a large performance regression, nearly halving the
max ingress throughput.

Instead we factor out a route helper to completely validate an
skb's source address for multicast packets, and we call it from
the UDP early demux for mcast packets landing on unconnected
sockets, after successfully fetching the related cached dst entry.
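In the early-demux path this looks roughly like the following (sketch;
ip_mc_validate_source() is the factored-out helper referred to above,
iph/in_dev come from earlier in the function, and the exact arguments may
differ):

    if (dst)
        dst = dst_check(dst, 0);
    if (dst) {
        u32 itag = 0;

        skb_dst_set_noref(skb, dst);

        /* Unconnected multicast socket: re-validate the source address
         * (rp_filter / martian checks) on every packet, not just the first. */
        return ip_mc_validate_source(skb, iph->daddr, iph->saddr, iph->tos,
                                     skb->dev, in_dev, &itag);
    }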

This still gives a measurable, but limited performance
regression:

		rp_filter = 0		rp_filter = 1
edmux disabled:	1182 Kpps		1127 Kpps
edmux before:	2238 Kpps		2238 Kpps
edmux after:	2037 Kpps		2019 Kpps

The above figures are on top of current net tree.
Applying the net-next commit 6e617de84e87 ("net: avoid a full
fib lookup when rp_filter is disabled.") the delta with
rp_filter == 0 will decrease even more.

Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 22:07:52 +02:00
Paolo Abeni b60998c141 IPv4: early demux can return an error code
commit 7487449c86c65202b3b725c4524cb48dd65e4e6f upstream.

Currently no error is emitted, but this infrastructure will be
used by the next patch to allow source address validation
for mcast sockets.
Since early demux can do a route lookup, and an ipv4 route
lookup can return an error code, this is consistent with the
current ipv4 route infrastructure.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.16:
 - Drop change to net_protocol::early_demux_handler
 - Keep using NET_INC_STATS_BH() in ip_rcv_finish()
 - Fix up additional return statement in udp_v4_early_demux()
 - Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 22:07:51 +02:00
Paolo Abeni a3f3d99974 ipv4: fix broadcast packets reception
commit ad0ea1989cc4d5905941d0a9e62c63ad6d859cef upstream.

Currently, ingress ipv4 broadcast datagrams are dropped since,
in udp_v4_early_demux(), ip_check_mc_rcu() is invoked even on
bcast packets.

This patch addresses the issue, invoking ip_check_mc_rcu()
only for mcast packets.

Fixes: 6e5403093261 ("ipv4/udp: Verify multicast group is ours in upd_v4_early_demux()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 22:07:51 +02:00
Shawn Bohrer 755cb66d68 ipv4/udp: Verify multicast group is ours in udp_v4_early_demux()
commit 6e540309326188f769e03bb4c6dd8ff6752930c2 upstream.

421b3885bf6d56391297844f43fb7154a6396e12 "udp: ipv4: Add udp early
demux" introduced a regression that allowed sockets bound to INADDR_ANY
to receive packets from multicast groups that the socket had not joined.
For example a socket that had joined 224.168.2.9 could also receive
packets from 225.168.2.9 despite not having joined that group if
ip_early_demux is enabled.

Fix this by calling ip_check_mc_rcu() in udp_v4_early_demux() to verify
that the multicast packet is indeed ours.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
2019-07-27 22:07:51 +02:00
Eric Dumazet 5023264f3f udp: fix dst races with multicast early demux
commit 10e2eb878f3ca07ac2f05fa5ca5e6c4c9174a27a upstream.

Multicast dsts are not cached.  They carry DST_NOCACHE.

As mentioned in commit f8864972126899 ("ipv4: fix dst race in
sk_dst_get()"), these dsts need special care before caching them
into a socket.

Caching them is allowed only if their refcnt was not 0, i.e. we
must use atomic_inc_not_zero().

Also, we must use READ_ONCE() to fetch sk->sk_rx_dst, as mentioned
in commit d0c294c53a771 ("tcp: prevent fetching dst twice in early demux
code").

Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Tested-by: Gregory Hoggarth <Gregory.Hoggarth@alliedtelesis.co.nz>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Gregory Hoggarth <Gregory.Hoggarth@alliedtelesis.co.nz>
Reported-by: Alex Gartrell <agartrell@fb.com>
Cc: Michal Kubeček <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ luis: backported to 3.16: used davem's backport to 3.14 ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
2019-07-27 22:07:50 +02:00
Eric Dumazet 2ad7c93946 udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup
It's too easy to add thousands of UDP sockets to a particular bucket
and slow down an innocent multicast receiver.

Early demux is supposed to be an optimization, we should avoid spending
too much time in it.

It is interesting to note that __udp4_lib_demux_lookup() only tries to
match the first socket in the chain.

10 is the threshold we already have in __udp4_lib_lookup() to switch
to secondary hash.
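So the early-demux lookup simply gives up when the bucket is crowded,
roughly:

    /* Do not bother scanning a long chain; early demux is only an
     * optimization, the regular lookup will handle the packet anyway. */
    if (hslot->count > 10)
        return NULL;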

Fixes: 421b3885bf6d5 ("udp: ipv4: Add udp early demux")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Held <drheld@google.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-27 22:07:50 +02:00
Eric Dumazet ba085e5d79 udp: ipv4: must add synchronization in udp_sk_rx_dst_set()
Unlike TCP, the UDP input path does not hold the socket lock.

Before messing with sk->sk_rx_dst, we must use a spinlock, otherwise
multiple cpus could leak a refcount.

This patch also takes care of renewing a stale dst entry.
(When the sk->sk_rx_dst would not be used by IP early demux)
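In outline, the serialized update described above looks something like this
(sketch only; rx_dst_lock is a stand-in for whatever lock the actual patch
uses to protect sk->sk_rx_dst):

    struct dst_entry *old;

    spin_lock(&rx_dst_lock);        /* hypothetical lock protecting sk_rx_dst */
    old = sk->sk_rx_dst;
    if (old != dst) {               /* also renews a stale entry */
        dst_hold(dst);
        sk->sk_rx_dst = dst;
        dst_release(old);
    }
    spin_unlock(&rx_dst_lock);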

Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-27 22:07:50 +02:00
Eric Dumazet a59c54761f udp: ipv4: fix potential use after free in udp_v4_early_demux()
pskb_may_pull() can reallocate skb->head, so we need to move the
initialization of the iph and uh pointers after its call.
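In other words, the pointers are only taken once the headers are known to be
in the linear area (sketch):

    /* pskb_may_pull() may move skb->head, invalidating earlier pointers. */
    if (!pskb_may_pull(skb, skb_transport_offset(skb) + sizeof(struct udphdr)))
        return;

    iph = ip_hdr(skb);   /* safe to dereference only after the pull */
    uh  = udp_hdr(skb);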

Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-27 22:07:49 +02:00
Eric Dumazet c88bbddf16 udp: ipv4: fix a use after free in __udp4_lib_rcv()
Dave Jones reported a use after free in the UDP stack:

[ 5059.434216] =========================
[ 5059.434314] [ BUG: held lock freed! ]
[ 5059.434420] 3.13.0-rc3+ #9 Not tainted
[ 5059.434520] -------------------------
[ 5059.434620] named/863 is freeing memory ffff88005e960000-ffff88005e96061f, with a lock still held there!
[ 5059.434815]  (slock-AF_INET){+.-...}, at: [<ffffffff8149bd21>] udp_queue_rcv_skb+0xd1/0x4b0
[ 5059.435012] 3 locks held by named/863:
[ 5059.435086]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8143054d>] __netif_receive_skb_core+0x11d/0x940
[ 5059.435295]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81467a5e>] ip_local_deliver_finish+0x3e/0x410
[ 5059.435500]  #2:  (slock-AF_INET){+.-...}, at: [<ffffffff8149bd21>] udp_queue_rcv_skb+0xd1/0x4b0
[ 5059.435734]
stack backtrace:
[ 5059.435858] CPU: 0 PID: 863 Comm: named Not tainted 3.13.0-rc3+ #9 [loadavg: 0.21 0.06 0.06 1/115 1365]
[ 5059.436052] Hardware name:                  /D510MO, BIOS MOPNV10J.86A.0175.2010.0308.0620 03/08/2010
[ 5059.436223]  0000000000000002 ffff88007e203ad8 ffffffff8153a372 ffff8800677130e0
[ 5059.436390]  ffff88007e203b10 ffffffff8108cafa ffff88005e960000 ffff88007b00cfc0
[ 5059.436554]  ffffea00017a5800 ffffffff8141c490 0000000000000246 ffff88007e203b48
[ 5059.436718] Call Trace:
[ 5059.436769]  <IRQ>  [<ffffffff8153a372>] dump_stack+0x4d/0x66
[ 5059.436904]  [<ffffffff8108cafa>] debug_check_no_locks_freed+0x15a/0x160
[ 5059.437037]  [<ffffffff8141c490>] ? __sk_free+0x110/0x230
[ 5059.437147]  [<ffffffff8112da2a>] kmem_cache_free+0x6a/0x150
[ 5059.437260]  [<ffffffff8141c490>] __sk_free+0x110/0x230
[ 5059.437364]  [<ffffffff8141c5c9>] sk_free+0x19/0x20
[ 5059.437463]  [<ffffffff8141cb25>] sock_edemux+0x25/0x40
[ 5059.437567]  [<ffffffff8141c181>] sock_queue_rcv_skb+0x81/0x280
[ 5059.437685]  [<ffffffff8149bd21>] ? udp_queue_rcv_skb+0xd1/0x4b0
[ 5059.437805]  [<ffffffff81499c82>] __udp_queue_rcv_skb+0x42/0x240
[ 5059.437925]  [<ffffffff81541d25>] ? _raw_spin_lock+0x65/0x70
[ 5059.438038]  [<ffffffff8149bebb>] udp_queue_rcv_skb+0x26b/0x4b0
[ 5059.438155]  [<ffffffff8149c712>] __udp4_lib_rcv+0x152/0xb00
[ 5059.438269]  [<ffffffff8149d7f5>] udp_rcv+0x15/0x20
[ 5059.438367]  [<ffffffff81467b2f>] ip_local_deliver_finish+0x10f/0x410
[ 5059.438492]  [<ffffffff81467a5e>] ? ip_local_deliver_finish+0x3e/0x410
[ 5059.438621]  [<ffffffff81468653>] ip_local_deliver+0x43/0x80
[ 5059.438733]  [<ffffffff81467f70>] ip_rcv_finish+0x140/0x5a0
[ 5059.438843]  [<ffffffff81468926>] ip_rcv+0x296/0x3f0
[ 5059.438945]  [<ffffffff81430b72>] __netif_receive_skb_core+0x742/0x940
[ 5059.439074]  [<ffffffff8143054d>] ? __netif_receive_skb_core+0x11d/0x940
[ 5059.442231]  [<ffffffff8108c81d>] ? trace_hardirqs_on+0xd/0x10
[ 5059.442231]  [<ffffffff81430d83>] __netif_receive_skb+0x13/0x60
[ 5059.442231]  [<ffffffff81431c1e>] netif_receive_skb+0x1e/0x1f0
[ 5059.442231]  [<ffffffff814334e0>] napi_gro_receive+0x70/0xa0
[ 5059.442231]  [<ffffffffa01de426>] rtl8169_poll+0x166/0x700 [r8169]
[ 5059.442231]  [<ffffffff81432bc9>] net_rx_action+0x129/0x1e0
[ 5059.442231]  [<ffffffff810478cd>] __do_softirq+0xed/0x240
[ 5059.442231]  [<ffffffff81047e25>] irq_exit+0x125/0x140
[ 5059.442231]  [<ffffffff81004241>] do_IRQ+0x51/0xc0
[ 5059.442231]  [<ffffffff81542bef>] common_interrupt+0x6f/0x6f

We need to keep a reference on the socket by using skb_steal_sock()
in the right place.
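The receive path then takes ownership of the early-demux reference and drops
it once the packet has been queued, roughly:

    sk = skb_steal_sock(skb);        /* take over the early-demux reference */
    if (sk) {
        int ret = udp_queue_rcv_skb(sk, skb);

        sock_put(sk);                /* we own the reference, so drop it here */
        /* > 0 means "resubmit as -protocol" in the caller's convention */
        if (ret > 0)
            return -ret;
        return 0;
    }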

Note that another patch is needed to fix a race in
udp_sk_rx_dst_set(), as we hold no lock protecting the dst.

Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-27 22:07:49 +02:00
Florian Westphal 406121ee71 bridge: netfilter: orphan skb before invoking ip netfilter hooks
Pekka Pietikäinen reports xt_socket behavioural change after commit
00028aa37098o (netfilter: xt_socket: use IP early demux).

The reason is that xt_socket no longer does an unconditional sk lookup -
it re-uses the existing skb->sk if possible, assuming ->sk was set by
ip early demux.

However, when netfilter is invoked via bridge, this can cause 'bogus'
sockets to be examined by the match, e.g. a 'tun' device socket.

bridge netfilter should orphan the skb just like the routing path
before invoking ipv4/ipv6 netfilter hooks to avoid this.

Reported-and-tested-by: Pekka Pietikäinen <pp@ee.oulu.fi>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-27 22:07:48 +02:00
Shawn Bohrer 3542138c1c udp: ipv4: Add udp early demux
The removal of the routing cache introduced a performance regression for
some UDP workloads since a dst lookup must be done for each packet.
This change caches the dst per socket in a similar manner to what we do
for TCP by implementing early_demux.

For UDP multicast we can only cache the dst if there is only one
receiving socket on the host.  Since caching only works when there is
one receiving socket we do the multicast socket lookup using RCU.

For UDP unicast we only demux sockets with an exact match in order to
not break forwarding setups.  Additionally, since the hash chains may be
long, we only check the first socket to see if it is a match rather than
waste extra time searching the whole chain when we might not find an
exact match.

Benchmark results from a netperf UDP_RR test:
Before 87961.22 transactions/s
After  89789.68 transactions/s

Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.97us RTT
After  12.63us RTT

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-27 22:07:48 +02:00