Create a generic version of ablk_helper so it can be reused
by other architectures.
Change-Id: I3bc45cde3c736b82e752ae8a9ea6a2f9465963fc
Acked-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Git-commit: 0df9ee0ee0dd0f50f1c0c03f14f44dc76ad7dd60
Git-repo: https://android.googlesource.com/kernel/common/
Signed-off-by: Samir Mehta <samirn@codeaurora.org>
In order to safely support the use of NEON instructions in
kernel mode, some precautions need to be taken:
- the userland context that may be present in the registers (even
if the NEON/VFP is currently disabled) must be stored under the
correct task (which may not be 'current' in the UP case),
- to avoid having to keep track of additional vfpstates for the
kernel side, disallow the use of NEON in interrupt context
and run with preemption disabled,
- after use, re-enable preemption and re-enable the lazy restore
machinery by disabling the NEON/VFP unit.
This patch adds the functions kernel_neon_begin() and
kernel_neon_end() which take care of the above. It also adds
the Kconfig symbol KERNEL_MODE_NEON to enable it.
Change-Id: I595345a0eabcc55685582ed3df1c2968bf49e037
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Git-commit: 3ad7b1bb81d15ab510cce9dc9df7f1a18185ee36
Git-repo: https://android.googlesource.com/kernel/common/
Signed-off-by: Samir Mehta <samirn@codeaurora.org>
The current thread trying to clear the held vfp state may not be
the owner of hw state. For example,
Core0 Core1
Thread1 uses VFP.
Thread1 vfpstate.hard.cpu = 1.
vfp_current_hw_state[1] points to Thread1
vfpstate.
Going to suspend.
Freeze Thread1.
Thread1 is switched out.
VFP HW registers saved to Thread1 vfpstate.
Core0 disables Core1.
Stopper thread calls vfp_force_reload().
Stopper thread vfpstate.hard.cpu = NR_CPUS.
...
(No PM notifier for non-idle path. So
vfp_pm_suspend() is NOT called on Core1.)
...
Core1 is off and VFP HW registers are lost.
...
Core0 enables Core1.
Core0 thaw Thread1.
Thread1 migrate to Core1
before using VFP.
Thread1 starts using VFP.
Now we have vfp_current_hw_state[1] points
to Thread1 vfpstate. And Thread1 has
vfpstate.hard.cpu = 1.
Thread1 does not need to reload saved vfpstate
to VFP HW.
Thread1 continues running using corrupted VFP
HW register.
This change fixes above gap by always clearing vfp_current_hw_state when
vfp_force_reload() is called.
CRs-fixed: 553415
Change-Id: I1b914a1e6f06c088b78e82ae01f68e3cf49f0442
Signed-off-by: Yuanyuan Zhong <zyy@motorola.com>
Patch-mainline: linux-arm-kernel @ Wed Oct 2 17:59:47 EDT 2013
Signed-off-by: Murali Nalajala <mnalajal@codeaurora.org>
In future kernels create_proc_entry() is going to be removed.
Let's modernize this code and use regular file operations and
seq_printf() to avoid experiencing compilation problems in future
kernels.
Change-Id: Ie53663e099ff72a3eec84df7c3f77e15c0e539be
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
commit 98a01e779f upstream.
On architectures with sizeof(int) < sizeof (long), the
computation of mask inside apply_slack() can be undefined if the
computed bit is > 32.
E.g. with: expires = 0xffffe6f5 and slack = 25, we get:
expires_limit = 0x20000000e
bit = 33
mask = (1 << 33) - 1 /* undefined */
On x86, mask becomes 1 and and the slack is not applied properly.
On s390, mask is -1, expires is set to 0 and the timer fires immediately.
Use 1UL << bit to solve that issue.
Change-Id: I1a3857d6c10d12ef497ef3c6a07a2789350a5bfc
Suggested-by: Deborah Townsend <dstownse@us.ibm.com>
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Link: http://lkml.kernel.org/r/20140418152310.GA13654@midget.suse.cz
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0cda345d1b ]
commit b9f47a3aae (tcp_cubic: limit delayed_ack ratio to prevent
divide error) try to prevent divide error, but there is still a little
chance that delayed_ack can reach zero. In case the param cnt get
negative value, then ratio+cnt would overflow and may happen to be zero.
As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero.
In some old kernels, such as 2.6.32, there is a bug that would
pass negative param, which then ultimately leads to this divide error.
commit 5b35e1e6e9 (tcp: fix tcp_trim_head() to adjust segment count
with skb MSS) fixed the negative param issue. However,
it's safe that we fix the range of delayed_ack as well,
to make sure we do not hit a divide by zero.
Change-Id: I3897a284a654ea5c0519a81e190ccc130f29d1c3
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Liu Yu <allanyuliu@tencent.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Waking up sync thread recursively for same CPU causes deadlock.
Fix this by avoiding recursive wakeup call to same thread.
CRs-Fixed: 823472
Change-Id: Idc51f53da9fafc17d609132c8d72d85f76c8160f
Signed-off-by: Swetha Chikkaboraiah <schikk@codeaurora.org>
Function dbs_check_cpu calls __cpufreq_driver_getavg
With policy as IN/OUT parameter.
There is possiblity of policy becoming NULL in
__cpufreq_driver_getavg. Same policy is getting
accessed in the for loop which leads to NULL pointer dereference.
So NULL check is required for the policy
CRs-fixed: 582925
Change-Id: Ib8d8de8d66500430a5aa4c275937640e4f934aa8
Signed-off-by: Swetha Chikkaboraiah <schikk@codeaurora.org>
Commit 1ce28d6b (ondemand: updated tune for hardware coordination)
brings in an attempt to synchronize expiry of per-cpu ondemand timer
at same time across all online cpus. This is meant to help systems
where "multiple cores share P-state via Hardware Coordination".
Synchronization is achieved by having timers expire when jiffy value
would be a perfect multiple of sampling window size. While such
synchronozation attemps are unnecessary on systems where cores don't
share P-states, it can hurt on power as cpus can now work off shorter
sampling windows (when percentage utilization over short window can be
observed to be high, causing frequency to be boosted).
Remove this unnecessary attempt to synchronize expiry of ondemand
timer across online cpus. A more acceptable solution could be to have
low-level architecture specific cpufreq driver indicate the need for
synchronization.
Change-Id: I6cb1cf3472ca106f029dc01248e0df7cfbb495b8
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
This change adds sysfs entry for the ondemand governor's attribute
down_differential_multi_core. This is required to be able to program
this knob to attain better perf/power for use cases like camcorder.
CRs-fixed: 561456
Change-Id: Ib7c1223215c6b714c286c231ed37f3abfae741c6
Signed-off-by: Krishna Vanka <kvanka@codeaurora.org>
The dbs_timer_init() call in store_powersave_bias() re-initializes the
dbs_info workqueue, call on dbs_timer_exit() to ensure
outstanding work is cleared prior to making this call. Also, grab the
percpu timer_mutex lock to avoid race conditions with respect to the
dbs timer.
Change-Id: I79f3d43eeb51d2d8e21edd0fe043d6583333951f
Signed-off-by: Osvaldo Banuelos <osvaldob@codeaurora.org>
store_powersave_bias() acquires the hotplug lock and the dbs_mutex
lock, but does so in the wrong order. Deadlocks like the following
can result.
Thread A:
get_online_cpus+0x3c/0x5c <- acquires 'cpu_hotplug.lock'
store_powersave_bias+0x80/0x3f4 <- acquires 'dbs_mutex'
kobj_attr_store+0x14/0x20
sysfs_write_file+0x108/0x13c
vfs_write+0xb0/0x128
sys_write+0x38/0x64
Thread B:
cpufreq_governor_dbs+0x7c/0x55c <- acquires 'dbs_mutex'
__cpufreq_governor+0x90/0xe0
__cpufreq_set_policy+0x1b0/0x258
cpufreq_add_dev_interface+0x2cc/0x334
cpufreq_add_dev+0x514/0x580
cpufreq_cpu_callback+0x88/0x9c
notifier_call_chain+0x38/0x68
__cpu_notify+0x28/0x40
_cpu_up+0xe4/0x118 <- acquires 'cpu_hotplug.lock'
cpu_up+0x64/0x80
store_online+0x48/0x78
dev_attr_store+0x18/0x24
sysfs_write_file+0x108/0x13c
vfs_write+0xb0/0x128
sys_write+0x38/0x64
Fix this by flipping the order in which the locks are acquired and
released in store_powersave_bias so that it is the same as in the
hotplug path.
Change-Id: Idc59fb29d60b8f7fceb8ed0f2bb9eff4670abda7
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
update_sampling_rate has a for loop which goes through each
online cpu and possibly queue up the ondemand work for them.
But while doing this it doesnt take any hotplug lock which
could potentially cause a race condition where ondemand work
is queued after the hotplug code (which sets the policy to NULL)
in the governor has cancelled any pending work. This could cause
a crash while trying to access the NULL policy in dbs_check_cpu.
Protecting the for_each_online_cpu loop with get_online_cpus()
and put_online_cpus().
Change-Id: Ia3f43ca7e4bed542834ab03ca1191d728f13311c
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
Currently Ondemand governor handles any input event
like touch by scaling the CPU frequency to maximum
available on the target.
This change adds a new sysfs interface "input_boost"
whereby the CPU will scale to this frequency on input events.
The value of this sysfs is user defined so input events
can now be handled by scaling the CPU to lower frequencies
than target max.
Change-Id: I5428fd8797c9984b17a66b01a44557f2160e8b68
Signed-off-by: Dilip Gudlur <dgudlur@codeaurora.org>
Even though this work item runs on only one cpu at a time (due to
queue_work_on()) it is possible for the work item to be preempted
and so use of smp_processor_id() is illegal.
BUG: using smp_processor_id() in preemptible [00000000]
code: kworker/3:1/4162 caller is dbs_refresh_callback+0xc/0x188
[<c00151b0>] (unwind_backtrace+0x0/0x120) from [<c0279058>]
(debug_smp_processor_id+0xbc/0xf0)
[<c0279058>] (debug_smp_processor_id+0xbc/0xf0) from [<c0454b54>]
(dbs_refresh_callback+0xc/0x188)
[<c0454b54>] (dbs_refresh_callback+0xc/0x188) from [<c0087290>]
(process_one_work+0x354/0x648)
[<c0087290>] (process_one_work+0x354/0x648) from [<c0089754>]
(worker_thread+0x1a8/0x2a8)
[<c0089754>] (worker_thread+0x1a8/0x2a8) from [<c008e480>]
(kthread+0x90/0xa0)
[<c008e480>] (kthread+0x90/0xa0) from [<c000f438>]
(kernel_thread_exit+0x0/0x8)
The intent of the code is to determine which CPU this work item
is running on, which we can easily do by passing that information
in a wrapper struct around the work struct. Do this so we avoid
this problem.
Change-Id: I05ca0ff2b3cbaa239930463ea0760e3e9d75145f
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Originally CPU frequency were boosted for any input event.
In some case, sensor sends a lot of input event, which
keep CPU in high frequency. CPU frequency only need to
be boosted when real user interaction happens.
Change-Id: Ia3ad755b98d8363a17729926610b5dd6f0075288
CR-Fixed: 507519
Signed-off-by: Tingwei Zhang <tingwei@codeaurora.org>
Turn off frequency synchronization of CPUs on thread migrations
when powersave bias is enabled. This is done to prevent re-arming
of the dbs timer work (which is cancelled by store_powersave_bias)
by the sync_thread.
Change-Id: I165dd591845c1d66d01a14e8dfc44c767c677b0d
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
store_powersave_bias takes timer_mutex before calling dbs_timer_exit
which tries to cancel the delayed work do_dbs_timer which in turn
tries to take the same lock. This can cause a lock recursion under
the race condition where can cancel_delayed_work_sync is called
when the work has already started executing.
This can be avoided by taking that lock after calling
dbs_timer_exit.
Change-Id: I7f862286e66f1ddc1e13e4eeee369dd188fc10d5
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
Enable compiler optimizations specific to the Cortex-A15
processor when targeting MSM Krait CPUs. This is necessary
take advantage of the UDIV/SDIV instructions supported by
these processors. To accomplish this, we need to remove the
-march=armv7-a ISA restriction from the compiler options
because 'cortex-a15' is a superset of 'armv7-a'.
Change-Id: I6215aecc11fb4f77c971de7b84f68649ef234357
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
The warning was introduced in 2009 (implement CALLER_ADDRESSx).
The only "problem" here is that
CALLER_ADDRESSx for x > 1 returns NULL which doesn't do much harm.
The drawback of implementing a fix (i.e. use unwind tables to implement CALLER_ADDRESSx)
is that much of the unwinder code would need to be marked as not traceable.
Change-Id: I6b9bf44ef272006b15f8b1c1435ecd2a07c6258a
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Zdrowy Gosciu <ZdrowyGosciu+GITHUB@gmail.com>
Initialize the rr_control_msg union to prevent the local users to obtain
sensitive information from kernel memory in msm_rpcrouter_close function.
CRs-Fixed: 515623
Change-Id: Ife87010eb81e8840c9b1bf5d8aeb941c90020eac
Signed-off-by: FNU Ramendra <rramendr@codeaurora.org>
commit 99ab7b1944 upstream.
After commit f5bf18fa22 ("bootmem/sparsemem: remove limit constraint
in alloc_bootmem_section"), usemap allocations may easily be placed
outside the optimal section that holds the node descriptor, even if
there is space available in that section. This results in unnecessary
hotplug dependencies that need to have the node unplugged before the
section holding the usemap.
The reason is that the bootmem allocator doesn't guarantee a linear
search starting from the passed allocation goal but may start out at a
much higher address absent an upper limit.
Fix this by trying the allocation with the limit at the section end,
then retry without if that fails. This keeps the fix from f5bf18fa22
of not panicking if the allocation does not fit in the section, but
still makes sure to try to stay within the section at first.
[rewritten massively by Johannes to apply to 3.4]
Change-Id: Ie40e51558b3abcaf0c0e72846928b84fe17f055a
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we don't check if the new MTU is valid or not and this allows
one to configure a smaller than minimum allowed by RFCs or even bigger
than interface own MTU, which is a problem as it may lead to packet
drops.
If you have a daemon like NetworkManager running, this may be exploited
by remote attackers by forging RA packets with an invalid MTU, possibly
leading to a DoS. (NetworkManager currently only validates for values
too small, but not for too big ones.)
The fix is just to make sure the new value is valid. That is, between
IPV6_MIN_MTU and interface's MTU.
Note that similar check is already performed at
ndisc_router_discovery(), for when kernel itself parses the RA.
Change-Id: Id2c8fd3cb68ae157dc31d663e5439ddecc109c0c
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following sequence of commands:
i=`keyctl add user a a @s`
keyctl request2 keyring foo bar @t
keyctl unlink $i @s
tries to invoke an upcall to instantiate a keyring if one doesn't already
exist by that name within the user's keyring set. However, if the upcall
fails, the code sets keyring->type_data.reject_error to -ENOKEY or some
other error code. When the key is garbage collected, the key destroy
function is called unconditionally and keyring_destroy() uses list_empty()
on keyring->type_data.link - which is in a union with reject_error.
Subsequently, the kernel tries to unlink the keyring from the keyring names
list - which oopses like this:
BUG: unable to handle kernel paging request at 00000000ffffff8a
IP: [<ffffffff8126e051>] keyring_destroy+0x3d/0x88
...
Workqueue: events key_garbage_collector
...
RIP: 0010:[<ffffffff8126e051>] keyring_destroy+0x3d/0x88
RSP: 0018:ffff88003e2f3d30 EFLAGS: 00010203
RAX: 00000000ffffff82 RBX: ffff88003bf1a900 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000003bfc6901 RDI: ffffffff81a73a40
RBP: ffff88003e2f3d38 R08: 0000000000000152 R09: 0000000000000000
R10: ffff88003e2f3c18 R11: 000000000000865b R12: ffff88003bf1a900
R13: 0000000000000000 R14: ffff88003bf1a908 R15: ffff88003e2f4000
...
CR2: 00000000ffffff8a CR3: 000000003e3ec000 CR4: 00000000000006f0
...
Call Trace:
[<ffffffff8126c756>] key_gc_unused_keys.constprop.1+0x5d/0x10f
[<ffffffff8126ca71>] key_garbage_collector+0x1fa/0x351
[<ffffffff8105ec9b>] process_one_work+0x28e/0x547
[<ffffffff8105fd17>] worker_thread+0x26e/0x361
[<ffffffff8105faa9>] ? rescuer_thread+0x2a8/0x2a8
[<ffffffff810648ad>] kthread+0xf3/0xfb
[<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2
[<ffffffff815f2ccf>] ret_from_fork+0x3f/0x70
[<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2
Note the value in RAX. This is a 32-bit representation of -ENOKEY.
The solution is to only call ->destroy() if the key was successfully
instantiated.
Change-Id: Ia52370813b7e8231fdd99d2a208340af1c7b4007
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
There appears to be a race between:
(1) key_gc_unused_keys() which frees key->security and then calls
keyring_destroy() to unlink the name from the name list
(2) find_keyring_by_name() which calls key_permission(), thus accessing
key->security, on a key before checking to see whether the key usage is 0
(ie. the key is dead and might be cleaned up).
Fix this by calling ->destroy() before cleaning up the core key data -
including key->security.
Change-Id: I4b9b89af020e6348af095e9014bf23b5eb1a9ef9
Reported-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Add support for invalidating a key - which renders it immediately invisible to
further searches and causes the garbage collector to immediately wake up,
remove it from keyrings and then destroy it when it's no longer referenced.
It's better not to do this with keyctl_revoke() as that marks the key to start
returning -EKEYREVOKED to searches when what is actually desired is to have the
key refetched.
To invalidate a key the caller must be granted SEARCH permission by the key.
This may be too strict. It may be better to also permit invalidation if the
caller has any of READ, WRITE or SETATTR permission.
The primary use for this is to evict keys that are cached in special keyrings,
such as the DNS resolver or an ID mapper.
Change-Id: I923ea0f0b8f9d6b3ff8ec8beca77b1774984f1c3
Signed-off-by: David Howells <dhowells@redhat.com>
Make use of the previous patch that makes the garbage collector perform RCU
synchronisation before destroying defunct keys. Key pointers can now be
replaced in-place without creating a new keyring payload and replacing the
whole thing as the discarded keys will not be destroyed until all currently
held RCU read locks are released.
If the keyring payload space needs to be expanded or contracted, then a
replacement will still need allocating, and the original will still have to be
freed by RCU.
Change-Id: I6c4f784f120951fb51ac9c23856ea37f51770bb9
Signed-off-by: David Howells <dhowells@redhat.com>
Make the keys garbage collector invoke synchronize_rcu() prior to destroying
keys with a zero usage count. This means that a key can be examined under the
RCU read lock in the safe knowledge that it won't get deallocated until after
the lock is released - even if its usage count becomes zero whilst we're
looking at it.
This is useful in keyring search vs key link. Consider a keyring containing a
link to a key. That link can be replaced in-place in the keyring without
requiring an RCU copy-and-replace on the keyring contents without breaking a
search underway on that keyring when the displaced key is released, provided
the key is actually destroyed only after the RCU read lock held by the search
algorithm is released.
This permits __key_link() to replace a key without having to reallocate the key
payload. A key gets replaced if a new key being linked into a keyring has the
same type and description.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Change-Id: Ifd8549b5b906c638d63c358ce1f34acd81139207
If a user key gets negatively instantiated, an error code is cached in the
payload area. A negatively instantiated key may be then be positively
instantiated by updating it with valid data. However, the ->update key
type method must be aware that the error code may be there.
The following may be used to trigger the bug in the user key type:
keyctl request2 user user "" @u
keyctl add user user "a" @u
which manifests itself as:
BUG: unable to handle kernel paging request at 00000000ffffff8a
IP: [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046
PGD 7cc30067 PUD 0
Oops: 0002 [#1] SMP
Modules linked in:
CPU: 3 PID: 2644 Comm: a.out Not tainted 4.3.0+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88003ddea700 ti: ffff88003dd88000 task.ti: ffff88003dd88000
RIP: 0010:[<ffffffff810a376f>] [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280
[<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046
RSP: 0018:ffff88003dd8bdb0 EFLAGS: 00010246
RAX: 00000000ffffff82 RBX: 0000000000000000 RCX: 0000000000000001
RDX: ffffffff81e3fe40 RSI: 0000000000000000 RDI: 00000000ffffff82
RBP: ffff88003dd8bde0 R08: ffff88007d2d2da0 R09: 0000000000000000
R10: 0000000000000000 R11: ffff88003e8073c0 R12: 00000000ffffff82
R13: ffff88003dd8be68 R14: ffff88007d027600 R15: ffff88003ddea700
FS: 0000000000b92880(0063) GS:ffff88007fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000ffffff8a CR3: 000000007cc5f000 CR4: 00000000000006e0
Stack:
ffff88003dd8bdf0 ffffffff81160a8a 0000000000000000 00000000ffffff82
ffff88003dd8be68 ffff88007d027600 ffff88003dd8bdf0 ffffffff810a39e5
ffff88003dd8be20 ffffffff812a31ab ffff88007d027600 ffff88007d027620
Call Trace:
[<ffffffff810a39e5>] kfree_call_rcu+0x15/0x20 kernel/rcu/tree.c:3136
[<ffffffff812a31ab>] user_update+0x8b/0xb0 security/keys/user_defined.c:129
[< inline >] __key_update security/keys/key.c:730
[<ffffffff8129e5c1>] key_create_or_update+0x291/0x440 security/keys/key.c:908
[< inline >] SYSC_add_key security/keys/keyctl.c:125
[<ffffffff8129fc21>] SyS_add_key+0x101/0x1e0 security/keys/keyctl.c:60
[<ffffffff8185f617>] entry_SYSCALL_64_fastpath+0x12/0x6a arch/x86/entry/entry_64.S:185
Note the error code (-ENOKEY) in EDX.
A similar bug can be tripped by:
keyctl request2 trusted user "" @u
keyctl add trusted user "a" @u
This should also affect encrypted keys - but that has to be correctly
parameterised or it will fail with EINVAL before getting to the bit that
will crashes.
Change-Id: I171d566f431c56208e1fe279f466d2d399a9ac7c
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
郭永刚 reported that one could simply crash the kernel as root by
using a simple program:
int socket_fd;
struct sockaddr_in addr;
addr.sin_port = 0;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_family = 10;
socket_fd = socket(10,3,0x40000000);
connect(socket_fd , &addr,16);
AF_INET, AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field thus is sized accordingly,
thus larger protocol identifiers simply cut off the higher bits and
store a zero in the protocol fields.
This could lead to e.g. NULL function pointer because as a result of
the cut off inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.
kernel: Call Trace:
kernel: [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70
kernel: [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80
kernel: [<ffffffff81645069>] SYSC_connect+0xd9/0x110
kernel: [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80
kernel: [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200
kernel: [<ffffffff81645e0e>] SyS_connect+0xe/0x10
kernel: [<ffffffff81779515>] tracesys_phase2+0x84/0x89
I found no particular commit which introduced this problem.
Change-Id: I653fad90da54908144cc8916c2dccb1fa6f14eed
CVE: CVE-2015-8543
Cc: Cong Wang <cwang@twopensource.com>
Reported-by: 郭永刚 <guoyonggang@360.cn>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Its possible to use RAW sockets to get a crash in
tcp_set_keepalive() / sk_reset_timer()
Fix is to make sure socket is a SOCK_STREAM one.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ieeb498a3e623cfcb54e1c865a3c0229e4acf1e87
ARM has some private syscalls (for example, set_tls(2)) which lie
outside the range of NR_syscalls. If any of these are called while
syscall tracing is being performed, out-of-bounds array access will
occur in the ftrace and perf sys_{enter,exit} handlers.
# trace-cmd record -e raw_syscalls:* true && trace-cmd report
...
true-653 [000] 384.675777: sys_enter: NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
true-653 [000] 384.675812: sys_exit: NR 192 = 1995915264
true-653 [000] 384.675971: sys_enter: NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
true-653 [000] 384.675988: sys_exit: NR 983045 = 0
...
# trace-cmd record -e syscalls:* true
[ 17.289329] Unable to handle kernel paging request at virtual address aaaaaace
[ 17.289590] pgd = 9e71c000
[ 17.289696] [aaaaaace] *pgd=00000000
[ 17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 17.290169] Modules linked in:
[ 17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
[ 17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
[ 17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
[ 17.290866] LR is at syscall_trace_enter+0x124/0x184
Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.
Commit cd0980fc8a "tracing: Check invalid syscall nr while tracing syscalls"
added the check for less than zero, but it should have also checked
for greater than NR_syscalls.
Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in
Fixes: cd0980fc8a "tracing: Check invalid syscall nr while tracing syscalls"
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Change-Id: I512142f8f1e1b2a8dc063209666dbce9737377e7
syscall_get_nr can return -1 in the case that the task is not executing
a system call.
This patch fixes perf_syscall_{enter,exit} to check that the syscall
number is valid before using it as an index into a bitmap.
Link: http://lkml.kernel.org/r/1345137254-7377-1-git-send-email-will.deacon@arm.com
Change-Id: Iedc719957e184c6572b3ad94e241ae2a97a0b533
Cc: Jason Baron <jbaron@redhat.com>
Cc: Wade Farnsworth <wade_farnsworth@mentor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
commit a6138db815 upstream.
Kenton Varda <kenton@sandstorm.io> discovered that by remounting a
read-only bind mount read-only in a user namespace the
MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user
to the remount a read-only mount read-write.
Correct this by replacing the mask of mount flags to preserve
with a mask of mount flags that may be changed, and preserve
all others. This ensures that any future bugs with this mask and
remount will fail in an easy to detect way where new mount flags
simply won't change.
Change-Id: I42178b32592b2ccc688d096b420304e93abeaba0
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
commit 0d0826019e upstream.
Andy Lutomirski recently demonstrated that when chroot is used to set
the root path below the path for the new ``root'' passed to pivot_root
the pivot_root system call succeeds and leaks mounts.
In examining the code I see that starting with a new root that is
below the current root in the mount tree will result in a loop in the
mount tree after the mounts are detached and then reattached to one
another. Resulting in all kinds of ugliness including a leak of that
mounts involved in the leak of the mount loop.
Prevent this problem by ensuring that the new mount is reachable from
the current root of the mount tree.
[Added stable cc. Fixes CVE-2014-7970. --Andy]
Change-Id: I77908c81d43a2e5542f8ae27ca898dd26003b0e4
Reported-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/87bnpmihks.fsf@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
Some occurences in the netfilter tree use skb_header_pointer() in
the following way ...
struct dccp_hdr _dh, *dh;
...
skb_header_pointer(skb, dataoff, sizeof(_dh), &dh);
... where dh itself is a pointer that is being passed as the copy
buffer. Instead, we need to use &_dh as the forth argument so that
we're copying the data into an actual buffer that sits on the stack.
Currently, we probably could overwrite memory on the stack (e.g.
with a possibly mal-formed DCCP packet), but unintentionally, as
we only want the buffer to be placed into _dh variable.
Change-Id: Ief6a82b5a58e1dd88d43313eb8356c52cf89b214
Fixes: 2bc780499a ("[NETFILTER]: nf_conntrack: add DCCP protocol support")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We used to read file_handle twice. Once to get the amount of extra bytes, and
once to fetch the entire structure.
This may be problematic since we do size verifications only after the first
read, so if the number of extra bytes changes in userspace between the first
and second calls, we'll have an incoherent view of file_handle.
Instead, read the constant size once, and copy that over to the final
structure without having to re-read it again.
Change-Id: Ib05e5129629e27d5a05953098c5bc470fae40d2a
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Sasha Levin found a NULL pointer dereference that is due to a missing
page table lock, which in turn is due to the pmd entry in question being
a transparent huge-table entry.
The code - introduced in commit 1998cc0489 ("mm: make
madvise(MADV_WILLNEED) support swap file prefetch") - correctly checks
for this situation using pmd_none_or_trans_huge_or_clear_bad(), but it
turns out that that function doesn't work correctly.
pmd_none_or_trans_huge_or_clear_bad() expected that pmd_bad() would
trigger if the transparent hugepage bit was set, but it doesn't do that
if pmd_numa() is also set. Note that the NUMA bit only gets set on real
NUMA machines, so people trying to reproduce this on most normal
development systems would never actually trigger this.
Fix it by removing the very subtle (and subtly incorrect) expectation,
and instead just checking pmd_trans_huge() explicitly.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
[ Additionally remove the now stale test for pmd_trans_huge() inside the
pmd_bad() case - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I3f3763f236ef102de735297cd175cf514d40d28f
A local route may have a lower hop_limit set than global routes do.
RFC 3756, Section 4.2.7, "Parameter Spoofing"
> 1. The attacker includes a Current Hop Limit of one or another small
> number which the attacker knows will cause legitimate packets to
> be dropped before they reach their destination.
> As an example, one possible approach to mitigate this threat is to
> ignore very small hop limits. The nodes could implement a
> configurable minimum hop limit, and ignore attempts to set it below
> said limit.
Change-Id: I51ee1778e3d2d5fa1aefbdf1ad8869e4e8dc28b2
Signed-off-by: D.S. Ljungmark <ljungmark@modio.se>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
"len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
case it can contain all extensions. Bellow you can find sizes for all
types of extensions. Their sum is definitely bigger than 256.
nf_ct_ext_types[0]->len = 24
nf_ct_ext_types[1]->len = 32
nf_ct_ext_types[2]->len = 24
nf_ct_ext_types[3]->len = 32
nf_ct_ext_types[4]->len = 152
nf_ct_ext_types[5]->len = 2
nf_ct_ext_types[6]->len = 16
nf_ct_ext_types[7]->len = 8
I have seen "len" up to 280 and my host has crashes w/o this patch.
The right way to fix this problem is reducing the size of the ecache
extension (4) and Florian is going to do this, but these changes will
be quite large to be appropriate for a stable tree.
Change-Id: If9efaf2b103cf304bbfa583e354cfad3faa77ac2
Fixes: 5b423f6a40 (netfilter: nf_conntrack: fix racy timer handling with reliable)
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The timeout entries are sizeof(int) rather than sizeof(long), which
means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.
Change-Id: I328d1186720a6f70f555eeeb62c83ee69814868d
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.
This patch was mostly written by Linus Torvalds.
Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Iecebf23d07e299689e4ba4fd74ea8821ef96e72b
We have two problems in UDP stack related to bogus checksums :
1) We return -EAGAIN to application even if receive queue is not empty.
This breaks applications using edge trigger epoll()
2) Under UDP flood, we can loop forever without yielding to other
processes, potentially hanging the host, especially on non SMP.
This patch is an attempt to make things better.
We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.
Change-Id: I9355321ac7ee564d56c342fa7738b918052bf308
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Chernenkov used KASAN to discover that eCryptfs writes past the
end of the allocated buffer during encrypted filename decoding. This
fix corrects the issue by getting rid of the unnecessary 0 write when
the current bit offset is 2.
Change-Id: I2e139f816b9ce0ad6d207c6f454d6f25061383ee
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Reported-by: Dmitry Chernenkov <dmitryc@google.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org # v2.6.29+: 51ca58d eCryptfs: Filename Encryption: Encoding and encryption functions
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Given following iptables ruleset:
-P FORWARD DROP
-A FORWARD -m sctp --dport 9 -j ACCEPT
-A FORWARD -p tcp --dport 80 -j ACCEPT
-A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT
One would assume that this allows SCTP on port 9 and TCP on port 80.
Unfortunately, if the SCTP conntrack module is not loaded, this allows
*all* SCTP communication, to pass though, i.e. -p sctp -j ACCEPT,
which we think is a security issue.
This is because on the first SCTP packet on port 9, we create a dummy
"generic l4" conntrack entry without any port information (since
conntrack doesn't know how to extract this information).
All subsequent packets that are unknown will then be in established
state since they will fallback to proto_generic and will match the
'generic' entry.
Our originally proposed version [1] completely disabled generic protocol
tracking, but Jozsef suggests to not track protocols for which a more
suitable helper is available, hence we now mitigate the issue for in
tree known ct protocol helpers only, so that at least NAT and direction
information will still be preserved for others.
[1] http://www.spinics.net/lists/netfilter-devel/msg33430.html
Joint work with Daniel Borkmann.
Change-Id: I7fff74303d98876efd3e7834555cbf95d0319359
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Not caching dst_entries which cause redirects could be exploited by hosts
on the same subnet, causing a severe DoS attack. This effect aggravated
since commit f886497212 ("ipv4: fix dst race in sk_dst_get()").
Lookups causing redirects will be allocated with DST_NOCACHE set which
will force dst_release to free them via RCU. Unfortunately waiting for
RCU grace period just takes too long, we can end up with >1M dst_entries
waiting to be released and the system will run OOM. rcuos threads cannot
catch up under high softirq load.
Attaching the flag to emit a redirect later on to the specific skb allows
us to cache those dst_entries thus reducing the pressure on allocation
and deallocation.
This issue was discovered by Marcelo Leitner.
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I53e4b500a4db2f5fece937a42a3bd810b2640c44
When a key is being garbage collected, it's key->user would get put before
the ->destroy() callback is called, where the key is removed from it's
respective tracking structures.
This leaves a key hanging in a semi-invalid state which leaves a window open
for a different task to try an access key->user. An example is
find_keyring_by_name() which would dereference key->user for a key that is
in the process of being garbage collected (where key->user was freed but
->destroy() wasn't called yet - so it's still present in the linked list).
This would cause either a panic, or corrupt memory.
Change-Id: Ic74246dc2dcc593f04f71063e3301e7356d588b7
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
This patch adds a "cdrom" sysfs entry for each mass_storage LUN, just
like "ro" sysfs entry. This allows switching between USB and CD-ROM
emulation without reinserting the module or recompiling the kernel.
Change-Id: Idf83c74815b1ad370428ab9d3e5503d5f7bcd3b6