Commit graph

307002 commits

Author SHA1 Message Date
Osvaldo Banuelos
d72e245daa cpufreq: ondemand: fix timer-related list corruption in store_powersave_bias()
The dbs_timer_init() call in store_powersave_bias() re-initializes the
dbs_info workqueue, call on dbs_timer_exit() to ensure
outstanding work is cleared prior to making this call. Also, grab the
percpu timer_mutex lock to avoid race conditions with respect to the
dbs timer.

Change-Id: I79f3d43eeb51d2d8e21edd0fe043d6583333951f
Signed-off-by: Osvaldo Banuelos <osvaldob@codeaurora.org>
2016-10-29 23:12:13 +08:00
Matt Wagantall
6b7881f3af cpufreq: ondemand: Fix hotplug deadlock with store_powersave_bias
store_powersave_bias() acquires the hotplug lock and the dbs_mutex
lock, but does so in the wrong order.  Deadlocks like the following
can result.

Thread A:
    get_online_cpus+0x3c/0x5c           <- acquires 'cpu_hotplug.lock'
    store_powersave_bias+0x80/0x3f4     <- acquires 'dbs_mutex'
    kobj_attr_store+0x14/0x20
    sysfs_write_file+0x108/0x13c
    vfs_write+0xb0/0x128
    sys_write+0x38/0x64

Thread B:
    cpufreq_governor_dbs+0x7c/0x55c     <- acquires 'dbs_mutex'
    __cpufreq_governor+0x90/0xe0
    __cpufreq_set_policy+0x1b0/0x258
    cpufreq_add_dev_interface+0x2cc/0x334
    cpufreq_add_dev+0x514/0x580
    cpufreq_cpu_callback+0x88/0x9c
    notifier_call_chain+0x38/0x68
    __cpu_notify+0x28/0x40
    _cpu_up+0xe4/0x118                  <- acquires 'cpu_hotplug.lock'
    cpu_up+0x64/0x80
    store_online+0x48/0x78
    dev_attr_store+0x18/0x24
    sysfs_write_file+0x108/0x13c
    vfs_write+0xb0/0x128
    sys_write+0x38/0x64

Fix this by flipping the order in which the locks are acquired and
released in store_powersave_bias so that it is the same as in the
hotplug path.

Change-Id: Idc59fb29d60b8f7fceb8ed0f2bb9eff4670abda7
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
2016-10-29 23:12:13 +08:00
Rohit Gupta
bb5af87584 cpufreq: ondemand: Fix update_sampling_rate race with hotplug
update_sampling_rate has a for loop which goes through each
online cpu and possibly queue up the ondemand work for them.
But while doing this it doesnt take any hotplug lock which
could potentially cause a race condition where ondemand work
is queued after the hotplug code (which sets the policy to NULL)
in the governor has cancelled any pending work. This could cause
a crash while trying to access the NULL policy in dbs_check_cpu.

Protecting the for_each_online_cpu loop with get_online_cpus()
and put_online_cpus().

Change-Id: Ia3f43ca7e4bed542834ab03ca1191d728f13311c
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
2016-10-29 23:12:13 +08:00
Dilip Gudlur
db31d58766 cpufreq: ondemand: add input_boost interface
Currently Ondemand governor handles any input event
like touch by scaling the CPU frequency to maximum
available on the target.

This change adds a new sysfs interface "input_boost"
whereby the CPU will scale to this frequency on input events.
The value of this sysfs is user defined so input events
can now be handled by scaling the CPU to lower frequencies
than target max.

Change-Id: I5428fd8797c9984b17a66b01a44557f2160e8b68
Signed-off-by: Dilip Gudlur <dgudlur@codeaurora.org>
2016-10-29 23:12:13 +08:00
Stephen Boyd
b5608a5fc5 cpufreq: Avoid using smp_processor_id() in preemptible context
Even though this work item runs on only one cpu at a time (due to
queue_work_on()) it is possible for the work item to be preempted
and so use of smp_processor_id() is illegal.

BUG: using smp_processor_id() in preemptible [00000000]
code: kworker/3:1/4162 caller is dbs_refresh_callback+0xc/0x188
[<c00151b0>] (unwind_backtrace+0x0/0x120) from [<c0279058>]
(debug_smp_processor_id+0xbc/0xf0)
[<c0279058>] (debug_smp_processor_id+0xbc/0xf0) from [<c0454b54>]
(dbs_refresh_callback+0xc/0x188)
[<c0454b54>] (dbs_refresh_callback+0xc/0x188) from [<c0087290>]
(process_one_work+0x354/0x648)
[<c0087290>] (process_one_work+0x354/0x648) from [<c0089754>]
(worker_thread+0x1a8/0x2a8)
[<c0089754>] (worker_thread+0x1a8/0x2a8) from [<c008e480>]
(kthread+0x90/0xa0)
[<c008e480>] (kthread+0x90/0xa0) from [<c000f438>]
(kernel_thread_exit+0x0/0x8)

The intent of the code is to determine which CPU this work item
is running on, which we can easily do by passing that information
in a wrapper struct around the work struct. Do this so we avoid
this problem.

Change-Id: I05ca0ff2b3cbaa239930463ea0760e3e9d75145f
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-10-29 23:12:12 +08:00
Tingwei Zhang
d75c2f5576 cpufreq: ondemand: Boost CPU frequency only for touch input
Originally CPU frequency were boosted for any input event.
In some case, sensor sends a lot of input event, which
keep CPU in high frequency.  CPU frequency only need to
be boosted when real user interaction happens.

Change-Id: Ia3ad755b98d8363a17729926610b5dd6f0075288
CR-Fixed: 507519
Signed-off-by: Tingwei Zhang <tingwei@codeaurora.org>
2016-10-29 23:12:12 +08:00
Rohit Gupta
c696bf9d0c cpufreq: ondemand: Disable freq sync feature in store_powersave_bias
Turn off frequency synchronization of CPUs on thread migrations
when powersave bias is enabled. This is done to prevent re-arming
of the dbs timer work (which is cancelled by store_powersave_bias)
by the sync_thread.

Change-Id: I165dd591845c1d66d01a14e8dfc44c767c677b0d
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
2016-10-29 23:12:12 +08:00
Rohit Gupta
ca619d2432 cpufreq: ondemand: Fix locking issue in store_powersave_bias
store_powersave_bias takes timer_mutex before calling dbs_timer_exit
which tries to cancel the delayed work do_dbs_timer which in turn
tries to take the same lock. This can cause a lock recursion under
the race condition where can cancel_delayed_work_sync is called
when the work has already started executing.

This can be avoided by taking that lock after calling
dbs_timer_exit.

Change-Id: I7f862286e66f1ddc1e13e4eeee369dd188fc10d5
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
2016-10-29 23:12:12 +08:00
Stepan Moskovchenko
ba81fad0d7 ARM: Use -mcpu=cortex-a15 when targeting MSM Krait CPUs
Enable compiler optimizations specific to the Cortex-A15
processor when targeting MSM Krait CPUs. This is necessary
take advantage of the UDIV/SDIV instructions supported by
these processors. To accomplish this, we need to remove the
-march=armv7-a ISA restriction from the compiler options
because 'cortex-a15' is a superset of 'armv7-a'.

Change-Id: I6215aecc11fb4f77c971de7b84f68649ef234357
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
2016-10-29 23:12:12 +08:00
Domi Papoi
b2474145ab msm: video: Checks for code robustness
Check for NULL pointer and array out of bounds

Change-Id: I42fb2b6fb087e6e4a99b2783d2b68499e802541a
Signed-off-by: Domi Papoi <dpapoi@codeaurora.org>
2016-10-29 23:12:12 +08:00
Uwe Kleine-König
5f48c8489c arm: kernel: Drop warning about return_address not using unwind tables
The warning was introduced in 2009 (implement CALLER_ADDRESSx).
The only "problem" here is that
CALLER_ADDRESSx for x > 1 returns NULL which doesn't do much harm.

The drawback of implementing a fix (i.e. use unwind tables to implement CALLER_ADDRESSx)
is that much of the unwinder code would need to be marked as not traceable.

Change-Id: I6b9bf44ef272006b15f8b1c1435ecd2a07c6258a
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Zdrowy Gosciu <ZdrowyGosciu+GITHUB@gmail.com>
2016-10-29 23:12:12 +08:00
FNU Ramendra
79d89cedad msm: rpc: Fix uninitialized union in rpc router close function
Initialize the rr_control_msg union to prevent the local users to obtain
sensitive information from kernel memory in msm_rpcrouter_close function.

CRs-Fixed: 515623
Change-Id: Ife87010eb81e8840c9b1bf5d8aeb941c90020eac
Signed-off-by: FNU Ramendra <rramendr@codeaurora.org>
2016-10-29 23:12:12 +08:00
Yinghai Lu
c7c94a6913 mm: sparse: fix usemap allocation above node descriptor section
commit 99ab7b1944 upstream.

After commit f5bf18fa22 ("bootmem/sparsemem: remove limit constraint
in alloc_bootmem_section"), usemap allocations may easily be placed
outside the optimal section that holds the node descriptor, even if
there is space available in that section.  This results in unnecessary
hotplug dependencies that need to have the node unplugged before the
section holding the usemap.

The reason is that the bootmem allocator doesn't guarantee a linear
search starting from the passed allocation goal but may start out at a
much higher address absent an upper limit.

Fix this by trying the allocation with the limit at the section end,
then retry without if that fails.  This keeps the fix from f5bf18fa22
of not panicking if the allocation does not fit in the section, but
still makes sure to try to stay within the section at first.

[rewritten massively by Johannes to apply to 3.4]

Change-Id: Ie40e51558b3abcaf0c0e72846928b84fe17f055a
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-29 23:12:12 +08:00
Marcelo Leitner
84a5789882 ipv6: addrconf: validate new MTU before applying it
Currently we don't check if the new MTU is valid or not and this allows
one to configure a smaller than minimum allowed by RFCs or even bigger
than interface own MTU, which is a problem as it may lead to packet
drops.

If you have a daemon like NetworkManager running, this may be exploited
by remote attackers by forging RA packets with an invalid MTU, possibly
leading to a DoS. (NetworkManager currently only validates for values
too small, but not for too big ones.)

The fix is just to make sure the new value is valid. That is, between
IPV6_MIN_MTU and interface's MTU.

Note that similar check is already performed at
ndisc_router_discovery(), for when kernel itself parses the RA.

Change-Id: Id2c8fd3cb68ae157dc31d663e5439ddecc109c0c
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 23:12:12 +08:00
David Howells
eba9d59456 KEYS: Fix crash when attempt to garbage collect an uninstantiated keyring
The following sequence of commands:

    i=`keyctl add user a a @s`
    keyctl request2 keyring foo bar @t
    keyctl unlink $i @s

tries to invoke an upcall to instantiate a keyring if one doesn't already
exist by that name within the user's keyring set.  However, if the upcall
fails, the code sets keyring->type_data.reject_error to -ENOKEY or some
other error code.  When the key is garbage collected, the key destroy
function is called unconditionally and keyring_destroy() uses list_empty()
on keyring->type_data.link - which is in a union with reject_error.
Subsequently, the kernel tries to unlink the keyring from the keyring names
list - which oopses like this:

	BUG: unable to handle kernel paging request at 00000000ffffff8a
	IP: [<ffffffff8126e051>] keyring_destroy+0x3d/0x88
	...
	Workqueue: events key_garbage_collector
	...
	RIP: 0010:[<ffffffff8126e051>] keyring_destroy+0x3d/0x88
	RSP: 0018:ffff88003e2f3d30  EFLAGS: 00010203
	RAX: 00000000ffffff82 RBX: ffff88003bf1a900 RCX: 0000000000000000
	RDX: 0000000000000000 RSI: 000000003bfc6901 RDI: ffffffff81a73a40
	RBP: ffff88003e2f3d38 R08: 0000000000000152 R09: 0000000000000000
	R10: ffff88003e2f3c18 R11: 000000000000865b R12: ffff88003bf1a900
	R13: 0000000000000000 R14: ffff88003bf1a908 R15: ffff88003e2f4000
	...
	CR2: 00000000ffffff8a CR3: 000000003e3ec000 CR4: 00000000000006f0
	...
	Call Trace:
	 [<ffffffff8126c756>] key_gc_unused_keys.constprop.1+0x5d/0x10f
	 [<ffffffff8126ca71>] key_garbage_collector+0x1fa/0x351
	 [<ffffffff8105ec9b>] process_one_work+0x28e/0x547
	 [<ffffffff8105fd17>] worker_thread+0x26e/0x361
	 [<ffffffff8105faa9>] ? rescuer_thread+0x2a8/0x2a8
	 [<ffffffff810648ad>] kthread+0xf3/0xfb
	 [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2
	 [<ffffffff815f2ccf>] ret_from_fork+0x3f/0x70
	 [<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2

Note the value in RAX.  This is a 32-bit representation of -ENOKEY.

The solution is to only call ->destroy() if the key was successfully
instantiated.

Change-Id: Ia52370813b7e8231fdd99d2a208340af1c7b4007
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
2016-10-29 23:12:12 +08:00
David Howells
0e7ea8aab9 KEYS: Fix race between key destruction and finding a keyring by name
There appears to be a race between:

 (1) key_gc_unused_keys() which frees key->security and then calls
     keyring_destroy() to unlink the name from the name list

 (2) find_keyring_by_name() which calls key_permission(), thus accessing
     key->security, on a key before checking to see whether the key usage is 0
     (ie. the key is dead and might be cleaned up).

Fix this by calling ->destroy() before cleaning up the core key data -
including key->security.

Change-Id: I4b9b89af020e6348af095e9014bf23b5eb1a9ef9
Reported-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2016-10-29 23:12:12 +08:00
David Howells
62b1b19ca5 KEYS: Add invalidation support
Add support for invalidating a key - which renders it immediately invisible to
further searches and causes the garbage collector to immediately wake up,
remove it from keyrings and then destroy it when it's no longer referenced.

It's better not to do this with keyctl_revoke() as that marks the key to start
returning -EKEYREVOKED to searches when what is actually desired is to have the
key refetched.

To invalidate a key the caller must be granted SEARCH permission by the key.
This may be too strict.  It may be better to also permit invalidation if the
caller has any of READ, WRITE or SETATTR permission.

The primary use for this is to evict keys that are cached in special keyrings,
such as the DNS resolver or an ID mapper.

Change-Id: I923ea0f0b8f9d6b3ff8ec8beca77b1774984f1c3
Signed-off-by: David Howells <dhowells@redhat.com>
2016-10-29 23:12:12 +08:00
David Howells
3e99777c1e KEYS: Permit in-place link replacement in keyring list
Make use of the previous patch that makes the garbage collector perform RCU
synchronisation before destroying defunct keys.  Key pointers can now be
replaced in-place without creating a new keyring payload and replacing the
whole thing as the discarded keys will not be destroyed until all currently
held RCU read locks are released.

If the keyring payload space needs to be expanded or contracted, then a
replacement will still need allocating, and the original will still have to be
freed by RCU.

Change-Id: I6c4f784f120951fb51ac9c23856ea37f51770bb9
Signed-off-by: David Howells <dhowells@redhat.com>
2016-10-29 23:12:12 +08:00
David Howells
e9fb8c3832 KEYS: Perform RCU synchronisation on keys prior to key destruction
Make the keys garbage collector invoke synchronize_rcu() prior to destroying
keys with a zero usage count.  This means that a key can be examined under the
RCU read lock in the safe knowledge that it won't get deallocated until after
the lock is released - even if its usage count becomes zero whilst we're
looking at it.

This is useful in keyring search vs key link.  Consider a keyring containing a
link to a key.  That link can be replaced in-place in the keyring without
requiring an RCU copy-and-replace on the keyring contents without breaking a
search underway on that keyring when the displaced key is released, provided
the key is actually destroyed only after the RCU read lock held by the search
algorithm is released.

This permits __key_link() to replace a key without having to reallocate the key
payload.  A key gets replaced if a new key being linked into a keyring has the
same type and description.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>

Change-Id: Ifd8549b5b906c638d63c358ce1f34acd81139207
2016-10-29 23:12:11 +08:00
David Howells
11c4b683be KEYS: Fix handling of stored error in a negatively instantiated user key
If a user key gets negatively instantiated, an error code is cached in the
payload area.  A negatively instantiated key may be then be positively
instantiated by updating it with valid data.  However, the ->update key
type method must be aware that the error code may be there.

The following may be used to trigger the bug in the user key type:

    keyctl request2 user user "" @u
    keyctl add user user "a" @u

which manifests itself as:

	BUG: unable to handle kernel paging request at 00000000ffffff8a
	IP: [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046
	PGD 7cc30067 PUD 0
	Oops: 0002 [#1] SMP
	Modules linked in:
	CPU: 3 PID: 2644 Comm: a.out Not tainted 4.3.0+ #49
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
	task: ffff88003ddea700 ti: ffff88003dd88000 task.ti: ffff88003dd88000
	RIP: 0010:[<ffffffff810a376f>]  [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280
	 [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046
	RSP: 0018:ffff88003dd8bdb0  EFLAGS: 00010246
	RAX: 00000000ffffff82 RBX: 0000000000000000 RCX: 0000000000000001
	RDX: ffffffff81e3fe40 RSI: 0000000000000000 RDI: 00000000ffffff82
	RBP: ffff88003dd8bde0 R08: ffff88007d2d2da0 R09: 0000000000000000
	R10: 0000000000000000 R11: ffff88003e8073c0 R12: 00000000ffffff82
	R13: ffff88003dd8be68 R14: ffff88007d027600 R15: ffff88003ddea700
	FS:  0000000000b92880(0063) GS:ffff88007fd00000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
	CR2: 00000000ffffff8a CR3: 000000007cc5f000 CR4: 00000000000006e0
	Stack:
	 ffff88003dd8bdf0 ffffffff81160a8a 0000000000000000 00000000ffffff82
	 ffff88003dd8be68 ffff88007d027600 ffff88003dd8bdf0 ffffffff810a39e5
	 ffff88003dd8be20 ffffffff812a31ab ffff88007d027600 ffff88007d027620
	Call Trace:
	 [<ffffffff810a39e5>] kfree_call_rcu+0x15/0x20 kernel/rcu/tree.c:3136
	 [<ffffffff812a31ab>] user_update+0x8b/0xb0 security/keys/user_defined.c:129
	 [<     inline     >] __key_update security/keys/key.c:730
	 [<ffffffff8129e5c1>] key_create_or_update+0x291/0x440 security/keys/key.c:908
	 [<     inline     >] SYSC_add_key security/keys/keyctl.c:125
	 [<ffffffff8129fc21>] SyS_add_key+0x101/0x1e0 security/keys/keyctl.c:60
	 [<ffffffff8185f617>] entry_SYSCALL_64_fastpath+0x12/0x6a arch/x86/entry/entry_64.S:185

Note the error code (-ENOKEY) in EDX.

A similar bug can be tripped by:

    keyctl request2 trusted user "" @u
    keyctl add trusted user "a" @u

This should also affect encrypted keys - but that has to be correctly
parameterised or it will fail with EINVAL before getting to the bit that
will crashes.

Change-Id: I171d566f431c56208e1fe279f466d2d399a9ac7c
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2016-10-29 23:12:11 +08:00
Hannes Frederic Sowa
dcf5c5397a net: add validation for the socket syscall protocol argument
郭永刚 reported that one could simply crash the kernel as root by
using a simple program:

	int socket_fd;
	struct sockaddr_in addr;
	addr.sin_port = 0;
	addr.sin_addr.s_addr = INADDR_ANY;
	addr.sin_family = 10;

	socket_fd = socket(10,3,0x40000000);
	connect(socket_fd , &addr,16);

AF_INET, AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field thus is sized accordingly,
thus larger protocol identifiers simply cut off the higher bits and
store a zero in the protocol fields.

This could lead to e.g. NULL function pointer because as a result of
the cut off inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.

kernel: Call Trace:
kernel:  [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70
kernel:  [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80
kernel:  [<ffffffff81645069>] SYSC_connect+0xd9/0x110
kernel:  [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80
kernel:  [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200
kernel:  [<ffffffff81645e0e>] SyS_connect+0xe/0x10
kernel:  [<ffffffff81779515>] tracesys_phase2+0x84/0x89

I found no particular commit which introduced this problem.

Change-Id: I653fad90da54908144cc8916c2dccb1fa6f14eed
CVE: CVE-2015-8543
Cc: Cong Wang <cwang@twopensource.com>
Reported-by: 郭永刚 <guoyonggang@360.cn>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 23:12:11 +08:00
David S. Miller
29b4575124 bluetooth: Validate socket address length in sco_sock_bind().
Change-Id: I890640975f1af64f71947b6a1820249e08f6375b
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 23:12:11 +08:00
Eric Dumazet
504d01b884 net: guard tcp_set_keepalive() to tcp sockets
Its possible to use RAW sockets to get a crash in
tcp_set_keepalive() / sk_reset_timer()

Fix is to make sure socket is a SOCK_STREAM one.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ieeb498a3e623cfcb54e1c865a3c0229e4acf1e87
2016-10-29 23:12:11 +08:00
Rabin Vincent
b767a16393 tracing/syscalls: Ignore numbers outside NR_syscalls' range
ARM has some private syscalls (for example, set_tls(2)) which lie
outside the range of NR_syscalls.  If any of these are called while
syscall tracing is being performed, out-of-bounds array access will
occur in the ftrace and perf sys_{enter,exit} handlers.

 # trace-cmd record -e raw_syscalls:* true && trace-cmd report
 ...
 true-653   [000]   384.675777: sys_enter:            NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
 true-653   [000]   384.675812: sys_exit:             NR 192 = 1995915264
 true-653   [000]   384.675971: sys_enter:            NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
 true-653   [000]   384.675988: sys_exit:             NR 983045 = 0
 ...

 # trace-cmd record -e syscalls:* true
 [   17.289329] Unable to handle kernel paging request at virtual address aaaaaace
 [   17.289590] pgd = 9e71c000
 [   17.289696] [aaaaaace] *pgd=00000000
 [   17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
 [   17.290169] Modules linked in:
 [   17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
 [   17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
 [   17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
 [   17.290866] LR is at syscall_trace_enter+0x124/0x184

Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.

Commit cd0980fc8a "tracing: Check invalid syscall nr while tracing syscalls"
added the check for less than zero, but it should have also checked
for greater than NR_syscalls.

Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in

Fixes: cd0980fc8a "tracing: Check invalid syscall nr while tracing syscalls"
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Change-Id: I512142f8f1e1b2a8dc063209666dbce9737377e7
2016-10-29 23:12:11 +08:00
Will Deacon
d1f497b214 tracing/syscalls: Fix perf syscall tracing when syscall_nr == -1
syscall_get_nr can return -1 in the case that the task is not executing
a system call.

This patch fixes perf_syscall_{enter,exit} to check that the syscall
number is valid before using it as an index into a bitmap.

Link: http://lkml.kernel.org/r/1345137254-7377-1-git-send-email-will.deacon@arm.com

Change-Id: Iedc719957e184c6572b3ad94e241ae2a97a0b533
Cc: Jason Baron <jbaron@redhat.com>
Cc: Wade Farnsworth <wade_farnsworth@mentor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-10-29 23:12:11 +08:00
Eric W. Biederman
e888799d92 mnt: Only change user settable mount flags in remount
commit a6138db815 upstream.

Kenton Varda <kenton@sandstorm.io> discovered that by remounting a
read-only bind mount read-only in a user namespace the
MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user
to the remount a read-only mount read-write.

Correct this by replacing the mask of mount flags to preserve
with a mask of mount flags that may be changed, and preserve
all others.   This ensures that any future bugs with this mask and
remount will fail in an easy to detect way where new mount flags
simply won't change.

Change-Id: I42178b32592b2ccc688d096b420304e93abeaba0
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-29 23:12:11 +08:00
Eric W. Biederman
20c247e17f mnt: Prevent pivot_root from creating a loop in the mount tree
commit 0d0826019e upstream.

Andy Lutomirski recently demonstrated that when chroot is used to set
the root path below the path for the new ``root'' passed to pivot_root
the pivot_root system call succeeds and leaks mounts.

In examining the code I see that starting with a new root that is
below the current root in the mount tree will result in a loop in the
mount tree after the mounts are detached and then reattached to one
another.  Resulting in all kinds of ugliness including a leak of that
mounts involved in the leak of the mount loop.

Prevent this problem by ensuring that the new mount is reachable from
the current root of the mount tree.

[Added stable cc.  Fixes CVE-2014-7970.  --Andy]

Change-Id: I77908c81d43a2e5542f8ae27ca898dd26003b0e4
Reported-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/87bnpmihks.fsf@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>
2016-10-29 23:12:11 +08:00
Daniel Borkmann
86a0da23c8 netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages
Some occurences in the netfilter tree use skb_header_pointer() in
the following way ...

  struct dccp_hdr _dh, *dh;
  ...
  skb_header_pointer(skb, dataoff, sizeof(_dh), &dh);

... where dh itself is a pointer that is being passed as the copy
buffer. Instead, we need to use &_dh as the forth argument so that
we're copying the data into an actual buffer that sits on the stack.

Currently, we probably could overwrite memory on the stack (e.g.
with a possibly mal-formed DCCP packet), but unintentionally, as
we only want the buffer to be placed into _dh variable.

Change-Id: Ief6a82b5a58e1dd88d43313eb8356c52cf89b214
Fixes: 2bc780499a ("[NETFILTER]: nf_conntrack: add DCCP protocol support")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-29 23:12:11 +08:00
Sasha Levin
1049cdb95b vfs: read file_handle only once in handle_to_path
We used to read file_handle twice. Once to get the amount of extra bytes, and
once to fetch the entire structure.

This may be problematic since we do size verifications only after the first
read, so if the number of extra bytes changes in userspace between the first
and second calls, we'll have an incoherent view of file_handle.

Instead, read the constant size once, and copy that over to the final
structure without having to re-read it again.

Change-Id: Ib05e5129629e27d5a05953098c5bc470fae40d2a
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-10-29 23:12:11 +08:00
Kirill A. Shutemov
397b074320 mm: Fix NULL pointer dereference in madvise(MADV_WILLNEED) support
Sasha Levin found a NULL pointer dereference that is due to a missing
page table lock, which in turn is due to the pmd entry in question being
a transparent huge-table entry.

The code - introduced in commit 1998cc0489 ("mm: make
madvise(MADV_WILLNEED) support swap file prefetch") - correctly checks
for this situation using pmd_none_or_trans_huge_or_clear_bad(), but it
turns out that that function doesn't work correctly.

pmd_none_or_trans_huge_or_clear_bad() expected that pmd_bad() would
trigger if the transparent hugepage bit was set, but it doesn't do that
if pmd_numa() is also set. Note that the NUMA bit only gets set on real
NUMA machines, so people trying to reproduce this on most normal
development systems would never actually trigger this.

Fix it by removing the very subtle (and subtly incorrect) expectation,
and instead just checking pmd_trans_huge() explicitly.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
[ Additionally remove the now stale test for pmd_trans_huge() inside the
  pmd_bad() case - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Change-Id: I3f3763f236ef102de735297cd175cf514d40d28f
2016-10-29 23:12:11 +08:00
D.S. Ljungmark
ba0d2ae061 ipv6: Don't reduce hop limit for an interface
A local route may have a lower hop_limit set than global routes do.

RFC 3756, Section 4.2.7, "Parameter Spoofing"

>   1.  The attacker includes a Current Hop Limit of one or another small
>       number which the attacker knows will cause legitimate packets to
>       be dropped before they reach their destination.

>   As an example, one possible approach to mitigate this threat is to
>   ignore very small hop limits.  The nodes could implement a
>   configurable minimum hop limit, and ignore attempts to set it below
>   said limit.

Change-Id: I51ee1778e3d2d5fa1aefbdf1ad8869e4e8dc28b2
Signed-off-by: D.S. Ljungmark <ljungmark@modio.se>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 23:12:10 +08:00
Andrey Vagin
be41f71342 netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len
"len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
case it can contain all extensions. Bellow you can find sizes for all
types of extensions. Their sum is definitely bigger than 256.

nf_ct_ext_types[0]->len = 24
nf_ct_ext_types[1]->len = 32
nf_ct_ext_types[2]->len = 24
nf_ct_ext_types[3]->len = 32
nf_ct_ext_types[4]->len = 152
nf_ct_ext_types[5]->len = 2
nf_ct_ext_types[6]->len = 16
nf_ct_ext_types[7]->len = 8

I have seen "len" up to 280 and my host has crashes w/o this patch.

The right way to fix this problem is reducing the size of the ecache
extension (4) and Florian is going to do this, but these changes will
be quite large to be appropriate for a stable tree.

Change-Id: If9efaf2b103cf304bbfa583e354cfad3faa77ac2
Fixes: 5b423f6a40 (netfilter: nf_conntrack: fix racy timer handling with reliable)
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-29 23:12:10 +08:00
Sasha Levin
1fea326d57 net: llc: use correct size for sysctl timeout entries
The timeout entries are sizeof(int) rather than sizeof(long), which
means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.

Change-Id: I328d1186720a6f70f555eeeb62c83ee69814868d
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 23:12:10 +08:00
Jann Horn
871eac221e fs: take i_mutex during prepare_binprm for set[ug]id executables
This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.

This patch was mostly written by Linus Torvalds.

Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Change-Id: Iecebf23d07e299689e4ba4fd74ea8821ef96e72b
2016-10-29 23:12:10 +08:00
Eric Dumazet
cc354dd458 udp: fix behavior of wrong checksums
We have two problems in UDP stack related to bogus checksums :

1) We return -EAGAIN to application even if receive queue is not empty.
   This breaks applications using edge trigger epoll()

2) Under UDP flood, we can loop forever without yielding to other
   processes, potentially hanging the host, especially on non SMP.

This patch is an attempt to make things better.

We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.

Change-Id: I9355321ac7ee564d56c342fa7738b918052bf308
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 23:12:10 +08:00
Michael Halcrow
83ee380d4b eCryptfs: Remove buggy and unnecessary write in file name decode routine
Dmitry Chernenkov used KASAN to discover that eCryptfs writes past the
end of the allocated buffer during encrypted filename decoding. This
fix corrects the issue by getting rid of the unnecessary 0 write when
the current bit offset is 2.

Change-Id: I2e139f816b9ce0ad6d207c6f454d6f25061383ee
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Reported-by: Dmitry Chernenkov <dmitryc@google.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org # v2.6.29+: 51ca58d eCryptfs: Filename Encryption: Encoding and encryption functions
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2016-10-29 23:12:10 +08:00
Florian Westphal
91c6941897 netfilter: conntrack: disable generic tracking for known protocols
Given following iptables ruleset:

-P FORWARD DROP
-A FORWARD -m sctp --dport 9 -j ACCEPT
-A FORWARD -p tcp --dport 80 -j ACCEPT
-A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT

One would assume that this allows SCTP on port 9 and TCP on port 80.
Unfortunately, if the SCTP conntrack module is not loaded, this allows
*all* SCTP communication, to pass though, i.e. -p sctp -j ACCEPT,
which we think is a security issue.

This is because on the first SCTP packet on port 9, we create a dummy
"generic l4" conntrack entry without any port information (since
conntrack doesn't know how to extract this information).

All subsequent packets that are unknown will then be in established
state since they will fallback to proto_generic and will match the
'generic' entry.

Our originally proposed version [1] completely disabled generic protocol
tracking, but Jozsef suggests to not track protocols for which a more
suitable helper is available, hence we now mitigate the issue for in
tree known ct protocol helpers only, so that at least NAT and direction
information will still be preserved for others.

 [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html

Joint work with Daniel Borkmann.

Change-Id: I7fff74303d98876efd3e7834555cbf95d0319359
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-29 23:12:10 +08:00
Hannes Frederic Sowa
ef06d64826 ipv4: try to cache dst_entries which would cause a redirect
Not caching dst_entries which cause redirects could be exploited by hosts
on the same subnet, causing a severe DoS attack. This effect aggravated
since commit f886497212 ("ipv4: fix dst race in sk_dst_get()").

Lookups causing redirects will be allocated with DST_NOCACHE set which
will force dst_release to free them via RCU.  Unfortunately waiting for
RCU grace period just takes too long, we can end up with >1M dst_entries
waiting to be released and the system will run OOM. rcuos threads cannot
catch up under high softirq load.

Attaching the flag to emit a redirect later on to the specific skb allows
us to cache those dst_entries thus reducing the pressure on allocation
and deallocation.

This issue was discovered by Marcelo Leitner.

Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

Change-Id: I53e4b500a4db2f5fece937a42a3bd810b2640c44
2016-10-29 23:12:10 +08:00
Sasha Levin
c030f48a9d KEYS: close race between key lookup and freeing
When a key is being garbage collected, it's key->user would get put before
the ->destroy() callback is called, where the key is removed from it's
respective tracking structures.

This leaves a key hanging in a semi-invalid state which leaves a window open
for a different task to try an access key->user. An example is
find_keyring_by_name() which would dereference key->user for a key that is
in the process of being garbage collected (where key->user was freed but
->destroy() wasn't called yet - so it's still present in the linked list).

This would cause either a panic, or corrupt memory.

Change-Id: Ic74246dc2dcc593f04f71063e3301e7356d588b7
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2016-10-29 23:12:10 +08:00
FrozenCow
ea3043f8c4 usb: gadget: mass_storage: added sysfs entry for cdrom to LUNs
This patch adds a "cdrom" sysfs entry for each mass_storage LUN, just
like "ro" sysfs entry. This allows switching between USB and CD-ROM
emulation without reinserting the module or recompiling the kernel.

Change-Id: Idf83c74815b1ad370428ab9d3e5503d5f7bcd3b6
2016-10-29 23:12:10 +08:00
Zhao Wei Liew
c0f0ba5bad flo: Enable cpufreq limit driver
Used by PowerHAL for low power mode.

Change-Id: Id1766d78d67567832a18f47cdf8569f517247abf
Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>
2016-10-29 23:12:10 +08:00
Zhao Wei Liew
feb154ed0e cpufreq: Add cpufreq limit driver
This allows userspace to specify a min/max limit to the CPU
frequency, working around the standard scaling_[max|min]_freq
sysfs interfaces.

Initially based on Paul's cpufreq_limit driver.

Change-Id: I87dd8a0f67aadce0ca0f5cb668d7ee16c616deb0
Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>
2016-10-29 23:12:10 +08:00
Steve Kondik
cf659a312c video: msm: Don't send CABC commands with power off
* Current code is blindly sending commands to the hardware when
   it's powered down. This causes a DMA timeout and wedges the panel
   until rebooted. Add a check for the power state.

Change-Id: I33a508f22c2a1a046a50782912802784928d47f6
[zhaoweiliew: Check against the proper variable]
Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>
2016-10-29 23:12:10 +08:00
Zhao Wei Liew
6bf0344b15 msm_fb: Fix ACO bounds check
This was causing auto contrast optimisation to not be set
when the file is written to.

Change-Id: I7e88a6afbf4692b5bc01d7337455f15aa5640d72
Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>
2016-10-29 23:12:09 +08:00
Zhao Wei Liew
01e340d26c msm_fb: Print new line when getting ACO and SRE vals
Much neater when getting the value of the file.

Change-Id: Ie5cc4e83323b034b6ae330fe799a6482a0b240bd
Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>
2016-10-29 23:12:09 +08:00
Zhao Wei Liew
201c6611ea video: msm: Add more JDI display features
* Add auto contrast optimization.
 * Add 3 intensity levels for SRE.
 * Combine all the CABC commands.
 * Slight clean up.

Change-Id: I6eef9cd79024bf0e13f48faba8dcfce1765f42d1
Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>
2016-10-29 23:12:09 +08:00
Steve Kondik
644ca5d77b video: msm: Add support for CABC and SRE for JDI display
Change-Id: Id6b5f3c65a56b310a15ce5dd0484b99001c9a562
2016-10-29 23:12:09 +08:00
Zhao Wei Liew
928892ffef flo: Speed up boot and improve post-boot UX
* Boot with NOOP to speed up boot animation.
 * Enable BFQ scheduler for better UX after boot.

Change-Id: I1300e15b4435f0f51ba6eb974de1f9a7ca0e1032
Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com>
2016-10-29 23:12:09 +08:00
Mauro Andreolini
eb30968621 block, bfq: add Early Queue Merge (EQM) to BFQ-v7r8 for 3.4.0
A set of processes may happen  to  perform interleaved reads, i.e.,requests
whose union would give rise to a  sequential read  pattern.  There are two
typical  cases: in the first  case,   processes  read  fixed-size chunks of
data at a fixed distance from each other, while in the second case processes
may read variable-size chunks at  variable distances. The latter case occurs
for  example with  QEMU, which  splits the  I/O generated  by the  guest into
multiple chunks,  and lets these chunks  be served by a  pool of cooperating
processes,  iteratively  assigning  the  next  chunk of  I/O  to  the first
available  process. CFQ  uses actual  queue merging  for the  first type of
rocesses, whereas it  uses preemption to get a sequential  read pattern out
of the read requests  performed by the second type of  processes. In the end
it uses  two different  mechanisms to  achieve the  same goal: boosting the
throughput with interleaved I/O.

This patch introduces  Early Queue Merge (EQM), a unified mechanism to get a
sequential  read pattern  with both  types of  processes. The  main idea is
checking newly arrived requests against the next request of the active queue
both in case of actual request insert and in case of request merge. By doing
so, both the types of processes can be handled by just merging their queues.
EQM is  then simpler and  more compact than the  pair of mechanisms used in
CFQ.

Finally, EQM  also preserves the  typical low-latency properties of BFQ, by
properly restoring the weight-raising state of  a queue when it gets back to
a non-merged state.

Change-Id: I6e8e59d479c13669126ccaa7f8c2f9d54dab876f
Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
2016-10-29 23:12:09 +08:00
Paolo Valente
0320934b46 block: introduce the BFQ-v7r8 I/O sched for 3.4
Add the BFQ-v7r8 I/O scheduler to 3.4.
The general structure is borrowed from CFQ, as much of the code for
handling I/O contexts. Over time, several useful features have been
ported from CFQ as well (details in the changelog in README.BFQ). A
(bfq_)queue is associated to each task doing I/O on a device, and each
time a scheduling decision has to be made a queue is selected and served
until it expires.

    - Slices are given in the service domain: tasks are assigned
      budgets, measured in number of sectors. Once got the disk, a task
      must however consume its assigned budget within a configurable
      maximum time (by default, the maximum possible value of the
      budgets is automatically computed to comply with this timeout).
      This allows the desired latency vs "throughput boosting" tradeoff
      to be set.

    - Budgets are scheduled according to a variant of WF2Q+, implemented
      using an augmented rb-tree to take eligibility into account while
      preserving an O(log N) overall complexity.

    - A low-latency tunable is provided; if enabled, both interactive
      and soft real-time applications are guaranteed a very low latency.

    - Latency guarantees are preserved also in the presence of NCQ.

    - Also with flash-based devices, a high throughput is achieved
      while still preserving latency guarantees.

    - BFQ features Early Queue Merge (EQM), a sort of fusion of the
      cooperating-queue-merging and the preemption mechanisms present
      in CFQ. EQM is in fact a unified mechanism that tries to get a
      sequential read pattern, and hence a high throughput, with any
      set of processes performing interleaved I/O over a contiguous
      sequence of sectors.

    - BFQ supports full hierarchical scheduling, exporting a cgroups
      interface.  Since each node has a full scheduler, each group can
      be assigned its own weight.

    - If the cgroups interface is not used, only I/O priorities can be
      assigned to processes, with ioprio values mapped to weights
      with the relation weight = IOPRIO_BE_NR - ioprio.

    - ioprio classes are served in strict priority order, i.e., lower
      priority queues are not served as long as there are higher
      priority queues.  Among queues in the same class the bandwidth is
      distributed in proportion to the weight of each queue. A very
      thin extra bandwidth is however guaranteed to the Idle class, to
      prevent it from starving.

Change-Id: I62eb1769f7d6b4e542a10a9c7751a454d31c04de
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
2016-10-29 23:12:09 +08:00