Commit Graph

17767 Commits

Author SHA1 Message Date
Seunghun Han 94bee3919a x86/acpi: Prevent out of bound access caused by broken ACPI tables
The bus_irq argument of mp_override_legacy_irq() is used as the index into
the isa_irq_to_gsi[] array. The bus_irq argument originates from
ACPI_MADT_TYPE_IO_APIC and ACPI_MADT_TYPE_INTERRUPT items in the ACPI
tables, but is nowhere sanity checked.

That allows broken or malicious ACPI tables to overwrite memory, which
might cause malfunction, panic or arbitrary code execution.

Add a sanity check and emit a warning when that triggers.

[ tglx: Added warning and rewrote changelog ]

Signed-off-by: Seunghun Han <kkamagui@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
CVE-2017-11473

Change-Id: I06eeb72e3207d795a87279ecfdd362e361f505f7
2019-07-27 21:51:30 +02:00
Seunghun Han 209a575329 x86/MCE: Serialize sysfs changes
The check_interval file in

  /sys/devices/system/machinecheck/machinecheck<cpu number>

directory is a global timer value for MCE polling. If it is changed by one
CPU, mce_restart() broadcasts the event to other CPUs to delete and restart
the MCE polling timer and __mcheck_cpu_init_timer() reinitializes the
mce_timer variable.

If more than one CPU writes a specific value to the check_interval file
concurrently, mce_timer is not protected from such concurrent accesses and
all kinds of explosions happen. Since only root can write to those sysfs
variables, the issue is not a big deal security-wise.

However, concurrent writes to these configuration variables is void of
reason so the proper thing to do is to serialize the access with a mutex.

Boris:

 - Make store_int_with_restart() use device_store_ulong() to filter out
   negative intervals
 - Limit min interval to 1 second
 - Correct locking
 - Massage commit message

Signed-off-by: Seunghun Han <kkamagui@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180302202706.9434-1-kkamagui@gmail.com
2019-07-27 21:51:20 +02:00
Eric Biggers 79bd202987 crypto: hash - annotate algorithms taking optional key
commit a208fa8f33031b9e0aba44c7d1b7e68eb0cbd29e upstream.

We need to consistently enforce that keyed hashes cannot be used without
setting the key.  To do this we need a reliable way to determine whether
a given hash algorithm is keyed or not.  AF_ALG currently does this by
checking for the presence of a ->setkey() method.  However, this is
actually slightly broken because the CRC-32 algorithms implement
->setkey() but can also be used without a key.  (The CRC-32 "key" is not
actually a cryptographic key but rather represents the initial state.
If not overridden, then a default initial state is used.)

Prepare to fix this by introducing a flag CRYPTO_ALG_OPTIONAL_KEY which
indicates that the algorithm has a ->setkey() method, but it is not
required to be called.  Then set it on all the CRC-32 algorithms.

The same also applies to the Adler-32 implementation in Lustre.

Also, the cryptd and mcryptd templates have to pass through the flag
from their underlying algorithm.

Change-Id: Ie0547047b828f276d967e8979cd9176e57c99ac8
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2:
 - Drop changes to nonexistent drivers
 - There's no CRYPTO_ALG_INTERNAL flag
 - Adjust filenames]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:49:17 +02:00
Eric Biggers 7be193871e crypto: salsa20 - fix blkcipher_walk API usage
commit ecaaab5649781c5a0effdaf298a925063020500e upstream.

When asked to encrypt or decrypt 0 bytes, both the generic and x86
implementations of Salsa20 crash in blkcipher_walk_done(), either when
doing 'kfree(walk->buffer)' or 'free_page((unsigned long)walk->page)',
because walk->buffer and walk->page have not been initialized.

The bug is that Salsa20 is calling blkcipher_walk_done() even when
nothing is in 'walk.nbytes'.  But blkcipher_walk_done() is only meant to
be called when a nonzero number of bytes have been provided.

The broken code is part of an optimization that tries to make only one
call to salsa20_encrypt_bytes() to process inputs that are not evenly
divisible by 64 bytes.  To fix the bug, just remove this "optimization"
and use the blkcipher_walk API the same way all the other users do.

Reproducer:

    #include <linux/if_alg.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main()
    {
            int algfd, reqfd;
            struct sockaddr_alg addr = {
                    .salg_type = "skcipher",
                    .salg_name = "salsa20",
            };
            char key[16] = { 0 };

            algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
            bind(algfd, (void *)&addr, sizeof(addr));
            reqfd = accept(algfd, 0, 0);
            setsockopt(algfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
            read(reqfd, key, sizeof(key));
    }

Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: eb6f13eb9f ("[CRYPTO] salsa20_generic: Fix multi-page processing")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2019-07-27 21:45:46 +02:00
Laura Abbott 249d07ea20 x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
commit 861ce4a3244c21b0af64f880d5bfe5e6e2fb9e4a upstream.

'__vmalloc_start_set' currently only gets set in initmem_init() when
!CONFIG_NEED_MULTIPLE_NODES. This breaks detection of vmalloc address
with virt_addr_valid() with CONFIG_NEED_MULTIPLE_NODES=y, causing
a kernel crash:

  [mm/usercopy] 517e1fbeb6: kernel BUG at arch/x86/mm/physaddr.c:78!

Set '__vmalloc_start_set' appropriately for that case as well.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: dc16ecf7fd ("x86-32: use specific __vmalloc_start_set flag in __virt_addr_valid")
Link: http://lkml.kernel.org/r/1494278596-30373-1-git-send-email-labbott@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2019-07-27 21:44:34 +02:00
Arnd Bergmann c0e765a158 x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
commit 7206f9bf108eb9513d170c73f151367a1bdf3dbf upstream.

The x86 version of insb/insw/insl uses an inline assembly that does
not have the target buffer listed as an output. This can confuse
the compiler, leading it to think that a subsequent access of the
buffer is uninitialized:

  drivers/net/wireless/wl3501_cs.c: In function ‘wl3501_mgmt_scan_confirm’:
  drivers/net/wireless/wl3501_cs.c:665:9: error: ‘sig.status’ is used uninitialized in this function [-Werror=uninitialized]
  drivers/net/wireless/wl3501_cs.c:668:12: error: ‘sig.cap_info’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  drivers/net/sb1000.c: In function 'sb1000_rx':
  drivers/net/sb1000.c:775:9: error: 'st[0]' is used uninitialized in this function [-Werror=uninitialized]
  drivers/net/sb1000.c:776:10: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
  drivers/net/sb1000.c:784:11: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]

I tried to mark the exact input buffer as an output here, but couldn't
figure it out. As suggested by Linus, marking all memory as clobbered
however is good enough too. For the outs operations, I also add the
memory clobber, to force the input to be written to local variables.
This is probably already guaranteed by the "asm volatile", but it can't
hurt to do this for symmetry.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: http://lkml.kernel.org/r/20170719125310.2487451-5-arnd@arndb.de
Link: https://lkml.org/lkml/2017/7/12/605
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2019-07-27 21:44:32 +02:00
Wang, Rui Y 4ea03c700d crypto: ghash-clmulni - Fix load failure
commit 3a020a723c65eb8ffa7c237faca26521a024e582 upstream.

ghash_clmulni_intel fails to load on Linux 4.3+ with the following message:
"modprobe: ERROR: could not insert 'ghash_clmulni_intel': Invalid argument"

After 8996eafdc ("crypto: ahash - ensure statesize is non-zero") all ahash
drivers are required to implement import()/export(), and must have a non-
zero statesize.

This patch has been tested with the algif_hash interface. The calculated
digest values, after several rounds of import()s and export()s, match those
calculated by tcrypt.

Signed-off-by: Rui Wang <rui.y.wang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2019-07-27 21:44:03 +02:00
Hector Marco-Gisbert 676f73cc6e UPSTREAM: x86/mm/32: Enable full randomization on i386 and X86_32
Currently on i386 and on X86_64 when emulating X86_32 in legacy mode, only
the stack and the executable are randomized but not other mmapped files

libraries, vDSO and mmap requests on i386 and in X86_32 in legacy mode.

By default on i386 there are 8 bits for the randomization of the libraries,
vDSO and mmaps which only uses 1MB of VA.

This patch preserves the original randomness, using 1MB of VA out of 3GB or
4GB. We think that 1MB out of 3GB is not a big cost for having the ASLR.

The first obvious security benefit is that all objects are randomized (not
only the stack and the executable) in legacy mode which highly increases
the ASLR effectiveness, otherwise the attackers may use these
non-randomized areas. But also sensitive setuid/setgid applications are
more secure because currently, attackers can disable the randomization of
these applications by setting the ulimit stack to "unlimited". This is a
very old and widely known trick to disable the ASLR in i386 which has been
allowed for too long.

Another trick used to disable the ASLR was to set the ADDR_NO_RANDOMIZE
personality flag, but fortunately this doesn't work on setuid/setgid
applications because there is security checks which clear Security-relevant
flags.

This patch always randomizes the mmap_legacy_base address, removing the
possibility to disable the ASLR by setting the stack to "unlimited".

Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Acked-by: Ismael Ripoll Ripoll <iripoll@upv.es>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1457639460-5242-1-git-send-email-hecmargi@upv.es
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Bug: 28763575
Change-Id: Icd128489c3c196ade64f79d4ea898d29f8471baf
(cherry picked from commit 8b8addf891de8a00e4d39fc32f93f7c5eb8feceb)
2017-10-19 01:00:49 +02:00
Theodore Ts'o f6ee685bb3 BACKPORT: random: introduce getrandom(2) system call
Almost clean cherry pick of c6e9d6f38894798696f23c8084ca7edbf16ee895,
includes change made by merge 0891ad829d2a0501053703df66029e843e3b8365.

The getrandom(2) system call was requested by the LibreSSL Portable
developers.  It is analoguous to the getentropy(2) system call in
OpenBSD.

The rationale of this system call is to provide resiliance against
file descriptor exhaustion attacks, where the attacker consumes all
available file descriptors, forcing the use of the fallback code where
/dev/[u]random is not available.  Since the fallback code is often not
well-tested, it is better to eliminate this potential failure mode
entirely.

The other feature provided by this new system call is the ability to
request randomness from the /dev/urandom entropy pool, but to block
until at least 128 bits of entropy has been accumulated in the
/dev/urandom entropy pool.  Historically, the emphasis in the
/dev/urandom development has been to ensure that urandom pool is
initialized as quickly as possible after system boot, and preferably
before the init scripts start execution.

This is because changing /dev/urandom reads to block represents an
interface change that could potentially break userspace which is not
acceptable.  In practice, on most x86 desktop and server systems, in
general the entropy pool can be initialized before it is needed (and
in modern kernels, we will printk a warning message if not).  However,
on an embedded system, this may not be the case.  And so with this new
interface, we can provide the functionality of blocking until the
urandom pool has been initialized.  Any userspace program which uses
this new functionality must take care to assure that if it is used
during the boot process, that it will not cause the init scripts or
other portions of the system startup to hang indefinitely.

SYNOPSIS
	#include <linux/random.h>

	int getrandom(void *buf, size_t buflen, unsigned int flags);

DESCRIPTION
	The system call getrandom() fills the buffer pointed to by buf
	with up to buflen random bytes which can be used to seed user
	space random number generators (i.e., DRBG's) or for other
	cryptographic uses.  It should not be used for Monte Carlo
	simulations or other programs/algorithms which are doing
	probabilistic sampling.

	If the GRND_RANDOM flags bit is set, then draw from the
	/dev/random pool instead of the /dev/urandom pool.  The
	/dev/random pool is limited based on the entropy that can be
	obtained from environmental noise, so if there is insufficient
	entropy, the requested number of bytes may not be returned.
	If there is no entropy available at all, getrandom(2) will
	either block, or return an error with errno set to EAGAIN if
	the GRND_NONBLOCK bit is set in flags.

	If the GRND_RANDOM bit is not set, then the /dev/urandom pool
	will be used.  Unlike using read(2) to fetch data from
	/dev/urandom, if the urandom pool has not been sufficiently
	initialized, getrandom(2) will block (or return -1 with the
	errno set to EAGAIN if the GRND_NONBLOCK bit is set in flags).

	The getentropy(2) system call in OpenBSD can be emulated using
	the following function:

            int getentropy(void *buf, size_t buflen)
            {
                    int     ret;

                    if (buflen > 256)
                            goto failure;
                    ret = getrandom(buf, buflen, 0);
                    if (ret < 0)
                            return ret;
                    if (ret == buflen)
                            return 0;
            failure:
                    errno = EIO;
                    return -1;
            }

RETURN VALUE
       On success, the number of bytes that was filled in the buf is
       returned.  This may not be all the bytes requested by the
       caller via buflen if insufficient entropy was present in the
       /dev/random pool, or if the system call was interrupted by a
       signal.

       On error, -1 is returned, and errno is set appropriately.

ERRORS
	EINVAL		An invalid flag was passed to getrandom(2)

	EFAULT		buf is outside the accessible address space.

	EAGAIN		The requested entropy was not available, and
			getentropy(2) would have blocked if the
			GRND_NONBLOCK flag was not set.

	EINTR		While blocked waiting for entropy, the call was
			interrupted by a signal handler; see the description
			of how interrupted read(2) calls on "slow" devices
			are handled with and without the SA_RESTART flag
			in the signal(7) man page.

NOTES
	For small requests (buflen <= 256) getrandom(2) will not
	return EINTR when reading from the urandom pool once the
	entropy pool has been initialized, and it will return all of
	the bytes that have been requested.  This is the recommended
	way to use getrandom(2), and is designed for compatibility
	with OpenBSD's getentropy() system call.

	However, if you are using GRND_RANDOM, then getrandom(2) may
	block until the entropy accounting determines that sufficient
	environmental noise has been gathered such that getrandom(2)
	will be operating as a NRBG instead of a DRBG for those people
	who are working in the NIST SP 800-90 regime.  Since it may
	block for a long time, these guarantees do *not* apply.  The
	user may want to interrupt a hanging process using a signal,
	so blocking until all of the requested bytes are returned
	would be unfriendly.

	For this reason, the user of getrandom(2) MUST always check
	the return value, in case it returns some error, or if fewer
	bytes than requested was returned.  In the case of
	!GRND_RANDOM and small request, the latter should never
	happen, but the careful userspace code (and all crypto code
	should be careful) should check for this anyway!

	Finally, unless you are doing long-term key generation (and
	perhaps not even then), you probably shouldn't be using
	GRND_RANDOM.  The cryptographic algorithms used for
	/dev/urandom are quite conservative, and so should be
	sufficient for all purposes.  The disadvantage of GRND_RANDOM
	is that it can block, and the increased complexity required to
	deal with partially fulfilled getrandom(2) requests.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Zach Brown <zab@zabbo.net>

Bug: http://b/29621447
Change-Id: I189ba74070dd6d918b0fdf83ff30bb74ec0f7556
(cherry picked from commit 4af712e8df998475736f3e2727701bd31e3751a9)
2017-09-08 18:50:11 +00:00
Hugh Dickins 1448dc70cd mm: larger stack guard gap, between vmas
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream.

Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.

Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.

One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications.  For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.

Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.

Change-Id: I899511079c5057ee5299ef1aff5ab8f0c77c740d
Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
[wt: backport to 4.11: adjust context]
[wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide]
[wt: backport to 4.4: adjust context ; drop ppc hugetlb_radix changes]
[wt: backport to 3.18: adjust context ; no FOLL_POPULATE ;
     s390 uses generic arch_get_unmapped_area()]
[wt: backport to 3.16: adjust context]
[wt: backport to 3.10: adjust context ; code logic in PARISC's
     arch_get_unmapped_area() wasn't found ; code inserted into
     expand_upwards() and expand_downwards() runs under anon_vma lock;
     changes for gup.c:faultin_page go to memory.c:__get_user_pages();
     included Hugh Dickins' fixes]
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-07-11 00:00:39 +00:00
Kees Cook ef8c8d2e5e mm: Tighten x86 /dev/mem with zeroing reads
commit a4866aa812518ed1a37d8ea0c881dc946409de94 upstream.

Under CONFIG_STRICT_DEVMEM, reading System RAM through /dev/mem is
disallowed. However, on x86, the first 1MB was always allowed for BIOS
and similar things, regardless of it actually being System RAM. It was
possible for heap to end up getting allocated in low 1MB RAM, and then
read by things like x86info or dd, which would trip hardened usercopy:

usercopy: kernel memory exposure attempt detected from ffff880000090000 (dma-kmalloc-256) (4096 bytes)

This changes the x86 exception for the low 1MB by reading back zeros for
System RAM areas instead of blindly allowing them. More work is needed to
extend this to mmap, but currently mmap doesn't go through usercopy, so
hardened usercopy won't Oops the kernel.

Change-Id: I19f91639b3c52027034b4742b8d3c001e6890fe7
Reported-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Tested-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-05 19:20:37 +00:00
Luca Stefani ff1ebfd98d This is the 3.10.102 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXXS5iAAoJEE44bZycYXAvDj8P/jbhmGAgW6tw2cnS90QIZDqG
 M/nclEId61jICNvbfP6zsioKeWyrmzr5G7NjqTThsSNhCo/DXs3ddMqLy3pOaFdq
 mytXtHIUpwZoplEib+ODinW40CMqnu11XSWEcee2nrsPuGNsnc7BY0wmFBa6UVCV
 rOZef9SN9lJcZSYY/auvgLDXOXdQ+NMxp5hau30aF5HBO8hTDXStjPRcUwCvz7aR
 govTQJHlS4HzLH3JOYS3Dt8IYFDOrKhQIby2nFdw7eiUxHCRy2F0asabTh3DzCw1
 iLvFroozjyVXwozfWMqLCvMa+514MXJy8Nkva6xiAHraC8UrgfPtcNsTdgtkdH9T
 V2Am9b0L7yiBdG6hsZLxkU3akk7vU/0dtppwzvudANT6i2tGcDSBeaZq3T2pAv7B
 7coY53GzHZdQnbdTZbYeS1fxebxyXw50D5OJkF8DyLhoL7Uj2Dvv0QdjKv+U/e5D
 VQ+ZyGcBdCLuOzflXysI10E01y0/M3FrkubgGBM4Oh0eYKCHJaHG/NCZy5JY/qxy
 S0phem8RbeZPbcL14z+5buWIi1lUkTiCIMG8c32ZEmDh84drnICqABA0RzKmqdkj
 ucQa+PzkMQ1DyhAMUl/CwpBfSqf1Zs3agLo78Kp5MTGfeAA90m0SeVqhmDgWhwqG
 HhSlsPFfMfmJl5S0uJpQ
 =UhFl
 -----END PGP SIGNATURE-----

Merge tag 'v3.10.102' into HEAD

This is the 3.10.102 stable release

Change-Id: Ic7d338fb190966b26aa151361fc37414f701d8b2
2017-04-18 17:22:08 +02:00
Luca Stefani b46b1587e9 This is the 3.10.101 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJW6X7DAAoJEDjbvchgkmk+RqMP/23oS5ZXheNwIwKzWO8f1tnZ
 Em+1+p5zZ8cFV/mWHz6Jxxt0/3tEteT1m3sWGJLvG3Uzz1kd7alS/omxU0Lf4WEf
 Qzso9R5sg4+SKUtAhRmJlXhLvVwnKc5KiGyrBt4ZBrwViHy+phr6/uPUTI5m6u48
 Xzujf+onqeojNXTfMxAq90r9a/AHmVlTwQ+j6SqPkKKbCrtHp9bjrMEb2JFbk8CN
 dPhJQuKgUjUx0Scfk2KOecS89e0Kr+k8RWFstPtR0eUkvNlfzjcK7SD1WZ5i0yd4
 d89tG/X9T2uTVeO2hTxKmTmOAxgiBS/RAljXgA+3j1VTPBMuOvYjyIVQ8RwROFtp
 UbgkSabwhX2prKj1J0zXNUjSvYyiGQ1mXWcHemZxaJAZKl0rWwHeBuNHPuJCPAWJ
 ZVqFkA0jrKcDbF+pnt9nkWEC1vFkO0kaZAY8hYtUiXkA8ohoC+4rIap09sgGzHiM
 ElN9VZbg8dJOMfSrnFkIO8t8SQzlnLo3lNQsKF8HbtB2UsIODALYasuj61KXlt1s
 xo4bKgKVncRqA0neMdOjVlu4U8qs3f2f8FDFMdX3lAlq/Pcgo/wJCUuhl3y3cr4+
 9FRTaWxbtg7zCfvlV4JmgF9KjNdq+w6CU6ZbtRSDuBhtCx5dpMNGqD9igY6xtV4P
 GN3zWp59z0D40kIU+bLd
 =3YTQ
 -----END PGP SIGNATURE-----

Merge tag 'v3.10.101' into HEAD

This is the 3.10.101 stable release
2017-04-18 17:17:55 +02:00
Luca Stefani 20f9b57392 This is the 3.10.100 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJW4LKLAAoJEDjbvchgkmk+JcMQAMC0Af2prpT0MAwztjQ2J+2w
 Cb0bcubMFFI08ZaCEcONnICN8D8p3Afs3rLjJXU05dofVhGXm09VDWKfGsiInfZz
 Y3ac6Orqb+sAb+frTycvopdLo4izkTFbUGzdj5VIl1C+d4KLqJ7mJ3t5P2cT1ldD
 T8cNsI2QOrD/iiakSK72RpzHosmRf3men5BKy728cnTX+c14/MH1lpz6J4E4Mrw7
 axO2ELpLTdZMhpTGO6kDYBXjwTDJX6PvryCoANMWVBkqtLG8wWcw252I8w3Eaa6P
 AtsXEXR+zUxFMnKg2WTRsgXXdrVWmztNx2e1Jo4JzoZEpLIWOXlUSyuQFgH7JtGr
 x6te30SZkX7E9O6PM/qeIu2fsQnMuV5cWozlPi1Cpk9rCEhn14J4c34SF8/dcl4j
 1o84Y2kV+Z2EOQdOItrc6TLHQ3P8x/FR3fBHRljRTPj2+J9CT4wwv+gCg4jLDx4a
 YfYsReNN3ubG9leISTIAD8JY3cuSLdrHw7r4Sa9/BnFXJ+dg+oIkL8gCplD0KOeT
 /H8eSxl1erAIsZh5GRWErk7K9RjGQjQ0+dGa1poCLmHSta6U4ajNV4DRevDnbdxn
 qWGh7SuBBf2t1SyMucv07e/CpBXr0Msi3dITkx3hqd8jLl17XJ+JtfmLOPP4SJBr
 rY374DXia5zqsJa65cdm
 =41yO
 -----END PGP SIGNATURE-----

Merge tag 'v3.10.100' into HEAD

This is the 3.10.100 stable release
2017-04-18 17:17:52 +02:00
Luca Stefani 062311b2df This is the 3.10.99 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJW2MPMAAoJEDjbvchgkmk+xlkP/3pLYC8OxOPz11sBFOWDB6Jn
 +/La3kW252TMcY7K8Z6R2UJC93HxXaCySAZTrLrwUL6mqpSStiHhX/1HQMI6If4c
 jMtsbgWpU+HZzprPzY8IK6rdrZJKz+Nxu3LMuV0pYTAFKLnCa4d9bSYZ52UArVnC
 w13KGpk/gnWTO7A6ZNx4dcRpMqYHWcG+eJsT9zdExmyk65qBCxhhUxXh+DijmSn7
 QXrFJ4zjWr1kIdsk6Moat/HCTt/zvwMiWuHdnqYIzUSmvWZWbaQsqGw0cFvKM2hL
 pOJ3zf3fgUY6fsV0vG+SFdrMmL6RtL/v0c2EGM5ZlYCIPbUZcK+XMlaEqOe6UAHz
 hITIE+r03l2zqagWVb/2HOen8liHIxnfqUPYgHd6vmXz2qWXg9sWTsOhr3ZAQQLA
 tf0JDjmx/KCyBmiA7ZyhRLeyhx0jD/csxxo14YME8N3tJCyw5gEIOgXlOLNxhWRu
 uCqSN27FDnnf6ppbX1euMeWxzqi4DCZFMDJQT743V5sJIz10BsVR9HJS6mwyUioN
 ia4qVc99JfSEsXuawlZhC44Ht+Z/tTSxQPcZjWMHvftGVfxS9AZVf85BM5zNa91t
 52mtJivT25N7JxHE41iEQA9t4V1shCjGmEUKD4cVMKgC18cpXD/awDlJ1Or1YuAO
 ro6ElZeHj+O3YETFp31/
 =GlVi
 -----END PGP SIGNATURE-----

Merge tag 'v3.10.99' into HEAD

This is the 3.10.99 stable release

Change-Id: I8113e58a5519664be2acc502462633d6d2f9ebf5
2017-04-18 17:17:46 +02:00
Luca Stefani f0bb324d50 This is the 3.10.98 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJWz1zgAAoJEDjbvchgkmk+yU8P/10DITNzrhCfz5wbhvvn9Uvo
 7H1DziOora3u9h8/rz6xqgFEz2/9cZ03KoLcpGha7kEFBsvgVhN3uSI0YFpVV2mT
 8/oh1ADdkky3Pld0f7gDGydDvrmgqx83/69SQ8hDQ8Mr2QTaKNvK05QGC2/EO9kI
 OcUAXjdAGglmf5rfhNhXodG/F2DtsA55uCzeyuBhcPE3bM7d4/48pwr1b2tW2CR8
 hsprRvSz+kGgHXQy8jYdxKEI66OC/i22xVnxEc8PZmPZ0fFfmszzc9nzhcseWfpe
 0JGgfwAtM8Va+bX4kfvqPpc2qR0r8Z2iEKNnAHnGutOvSWvow0l1OEedsb/+s1J6
 /AYlPIkgTxwLDAwBIymPgowkEMOPVZzPL0tkoZI8wjB+eqUxxLlIa2dNByCyUs/U
 1xTy+0UDMMDXG911mJl+yZFvd4R7lQUavIEStmMQ+A/Go2KrATaqIM8WETBlm7oH
 s3hZ3E+RBWmfD/6JQwsJNkwv6yWeaRXNE+bj8C1r/uBdPyGqX9T22OaIOlio+I71
 XBNEM5mrTlNeNVIUIKW29qmLBxBrH2LLwpv/dRyfOfzfhi1B+dl9+3sJauvrSmWi
 jrR1khGmmaZcfOT2DVmpwlDQCQcyMcy8S8RTTAHhhuNmWtSjdc3TcfRlHXvP0sOu
 ruXBufxernb94E7sqsvF
 =LW9r
 -----END PGP SIGNATURE-----

Merge tag 'v3.10.98' into HEAD

This is the 3.10.98 stable release
2017-04-18 17:17:24 +02:00
Luca Stefani e516c31e99 This is the 3.10.96 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJWqv2IAAoJEDjbvchgkmk+180QAKqYrypT3cyClNOHGRFRaxID
 Sxo8S9tr8apxaIeP/nfZH3fYXyoadKBwxet15PNYwGVex3jBIVO0M0kspNPu9guG
 ogM0hf558EiWpdN5kydwCyN2ukJkhPP9r1ZQ5T84UcqflIboLDYXksqW1w8JX7wm
 dumt8kbbnN42e9S1bXD79CRaBB+dkNBTg0fdfpCi7pOQvUQD9DAs/j6XM1ZkOouX
 P+/vnIWbRwzbVqlJSaWNfBotlNsydosazJD9lg8iFIRDpVGJPKYbDMP2MPpyrmyA
 mesNRIy0wD9cixXW6jMS3fkSOY27N5hZIYYVPWQ8vfCcooTej4GHw37C7Inlh8z6
 iWf/sy1Hu+vniJKAr0BD86ocZxnaMv//BQtwCJZv3TfuQ93QkaRmEznEnCHYGN4M
 thoaS7oYGfrJnsHKkh913Kr3K7QuvyFttOE058PloYzJbCPV+YVRa/UGyuR6qOCl
 SbuSMXDdUDcf/Wznr6S6p6T2GIfM8GYvfm7hzIYwHpClCQpDR3lRdonDAg82mdMh
 YCNbEZQ32+l8idBX/YG97MskMD869237yh4MLUUWoxLTbevAblkYSt81WuDO4Gya
 PcWcB+zH4t2Y25W9yVoTKmaJSJPhT4ngNFSy7V8zKgVG2Vmz4YIuLRhd6N2/fGcd
 FVSXw7uHZhrn+SEl+L6W
 =tiwo
 -----END PGP SIGNATURE-----

Merge tag 'v3.10.96' into HEAD

This is the 3.10.96 stable release

Change-Id: I428f544d161be44e66e56e2d6900700e798cdd0a
2017-04-18 17:16:02 +02:00
Luca Stefani 7492d477e6 This is the 3.10.94 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJWaHZOAAoJEDjbvchgkmk+my0P/jsD1AGpZ5MFfMAktU9icsRe
 vW4jp81LEsjkhNQiQ7jbSztUh5tLPnBjXH7qyK9UggzRnhuucHLqn488GTSTrg0n
 1kXAhnwOdZeFSw/eTInDZ9jEWD3f9w+ZKJeissK0Q0gBNocBgsd4iidjyxlOXTMd
 7P19EH0ys4YL42KfvRm/3t+cT09/2VGk+Y0rDqU0xpQM7ZhRPdmdewRMtk6L0YiX
 +Ijypv+3C5Zv1JE4Y7+V7EG535jhYHnlnXnsnpSMmUHHusAGnPHl6rwk2o/7zwJZ
 /aDRCqTqAnncGS3+e/5/0AIvLbdqKbbnp0P8vXOFZ8RDZU3z7TxlO29j1R9tjceT
 80kQDAwWN/1UXUsjny0W+1ozTf5KkH6VibgBbZRrbcZ9NkiurMayXeMbfysCy3An
 OAyHxVv2gwQ/QYJp54UBAmT5fyKq6AhO9C7l+Iedt9/yuT17E69+QTkMaUc9BCZ7
 OG5Sf1H2umcRraWSc//bxyjzXkLm4kgginNuNSCa6xkVJBW+EzAssovJqV53w62U
 aZu/gRtbVP88m6Ay+QhwHkeyO1yJiVO+0ssyo4gj+tBsKnf66u/c3i9QExn3fepk
 3nr61ooKgm4wdBVOArFwLxY2PjkpGG+hR9L/sHvxTf6ULTauWD5kwdsv0LP2mJhw
 wKgrcBhfo+1LGJnJoowF
 =pOAT
 -----END PGP SIGNATURE-----

Merge tag 'v3.10.94' into HEAD

This is the 3.10.94 stable release
2017-04-18 17:12:56 +02:00
Luca Stefani 82b37d9f2f Merge remote-tracking branch 'f2fs/linux-3.10.y' into HEAD
Change-Id: Ic2fe24529f029909ddd96490bd6d885d60f88be2
2017-04-18 17:02:28 +02:00
LuK1337 fc9499e55a Import latest Samsung release
* Package version: T713XXU2BQCO

Change-Id: I293d9e7f2df458c512d59b7a06f8ca6add610c99
2017-04-18 03:43:52 +02:00
Andi Kleen 750fc132a8 perf/x86/intel: Fix PEBS data source interpretation on Nehalem/Westmere
commit e17dc65328057c00db7e1bfea249c8771a78b30b upstream.

Jiri reported some time ago that some entries in the PEBS data source table
in perf do not agree with the SDM. We investigated and the bits
changed for Sandy Bridge, but the SDM was not updated.

perf already implements the bits correctly for Sandy Bridge
and later. This patch patches it up for Nehalem and Westmere.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1456871124-15985-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:48 +02:00
Andy Lutomirski 721485b62f x86/iopl: Fix iopl capability check on Xen PV
commit c29016cf41fe9fa994a5ecca607cf5f1cd98801e upstream.

iopl(3) is supposed to work if iopl is already 3, even if
unprivileged.  This didn't work right on Xen PV.  Fix it.

Reviewewd-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/8ce12013e6e4c0a44a97e316be4a6faff31bd5ea.1458162709.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:44 +02:00
H. Peter Anvin a47831b0d8 x86, processor-flags: Fix the datatypes and add bit number defines
commit d1fbefcb3aa608599a3c9e4582cbeeb6ba6c8939 upstream.

The control registers are unsigned long (32 bits on i386, 64 bits on
x86-64), and so make that manifest in the data type for the various
constants.  Add defines with a _BIT suffix which defines the bit
number, as opposed to the bit mask.

This should resolve some issues with ~bitmask that Linus discovered.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-cwckhbrib2aux1qbteaebij0@git.kernel.org
[wt: backported to 3.10 only to keep next patch clean]

Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:44 +02:00
H. Peter Anvin f85cb76155 x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE
commit afcbf13fa6d53d8a97eafaca1dcb344331d2ce0c upstream.

Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE to match the SDM.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Link: http://lkml.kernel.org/n/tip-buq1evi5dpykxx7ak6amaam0@git.kernel.org
[wt: backported to 3.10 only to keep next patch clean]

Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:44 +02:00
Radim Krčmář a6821c1796 KVM: i8254: change PIT discard tick policy
commit 7dd0fdff145c5be7146d0ac06732ae3613412ac1 upstream.

Discard policy uses ack_notifiers to prevent injection of PIT interrupts
before EOI from the last one.

This patch changes the policy to always try to deliver the interrupt,
which makes a difference when its vector is in ISR.
Old implementation would drop the interrupt, but proposed one injects to
IRR, like real hardware would.

The old policy breaks legacy NMI watchdogs, where PIT is used through
virtual wire (LVT0): PIT never sends an interrupt before receiving EOI,
thus a guest deadlock with disabled interrupts will stop NMIs.

Note that NMI doesn't do EOI, so PIT also had to send a normal interrupt
through IOAPIC.  (KVM's PIT is deeply rotten and luckily not used much
in modern systems.)

Even though there is a chance of regressions, I think we can fix the
LVT0 NMI bug without introducing a new tick policy.

Cc: <stable@vger.kernel.org>
Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:44 +02:00
Kamal Mostafa c9950bcb91 x86/iopl/64: Properly context-switch IOPL on Xen PV
commit b7a584598aea7ca73140cb87b40319944dd3393f upstream.

From: Andy Lutomirski <luto@kernel.org>

On Xen PV, regs->flags doesn't reliably reflect IOPL and the
exit-to-userspace code doesn't change IOPL.  We need to context
switch it manually.

I'm doing this without going through paravirt because this is
specific to Xen PV.  After the dust settles, we can merge this with
the 32-bit code, tidy up the iopl syscall implementation, and remove
the set_iopl pvop entirely.

Fixes XSA-171.

Reviewewd-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/693c3bd7aeb4d3c27c92c622b7d0f554a458173c.1458162709.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ kamal: backport to 3.19-stable: no X86_FEATURE_XENPV so just call
  xen_pv_domain() directly ]
Acked-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:43 +02:00
dcashman f5968659d5 BACKPORT: FROMLIST: mm: ASLR: use get_random_long()
(cherry picked from commit https://lkml.org/lkml/2016/2/4/833)

Replace calls to get_random_int() followed by a cast to (unsigned long)
with calls to get_random_long().  Also address shifting bug which, in case
of x86 removed entropy mask for mmap_rnd_bits values > 31 bits.

Bug: 26963541
Signed-off-by: Daniel Cashman <dcashman@android.com>
Signed-off-by: Daniel Cashman <dcashman@google.com>
Change-Id: Ie577b21a0678cf4b21eae06bddd8ccb27cbe70ff
2016-05-18 14:36:13 +05:30
dcashman e16becd41a FROMLIST: x86: mm: support ARCH_MMAP_RND_BITS.
(cherry picked from commit https://lkml.org/lkml/2015/12/21/339)

x86: arch_mmap_rnd() uses hard-coded values, 8 for 32-bit and 28 for
64-bit, to generate the random offset for the mmap base address.
This value represents a compromise between increased ASLR
effectiveness and avoiding address-space fragmentation. Replace it
with a Kconfig option, which is sensibly bounded, so that platform
developers may choose where to place this compromise. Keep default
values as new minimums.

Bug: 24047224
Signed-off-by: Daniel Cashman <dcashman@android.com>
Signed-off-by: Daniel Cashman <dcashman@google.com>
Change-Id: I4f0cb7159406166fbc9f8938ba8d3e24b72eeefb
2016-05-18 14:36:01 +05:30
Marcelo Tosatti 73c1d981f0 KVM: x86: move steal time initialization to vcpu entry time
commit 7cae2bedcbd4680b155999655e49c27b9cf020fa upstream.

As reported at https://bugs.launchpad.net/qemu/+bug/1494350,
it is possible to have vcpu->arch.st.last_steal initialized
from a thread other than vcpu thread, say the iothread, via
KVM_SET_MSRS.

Which can cause an overflow later (when subtracting from vcpu threads
sched_info.run_delay).

To avoid that, move steal time accumulation to vcpu entry time,
before copying steal time data to guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-16 08:41:36 -07:00
Radim Krčmář 618fedd40d KVM: VMX: disable PEBS before a guest entry
commit 7099e2e1f4d9051f31bbfa5803adf954bb5d76ef upstream.

Linux guests on Haswell (and also SandyBridge and Broadwell, at least)
would crash if you decided to run a host command that uses PEBS, like
  perf record -e 'cpu/mem-stores/pp' -a

This happens because KVM is using VMX MSR switching to disable PEBS, but
SDM [2015-12] 18.4.4.4 Re-configuring PEBS Facilities explains why it
isn't safe:
  When software needs to reconfigure PEBS facilities, it should allow a
  quiescent period between stopping the prior event counting and setting
  up a new PEBS event. The quiescent period is to allow any latent
  residual PEBS records to complete its capture at their previously
  specified buffer address (provided by IA32_DS_AREA).

There might not be a quiescent period after the MSR switch, so a CPU
ends up using host's MSR_IA32_DS_AREA to access an area in guest's
memory.  (Or MSR switching is just buggy on some models.)

The guest can learn something about the host this way:
If the guest doesn't map address pointed by MSR_IA32_DS_AREA, it results
in #PF where we leak host's MSR_IA32_DS_AREA through CR2.

After that, a malicious guest can map and configure memory where
MSR_IA32_DS_AREA is pointing and can therefore get an output from
host's tracing.

This is not a critical leak as the host must initiate with PEBS tracing
and I have not been able to get a record from more than one instruction
before vmentry in vmx_vcpu_run() (that place has most registers already
overwritten with guest's).

We could disable PEBS just few instructions before vmentry, but
disabling it earlier shouldn't affect host tracing too much.
We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that
optimization isn't worth its code, IMO.

(If you are implementing PEBS for guests, be sure to handle the case
 where both host and guest enable PEBS, because this patch doesn't.)

Fixes: 26a4f3c08d ("perf/x86: disable PEBS on a guest entry.")
Reported-by: Jiří Olša <jolsa@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-16 08:41:35 -07:00
Todd E Brandt 137d2a237d PM / sleep / x86: Fix crash on graph trace through x86 suspend
commit 92f9e179a702a6adbc11e2fedc76ecd6ffc9e3f7 upstream.

Pause/unpause graph tracing around do_suspend_lowlevel as it has
inconsistent call/return info after it jumps to the wakeup vector.
The graph trace buffer will otherwise become misaligned and
may eventually crash and hang on suspend.

To reproduce the issue and test the fix:
Run a function_graph trace over suspend/resume and set the graph
function to suspend_devices_and_enter. This consistently hangs the
system without this fix.

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-09 15:31:54 -08:00
Andy Lutomirski 4e24fd5466 x86/entry/compat: Add missing CLAC to entry_INT80_32
commit 3d44d51bd339766f0178f0cf2e8d048b4a4872aa upstream.

This doesn't seem to fix a regression -- I don't think the CLAC was
ever there.

I double-checked in a debugger: entries through the int80 gate do
not automatically clear AC.

Stable maintainers: I can provide a backport to 4.3 and earlier if
needed.  This needs to be backported all the way to 3.10.

Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 63bcff2a30 ("x86, smap: Add STAC and CLAC instructions to control user space access")
Link: http://lkml.kernel.org/r/b02b7e71ae54074be01fc171cbd4b72517055c0e.1456345086.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ kamal: backport to 3.10 through 3.19-stable: file rename; context ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-09 15:31:54 -08:00
Ingo Molnar 9ee0d9ad93 efi: Disable interrupts around EFI calls, not in the epilog/prolog calls
commit 23a0d4e8fa6d3a1d7fb819f79bcc0a3739c30ba9 upstream.

Tapasweni Pathak reported that we do a kmalloc() in efi_call_phys_prolog()
on x86-64 while having interrupts disabled, which is a big no-no, as
kmalloc() can sleep.

Solve this by removing the irq disabling from the prolog/epilog calls
around EFI calls: it's unnecessary, as in this stage we are single
threaded in the boot thread, and we don't ever execute this from
interrupt contexts.

Reported-by: Tapasweni Pathak <tapaswenipathak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
[ luis: backported to 3.10: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:06:19 -08:00
Andy Lutomirski 4800af9122 x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
commit 425be5679fd292a3c36cb1fe423086708a99f11a upstream.

The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart.  It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.

Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count.  Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code.  The new scheme should be much
clearer to future readers.

While we're at it, improve the comments and rename the array and
common code.

Binutils may start relaxing jumps to non-weak labels.  If so,
this change will fix our build, and we may need to backport this
change.

Before, on x86_64:

  0000000000000000 <early_idt_handlers>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 00 00 00 00          jmpq   9 <early_idt_handlers+0x9>
                          5: R_X86_64_PC32        early_idt_handler-0x4
  ...
    48:   66 90                   xchg   %ax,%ax
    4a:   6a 08                   pushq  $0x8
    4c:   e9 00 00 00 00          jmpq   51 <early_idt_handlers+0x51>
                          4d: R_X86_64_PC32       early_idt_handler-0x4
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   e9 00 00 00 00          jmpq   120 <early_idt_handler>
                          11c: R_X86_64_PC32      early_idt_handler-0x4

After:

  0000000000000000 <early_idt_handler_array>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 14 01 00 00          jmpq   11d <early_idt_handler_common>
  ...
    48:   6a 08                   pushq  $0x8
    4a:   e9 d1 00 00 00          jmpq   120 <early_idt_handler_common>
    4f:   cc                      int3
    50:   cc                      int3
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   eb 03                   jmp    120 <early_idt_handler_common>
   11d:   cc                      int3
   11e:   cc                      int3
   11f:   cc                      int3

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Binutils <binutils@sourceware.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 11:57:49 -08:00
H.J. Lu 6ec8f1c4d6 x86/boot: Double BOOT_HEAP_SIZE to 64KB
commit 8c31902cffc4d716450be549c66a67a8a3dd479c upstream.

When decompressing kernel image during x86 bootup, malloc memory
for ELF program headers may run out of heap space, which leads
to system halt.  This patch doubles BOOT_HEAP_SIZE to 64KB.

Tested with 32-bit kernel which failed to boot without this patch.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:28 -08:00
Mario Kleiner a919f20b06 x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[]
commit 2f0c0b2d96b1205efb14347009748d786c2d9ba5 upstream.

Without the reboot=pci method, the iMac 10,1 simply
hangs after printing "Restarting system" at the point
when it should reboot. This fixes it.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1450466646-26663-1-git-send-email-mario.kleiner.de@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:28 -08:00
Ouyang Zhaowei (Charles) c55958f9a8 x86/xen: don't reset vcpu_info on a cancelled suspend
commit 6a1f513776b78c994045287073e55bae44ed9f8c upstream.

On a cancelled suspend the vcpu_info location does not change (it's
still in the per-cpu area registered by xen_vcpu_setup()).  So do not
call xen_hvm_init_shared_info() which would make the kernel think its
back in the shared info.  With the wrong vcpu_info, events cannot be
received and the domain will hang after a cancelled suspend.

Signed-off-by: Charles Ouyang <ouyangzhaowei@huawei.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:28 -08:00
Dmitry V. Levin c511958e1c x86/signal: Fix restart_syscall number for x32 tasks
commit 22eab1108781eff09961ae7001704f7bd8fb1dce upstream.

When restarting a syscall with regs->ax == -ERESTART_RESTARTBLOCK,
regs->ax is assigned to a restart_syscall number.  For x32 tasks, this
syscall number must have __X32_SYSCALL_BIT set, otherwise it will be
an x86_64 syscall number instead of a valid x32 syscall number. This
issue has been there since the introduction of x32.

Reported-by: strace/tests/restart_syscall.test
Reported-and-tested-by: Elvira Khabirova <lineprinter0@gmail.com>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: Elvira Khabirova <lineprinter0@gmail.com>
Link: http://lkml.kernel.org/r/20151130215436.GA25996@altlinux.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:28 -08:00
Andrew Cooper 8fa88fa850 x86/cpu: Fix SMAP check in PVOPS environments
commit 581b7f158fe0383b492acd1ce3fb4e99d4e57808 upstream.

There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely.  Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at.  This may
have been true when initially implemented, but no longer is.

To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
build, but not consistent for all builds.  It has also been a sitting timebomb
since SMAP support was introduced.

Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <lguest@lists.ozlabs.org>
Cc: Xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.cooper3@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09 13:40:08 -05:00
Borislav Petkov 8f14777b58 x86/cpu: Call verify_cpu() after having entered long mode too
commit 04633df0c43d710e5f696b06539c100898678235 upstream.

When we get loaded by a 64-bit bootloader, kernel entry point is
startup_64 in head_64.S. We don't trust any and all bootloaders because
some will fiddle with CPU configuration so we go ahead and massage each
CPU into sanity again.

For example, some dell BIOSes have this XD disable feature which set
IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround
for other OSes but Linux sure doesn't need it.

A similar thing is present in the Surface 3 firmware - see
https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit
only on the BSP:

  # rdmsr -a 0x1a0
  400850089
  850089
  850089
  850089

I know, right?!

There's not even an off switch in there.

So fix all those cases by sanitizing the 64-bit entry point too. For
that, make verify_cpu() callable in 64-bit mode also.

Requested-and-debugged-by: "H. Peter Anvin" <hpa@zytor.com>
Reported-and-tested-by: Bastien Nocera <bugzilla@hadess.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09 13:40:08 -05:00
Krzysztof Mazur 308b1b0433 x86/setup: Fix low identity map for >= 2GB kernel range
commit 68accac392d859d24adcf1be3a90e41f978bd54c upstream.

The commit f5f3497cad8c extended the low identity mapping. However, if
the kernel uses more than 2 GB (VMSPLIT_2G_OPT or VMSPLIT_1G memory
split), the normal memory mapping is overwritten by the low identity
mapping causing a crash. To avoid overwritting, limit the low identity
map to cover only memory before kernel range (PAGE_OFFSET).

Fixes: f5f3497cad8c "x86/setup: Extend low identity map to cover whole kernel range
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/1446815916-22105-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09 13:40:08 -05:00
Paolo Bonzini fa4fbf7138 x86/setup: Extend low identity map to cover whole kernel range
commit f5f3497cad8c8416a74b9aaceb127908755d020a upstream.

On 32-bit systems, the initial_page_table is reused by
efi_call_phys_prolog as an identity map to call
SetVirtualAddressMap.  efi_call_phys_prolog takes care of
converting the current CPU's GDT to a physical address too.

For PAE kernels the identity mapping is achieved by aliasing the
first PDPE for the kernel memory mapping into the first PDPE
of initial_page_table.  This makes the EFI stub's trick "just work".

However, for non-PAE kernels there is no guarantee that the identity
mapping in the initial_page_table extends as far as the GDT; in this
case, accesses to the GDT will cause a page fault (which quickly becomes
a triple fault).  Fix this by copying the kernel mappings from
swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at
identity mapping.

For some reason, this is only reproducible with QEMU's dynamic translation
mode, and not for example with KVM.  However, even under KVM one can clearly
see that the page table is bogus:

    $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize
    $ gdb
    (gdb) target remote localhost:1234
    (gdb) hb *0x02858f6f
    Hardware assisted breakpoint 1 at 0x2858f6f
    (gdb) c
    Continuing.

    Breakpoint 1, 0x02858f6f in ?? ()
    (gdb) monitor info registers
    ...
    GDT=     0724e000 000000ff
    IDT=     fffbb000 000007ff
    CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690
    ...

The page directory is sane:

    (gdb) x/4wx 0x32b7000
    0x32b7000:	0x03398063	0x03399063	0x0339a063	0x0339b063
    (gdb) x/4wx 0x3398000
    0x3398000:	0x00000163	0x00001163	0x00002163	0x00003163
    (gdb) x/4wx 0x3399000
    0x3399000:	0x00400003	0x00401003	0x00402003	0x00403003

but our particular page directory entry is empty:

    (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4
    0x32b7070:	0x00000000

[ It appears that you can skate past this issue if you don't receive
  any interrupts while the bogus GDT pointer is loaded, or if you avoid
  reloading the segment registers in general.

  Andy Lutomirski provides some additional insight:

   "AFAICT it's entirely permissible for the GDTR and/or LDT
    descriptor to point to unmapped memory.  Any attempt to use them
    (segment loads, interrupts, IRET, etc) will try to access that memory
    as if the access came from CPL 0 and, if the access fails, will
    generate a valid page fault with CR2 pointing into the GDT or
    LDT."

  Up until commit 23a0d4e8fa6d ("efi: Disable interrupts around EFI
  calls, not in the epilog/prolog calls") interrupts were disabled
  around the prolog and epilog calls, and the functional GDT was
  re-installed before interrupts were re-enabled.

  Which explains why no one has hit this issue until now. ]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
[ Updated changelog. ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09 13:40:08 -05:00
Kees Cook b205d6f7b1 x86, kaslr: randomize module base load address
Randomize the load address of modules in the kernel to make kASLR
effective for modules.  Modules can only be loaded within a particular
range of virtual address space.  This patch adds 10 bits of entropy to
the load address by adding 1-1024 * PAGE_SIZE to the beginning range
where modules are loaded.

The single base offset was chosen because randomizing each module
load ends up wasting/fragmenting memory too much. Prior approaches to
minimizing fragmentation while doing randomization tend to result in
worse entropy than just doing a single base address offset.

Example kASLR boot without this change, with a single module loaded:
---[ Modules ]---
0xffffffffc0000000-0xffffffffc0001000           4K     ro     GLB x  pte
0xffffffffc0001000-0xffffffffc0002000           4K     ro     GLB NX pte
0xffffffffc0002000-0xffffffffc0004000           8K     RW     GLB NX pte
0xffffffffc0004000-0xffffffffc0200000        2032K                   pte
0xffffffffc0200000-0xffffffffff000000        1006M                   pmd
 ---[ End Modules ]---

Example kASLR boot after this change, same module loaded:
---[ Modules ]---
0xffffffffc0000000-0xffffffffc0200000           2M                   pmd
0xffffffffc0200000-0xffffffffc03bf000        1788K                   pte
0xffffffffc03bf000-0xffffffffc03c0000           4K     ro     GLB x  pte
0xffffffffc03c0000-0xffffffffc03c1000           4K     ro     GLB NX pte
0xffffffffc03c1000-0xffffffffc03c3000           8K     RW     GLB NX pte
0xffffffffc03c3000-0xffffffffc0400000         244K                   pte
0xffffffffc0400000-0xffffffffff000000        1004M                   pmd
---[ End Modules ]---

Signed-off-by: Andy Honig <ahonig@google.com>
Link: http://lkml.kernel.org/r/20140226005916.GA27083@www.outflux.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit: e2b32e6785138d92d2a40e0d0473575c8c7310a2
[clingulta@codeaurora.org: resolve trivial merge conflicts, caused by
 commit 'a37ad2682' for arch=um]
Change-Id: I44dadf731c8becd301d57ee38f869fe646b383ed
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
2015-12-08 23:47:16 -08:00
Greg Kroah-Hartman 9754f73ba0 xen: fix backport of previous kexec patch
Fixes the backport of 0b34a166f291d255755be46e43ed5497cdd194f2 upstream

Commit 0b34a166f291d255755be46e43ed5497cdd194f2 "x86/xen: Support
kexec/kdump in HVM guests by doing a soft reset" has been added to the
4.2-stable tree" needed to correct the CONFIG variable, as
CONFIG_KEXEC_CORE only showed up in 4.3.

Reported-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-09 10:12:59 -08:00
Andi Kleen f7c7bb9f06 x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic
commit ff47ab4ff3cddfa7bc1b25b990e24abe2ae474ff upstream.

The 64bit __copy_{from,to}_user_inatomic always called
copy_from_user_generic, but skipped the special optimizations for 1/2/4/8
byte accesses.

This especially hurts the futex call, which accesses the 4 byte futex
user value with a complicated fast string operation in a function call,
instead of a single movl.

Use __copy_{from,to}_user for _inatomic instead to get the same
optimizations. The only problem was the might_fault() in those functions.
So move that to new wrapper and call __copy_{f,t}_user_nocheck()
from *_inatomic directly.

32bit already did this correctly by duplicating the code.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1376687844-19857-2-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:53 -07:00
Vitaly Kuznetsov 919845cb33 x86/xen: Support kexec/kdump in HVM guests by doing a soft reset
commit 0b34a166f291d255755be46e43ed5497cdd194f2 upstream.

Currently there is a number of issues preventing PVHVM Xen guests from
doing successful kexec/kdump:

  - Bound event channels.
  - Registered vcpu_info.
  - PIRQ/emuirq mappings.
  - shared_info frame after XENMAPSPACE_shared_info operation.
  - Active grant mappings.

Basically, newly booted kernel stumbles upon already set up Xen
interfaces and there is no way to reestablish them. In Xen-4.7 a new
feature called 'soft reset' is coming. A guest performing kexec/kdump
operation is supposed to call SCHEDOP_shutdown hypercall with
SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor
(with some help from toolstack) will do full domain cleanup (but
keeping its memory and vCPU contexts intact) returning the guest to
the state it had when it was first booted and thus allowing it to
start over.

Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is
probably OK as by default all unknown shutdown reasons cause domain
destroy with a message in toolstack log: 'Unknown shutdown reason code
5. Destroying domain.'  which gives a clue to what the problem is and
eliminates false expectations.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:50 -07:00
Stephen Smalley 4f9d53595c x86/mm: Set NX on gap between __ex_table and rodata
commit ab76f7b4ab2397ffdd2f1eb07c55697d19991d10 upstream.

Unused space between the end of __ex_table and the start of
rodata can be left W+x in the kernel page tables.  Extend the
setting of the NX bit to cover this gap by starting from
text_end rather than rodata_start.

  Before:
  ---[ High Kernel Mapping ]---
  0xffffffff80000000-0xffffffff81000000          16M                               pmd
  0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
  0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
  0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB x  pte
  0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
  0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
  0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
  0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
  0xffffffff82200000-0xffffffffa0000000         478M                               pmd

  After:
  ---[ High Kernel Mapping ]---
  0xffffffff80000000-0xffffffff81000000          16M                               pmd
  0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
  0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
  0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB NX pte
  0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
  0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
  0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
  0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
  0xffffffff82200000-0xffffffffa0000000         478M                               pmd

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:49 -07:00
Dirk Müller fb7eff9cfd Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS
commit d2922422c48df93f3edff7d872ee4f3191fefb08 upstream.

The cpu feature flags are not ever going to change, so warning
everytime can cause a lot of kernel log spam
(in our case more than 10GB/hour).

The warning seems to only occur when nested virtualization is
enabled, so it's probably triggered by a KVM bug.  This is a
sensible and safe change anyway, and the KVM bug fix might not
be suitable for stable releases anyway.

Signed-off-by: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:49 -07:00
David Woodhouse 9870892fc2 x86/platform: Fix Geode LX timekeeping in the generic x86 build
commit 03da3ff1cfcd7774c8780d2547ba0d995f7dc03d upstream.

In 2007, commit 07190a08ee ("Mark TSC on GeodeLX reliable")
bypassed verification of the TSC on Geode LX. However, this code
(now in the check_system_tsc_reliable() function in
arch/x86/kernel/tsc.c) was only present if CONFIG_MGEODE_LX was
set.

OpenWRT has recently started building its generic Geode target
for Geode GX, not LX, to include support for additional
platforms. This broke the timekeeping on LX-based devices,
because the TSC wasn't marked as reliable:
https://dev.openwrt.org/ticket/20531

By adding a runtime check on is_geode_lx(), we can also include
the fix if CONFIG_MGEODEGX1 or CONFIG_X86_GENERIC are set, thus
fixing the problem.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Tosatti <marcelo@kvack.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1442409003.131189.87.camel@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:49 -07:00
Shaohua Li ffebdff74f x86/apic: Serialize LVTT and TSC_DEADLINE writes
commit 5d7c631d926b59aa16f3c56eaeb83f1036c81dc7 upstream.

The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
guaranteed that the write to LVTT has reached the APIC before the
TSC_DEADLINE MSR is written. In such a case the write to the MSR is
ignored and as a consequence the local timer interrupt never fires.

The SDM decribes this issue for xAPIC and x2APIC modes. The
serialization methods recommended by the SDM differ.

xAPIC:
 "1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
  2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
  3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
  4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."

x2APIC:
 "To allow for efficient access to the APIC registers in x2APIC mode,
  the serializing semantics of WRMSR are relaxed when writing to the
  APIC registers. Thus, system software should not use 'WRMSR to APIC
  registers in x2APIC mode' as a serializing instruction. Read and write
  accesses to the APIC registers will occur in program order. A WRMSR to
  an APIC register may complete before all preceding stores are globally
  visible; software can prevent this by inserting a serializing
  instruction, an SFENCE, or an MFENCE before the WRMSR."

The xAPIC method is to just wait for the memory mapped write to hit
the LVTT by checking whether the MSR write has reached the hardware.
There is no reason why a proper MFENCE after the memory mapped write would
not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
xAPIC case as well.

Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
support.

[ tglx: Massaged the changelog ]

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: <Kernel-team@fb.com>
Cc: <lenb@kernel.org>
Cc: <fenghua.yu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:49 -07:00