android_kernel_samsung_msm8976/kernel
Rik van Riel d303cf4624 mm: fix TLB flush race between migration, and change_protection_range
commit 20841405940e7be0617612d521e206e4b6b325db upstream.

There are a few subtle races, between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.

The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed.

During that time, a CPU may continue writing to the page.

This is fine most of the time, however compaction or the NUMA migration
code may come in, and migrate the page away.

When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.

This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).

The basic race looks like this:

CPU A			CPU B			CPU C

						load TLB entry
make entry PTE/PMD_NUMA
			fault on entry
						read/write old page
			start migrating page
			change PTE/PMD to new page
						read/write old page [*]
flush TLB
						reload TLB from new entry
						read/write new page
						lose data

[*] the old page may belong to a new user at this point!

The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
still be accessible if there is a TLB flush pending for the mm.

This should fix both NUMA migration and compaction.

[mgorman@suse.de: fix build]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-09 12:24:23 -08:00
..
cpu sched, idle: Fix the idle polling state logic 2013-11-29 11:11:42 -08:00
debug kgdb/sysrq: fix inconstistent help message of sysrq key 2013-04-30 17:04:10 -07:00
events perf: Fix perf ring buffer memory ordering 2013-11-20 12:27:47 -08:00
gcov kernel/gcov: remove depends on CONFIG_EXPERIMENTAL 2013-01-11 11:39:33 -08:00
irq irq: Enable all irqs unconditionally in irq_resume 2013-12-11 22:36:27 -08:00
power PM / hibernate: Avoid overflow in hibernate_preallocate_memory() 2013-12-04 10:56:58 -08:00
sched sched: fix the theoretical signal_wake_up() vs schedule() race 2014-01-09 12:24:23 -08:00
time time: Fix 1ns/tick drift w/ GENERIC_TIME_VSYSCALL_OLD 2013-12-11 22:36:27 -08:00
trace ftrace: Initialize the ftrace profiler for each possible cpu 2014-01-09 12:24:20 -08:00
.gitignore kernel/hz.bc: ignore. 2013-04-22 07:09:06 -07:00
acct.c fs: Fix hang with BSD accounting on frozen filesystem 2013-05-04 14:57:58 -04:00
async.c async: rename and redefine async_func_ptr 2013-03-12 13:59:14 -07:00
audit.c audit: log the audit_names record type 2013-12-04 10:57:03 -08:00
audit.h audit: fix mq_open and mq_unlink to add the MQ root as a hidden parent audit_names record 2013-12-04 10:57:03 -08:00
audit_tree.c kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules() 2013-06-12 16:29:46 -07:00
audit_watch.c audit: catch possible NULL audit buffers 2013-01-11 14:54:55 -08:00
auditfilter.c auditfilter.c: fix kernel-doc warnings 2013-05-24 16:22:52 -07:00
auditsc.c audit: fix mq_open and mq_unlink to add the MQ root as a hidden parent audit_names record 2013-12-04 10:57:03 -08:00
backtracetest.c
bounds.c
capability.c Add file_ns_capable() helper function for open-time capability checking 2013-04-14 10:06:31 -07:00
cgroup.c cgroup: use a dedicated workqueue for cgroup destruction 2013-12-04 10:57:20 -08:00
cgroup_freezer.c cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free() 2012-11-19 08:13:38 -08:00
compat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-05-01 07:21:43 -07:00
configs.c proc: Supply PDE attribute setting accessor functions 2013-05-01 17:29:18 -04:00
context_tracking.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-06-20 08:18:35 -10:00
cpu.c CPU hotplug: provide a generic helper to disable/enable CPU hotplug 2013-06-12 16:29:44 -07:00
cpu_pm.c
cpuset.c cpuset: Fix memory allocator deadlock 2013-12-04 10:57:16 -08:00
crash_dump.c
cred.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-12-18 10:55:28 -08:00
delayacct.c cputime: Use accessors to read task cputime stats 2013-01-27 19:23:31 +01:00
dma.c
elfcore.c
exec_domain.c
exit.c move exit_task_namespaces() outside of exit_notify() 2013-06-15 05:39:08 +04:00
extable.c extable: Flip the sorting message 2013-04-15 13:25:16 +02:00
fork.c mm: fix TLB flush race between migration, and change_protection_range 2014-01-09 12:24:23 -08:00
freezer.c libata, freezer: avoid block device removal while system is frozen 2014-01-09 12:24:23 -08:00
futex.c futex: fix handling of read-only-mapped hugepages 2013-12-20 07:45:08 -08:00
futex_compat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-02-23 18:50:11 -08:00
groups.c
hrtimer.c hrtimers: Move SMP function call to thread context 2013-07-28 16:30:22 -07:00
hung_task.c
irq_work.c Merge branch 'nohz/printk-v8' into irq/core 2013-02-05 00:48:46 +01:00
itimer.c
jump_label.c
kallsyms.c kernel: kallsyms: memory override issue, need check destination buffer length 2013-04-15 15:17:26 +09:30
kcmp.c kcmp: include linux/ptrace.h 2012-12-20 17:40:19 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c PCI: Disable Bus Master only on kexec reboot 2013-12-20 07:45:08 -08:00
kmod.c usermodehelper: check subprocess_info->path != NULL 2013-05-16 12:01:11 -07:00
kprobes.c kprobes: Fix to free gone and unused optprobes 2013-05-28 10:37:59 +02:00
ksysfs.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-12-11 18:10:49 -08:00
kthread.c kthread: implement probe_kthread_data() 2013-04-30 17:04:02 -07:00
latencytop.c
lglock.c
lockdep.c Merge branch 'for-3.10/drivers' of git://git.kernel.dk/linux-block 2013-05-08 11:51:05 -07:00
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
Makefile We get rid of the general module prefix confusion with a binary config option, 2013-05-05 10:58:06 -07:00
modsign_certificate.S CONFIG_SYMBOL_PREFIX: cleanup. 2013-03-15 15:09:43 +10:30
modsign_pubkey.c keys: use keyring_alloc() to create module signing keyring 2012-12-20 17:40:21 -08:00
module-internal.h
module.c module: do percpu allocation after uniqueness check. No, really! 2013-07-13 11:42:26 -07:00
module_signing.c MODSIGN: Don't use enum-type bitfields in module signature info block 2012-12-05 11:27:24 +10:30
mutex-debug.c
mutex-debug.h
mutex.c mutex: Back out architecture specific check for negative mutex count 2013-04-19 09:33:36 +02:00
mutex.h
notifier.c
nsproxy.c proc: Split the namespace stuff out into linux/proc_ns.h 2013-05-01 17:29:39 -04:00
padata.c padata: use __this_cpu_read per-cpu helper 2012-12-06 17:16:23 +08:00
panic.c dump_stack: implement arch-specific hardware description in task dumps 2013-04-30 17:04:02 -07:00
params.c params: Fix potential memory leak in add_sysfs_param() 2013-03-18 11:40:21 +00:00
pid.c pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup 2013-09-26 17:18:27 -07:00
pid_namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
posix-cpu-timers.c posix_timers: Fix pre-condition to stop the tick on full dynticks 2013-04-22 19:59:25 +02:00
posix-timers.c posix-timers: Remove unused variable 2013-04-18 12:51:19 +02:00
printk.c printk: Fix rq->lock vs logbuf_lock unlock lock inversion 2013-07-25 14:07:31 -07:00
profile.c proc: Supply PDE attribute setting accessor functions 2013-05-01 17:29:18 -04:00
ptrace.c exec/ptrace: fix get_dumpable() incorrect tests 2013-11-29 11:11:44 -08:00
range.c range: Do not add new blank slot with add_range_with_merge 2013-06-18 11:32:10 -05:00
rcu.h rcu: Provide RCU CPU stall warnings for tiny RCU 2013-01-28 22:06:21 -08:00
rcupdate.c Merge branches 'doctorture.2013.01.29a', 'fixes.2013.01.26a', 'tagcb.2013.01.24a' and 'tiny.2013.01.29b' into HEAD 2013-01-28 22:25:21 -08:00
rcutiny.c Merge branches 'doctorture.2013.01.29a', 'fixes.2013.01.26a', 'tagcb.2013.01.24a' and 'tiny.2013.01.29b' into HEAD 2013-01-28 22:25:21 -08:00
rcutiny_plugin.h rcu: Provide RCU CPU stall warnings for tiny RCU 2013-01-28 22:06:21 -08:00
rcutorture.c rcu: Allow rcutorture to be built at low optimization levels 2013-02-04 12:18:20 -08:00
rcutree.c rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration 2013-06-10 13:37:12 -07:00
rcutree.h rcu: Don't call wakeup() with rcu_node structure ->lock held 2013-06-10 13:37:11 -07:00
rcutree_plugin.h rcu: Don't allocate bootmem from rcu_init() 2013-05-15 10:41:12 -07:00
rcutree_trace.c rcutrace: single_open() leaks 2013-05-05 00:16:35 -04:00
relay.c Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block 2013-05-08 10:13:35 -07:00
res_counter.c res_counter: return amount of charges after res_counter_uncharge() 2012-12-18 15:02:12 -08:00
resource.c mem hotunplug: fix kfree() of bootmem memory 2013-04-29 15:54:40 -07:00
rtmutex-debug.c sched/rt: Move rt specific bits into new header file 2013-02-07 20:51:08 +01:00
rtmutex-debug.h
rtmutex-tester.c locking/rtmutex/tester: Set correct permissions on sysfs files 2013-04-10 14:48:37 +02:00
rtmutex.c sched/rt: Move rt specific bits into new header file 2013-02-07 20:51:08 +01:00
rtmutex.h
rtmutex_common.h
rwsem.c Revert "rw_semaphore: remove up/down_read_non_owner" 2013-03-23 15:53:52 -07:00
seccomp.c seccomp: allow BPF_XOR based ALU instructions. 2013-03-26 11:07:19 +11:00
semaphore.c semaphore: use `bool' type for semaphore_waiter's up 2013-04-30 17:04:08 -07:00
signal.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-05-01 14:08:52 -07:00
smp.c kernel/smp.c: cleanups 2013-04-30 17:04:03 -07:00
smpboot.c kthread: Prevent unpark race which puts threads on the wrong cpu 2013-04-12 14:18:43 +02:00
smpboot.h
softirq.c irq: Force hardirq exit's softirq processing on its own stack 2013-10-13 16:08:34 -07:00
spinlock.c
srcu.c srcu: use ACCESS_ONCE() to access sp->completed in srcu_read_lock() 2013-02-07 15:19:36 -08:00
stacktrace.c
stop_machine.c stop_machine: Mark per cpu stopper enabled early 2013-02-26 22:25:17 +01:00
sys.c reboot: rigrate shutdown/reboot to boot cpu 2013-06-12 16:29:44 -07:00
sys_ni.c unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE 2013-05-09 13:46:38 -04:00
sysctl.c mm: replace hardcoded 3% with admin_reserve_pages knob 2013-04-29 15:54:36 -07:00
sysctl_binary.c switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE 2013-05-09 14:53:20 -04:00
task_work.c
taskstats.c
test_kprobes.c kernel/: rename random32() to prandom_u32() 2013-04-29 18:28:42 -07:00
time.c timekeeping: Use inject_offset in warp_clock 2013-03-15 16:50:20 -07:00
timeconst.bc kernel: Replace timeconst.pl with a bc script 2013-02-16 23:17:25 +01:00
timer.c timer: Fix jiffies wrap behavior of round_jiffies_common() 2013-07-21 18:21:31 -07:00
tracepoint.c Tracing updates for Linux 3.10 2013-04-29 13:55:38 -07:00
tsacct.c cputime: Use accessors to read task cputime stats 2013-01-27 19:23:31 +01:00
uid16.c make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect 2013-03-03 22:58:33 -05:00
up.c
user-return-notifier.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
user.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
user_namespace.c userns: limit the maximum depth of user_namespace->parent chain 2013-08-11 18:35:25 -07:00
utsname.c proc: Split the namespace stuff out into linux/proc_ns.h 2013-05-01 17:29:39 -04:00
utsname_sysctl.c kernel/utsname_sysctl.c: put get/get_uts() into CONFIG_PROC_SYSCTL code block 2013-02-27 19:10:22 -08:00
wait.c propagate name change to comments in kernel source 2012-12-06 10:39:54 +01:00
watchdog.c watchdog: Add comments to explain the watchdog_disabled variable 2013-03-14 08:24:05 +01:00
workqueue.c workqueue: fix ordered workqueues in NUMA setups 2013-12-04 10:57:16 -08:00
workqueue_internal.h workqueue: include workqueue info when printing debug dump of a worker task 2013-04-30 17:04:02 -07:00