android_kernel_samsung_msm8976/arch
Rik van Riel d303cf4624 mm: fix TLB flush race between migration, and change_protection_range
commit 20841405940e7be0617612d521e206e4b6b325db upstream.

There are a few subtle races between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.

The basic race is that there is a time window between when the PTE is
made non-present (PROT_NONE or NUMA) and when the TLB is flushed.

During that time, a CPU may continue writing to the page.

This is fine most of the time. However, compaction or the NUMA migration
code may come in and migrate the page away.

When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.

This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).

The basic race looks like this:

CPU A			CPU B			CPU C

						load TLB entry
make entry PTE/PMD_NUMA
			fault on entry
						read/write old page
			start migrating page
			change PTE/PMD to new page
						read/write old page [*]
flush TLB
						reload TLB from new entry
						read/write new page
						lose data

[*] the old page may belong to a new user at this point!

The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible is aware of the fact that PROT_NONE and PROT_NUMA memory
may still be accessible if there is a TLB flush pending for the mm.
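
As a rough illustration of the idea (not the kernel patch itself), the
check can be modelled in plain userspace C. Every name below (the SK_*
flag bits, struct sk_mm, and the sk_* helpers) is hypothetical and was
chosen for this sketch only; the real x86 flag values and the real mm
bookkeeping differ. The sketch assumes the protection-change side marks
the mm as having a flush pending before it touches any PTE, and clears
the mark only after the TLB flush has completed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not the real x86 values. */
#define SK_PAGE_PRESENT   (1u << 0)
#define SK_PAGE_PROTNONE  (1u << 1)
#define SK_PAGE_NUMA      (1u << 2)

/* Hypothetical stand-in for the per-mm "TLB flush pending" state. */
struct sk_mm {
	bool tlb_flush_pending;
};

/*
 * Returns true if some CPU might still reach the page through this entry.
 * A present entry is always accessible. A PROT_NONE or NUMA entry counts
 * as accessible while a flush is pending, because a stale TLB entry may
 * still translate to the old page.
 */
static inline bool sk_pte_accessible(const struct sk_mm *mm, uint32_t flags)
{
	if (flags & SK_PAGE_PRESENT)
		return true;

	if ((flags & (SK_PAGE_PROTNONE | SK_PAGE_NUMA)) && mm->tlb_flush_pending)
		return true;

	return false;
}

/* Assumed ordering on the protection-change side: mark before editing PTEs. */
static inline void sk_begin_protection_change(struct sk_mm *mm)
{
	mm->tlb_flush_pending = true;
}

/* Assumed ordering: clear the mark only once the TLB flush has finished. */
static inline void sk_end_protection_change(struct sk_mm *mm)
{
	mm->tlb_flush_pending = false;
}
```

With a predicate like this, migration or compaction code that consults
it during the window would treat the entry as still reachable and flush
remote TLBs before moving the page, rather than assuming a non-present
entry needs no flush.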

This should fix both NUMA migration and compaction.

[mgorman@suse.de: fix build]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-09 12:24:23 -08:00
alpha
arc ARC: Incorrect mm reference used in vmalloc fault handler 2013-11-13 12:05:32 +09:00
arm ARM: OMAP2+: hwmod_data: fix missing OMAP_INTC_START in irq data 2014-01-09 12:24:20 -08:00
arm64 arm64: spinlock: retry trylock operation if strex fails on free lock 2014-01-09 12:24:20 -08:00
avr32 avr32: fix out-of-range jump in large kernels 2013-12-04 10:57:05 -08:00
blackfin
c6x
cris cris: media platform drivers: fix build 2013-11-29 11:11:53 -08:00
frv
h8300
hexagon
ia64 exec/ptrace: fix get_dumpable() incorrect tests 2013-11-29 11:11:44 -08:00
m32r
m68k m68k/atari: ARAnyM - Fix NatFeat module support 2013-08-20 08:43:05 -07:00
metag
microblaze microblaze: fix clone syscall 2013-08-20 08:43:02 -07:00
mips MIPS: DMA: For BMIPS5000 cores flush region just like non-coherent R10000 2013-12-20 07:45:06 -08:00
mn10300
openrisc
parisc parisc: fix mmap(MAP_FIXED|MAP_SHARED) to already mmapped address 2013-12-11 22:36:27 -08:00
powerpc powerpc: Align p_end 2014-01-09 12:24:22 -08:00
s390 crypto: s390 - Fix aes-xts parameter corruption 2013-12-11 22:36:26 -08:00
score
sh Fix TLB gather virtual address range invalidation corner cases 2013-08-20 08:43:05 -07:00
sparc mm: fix TLB flush race between migration, and change_protection_range 2014-01-09 12:24:23 -08:00
tile tile: use a more conservative __my_cpu_offset in CONFIG_PREEMPT 2013-10-13 16:08:34 -07:00
um uml: check length in exitcode_proc_write() 2013-11-13 12:05:33 +09:00
unicore32
x86 mm: fix TLB flush race between migration, and change_protection_range 2014-01-09 12:24:23 -08:00
xtensa xtensa: don't use alternate signal stack on threads 2013-11-13 12:05:33 +09:00
.gitignore
Kconfig microblaze: fix clone syscall 2013-08-20 08:43:02 -07:00