Commit Graph

20 Commits

Author SHA1 Message Date
David S. Miller f36391d279 sparc64: Fix race in TLB batch processing.
As reported by Dave Kleikamp, when we emit cross calls to do batched
TLB flush processing we have a race because we do not synchronize on
the sibling cpus completing the cross call.

So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
and either flushes are missed or flushes will flush the wrong
addresses.

Fix this by using generic infrastructure to synchonize on the
completion of the cross call.

This first required getting the flush_tlb_pending() call out from
switch_to() which operates with locks held and interrupts disabled.
The problem is that smp_call_function_many() cannot be invoked with
IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().

We get the batch processing outside of locked IRQ disabled sections by
using some ideas from the powerpc port. Namely, we only batch inside
of arch_{enter,leave}_lazy_mmu_mode() calls.  If we're not in such a
region, we flush TLBs synchronously.

1) Get rid of xcall_flush_tlb_pending and per-cpu type
   implementations.

2) Do TLB batch cross calls instead via:

	smp_call_function_many()
		tlb_pending_func()
			__flush_tlb_pending()

3) Batch only in lazy mmu sequences:

	a) Add 'active' member to struct tlb_batch
	b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
	c) Set 'active' in arch_enter_lazy_mmu_mode()
	d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
	e) Check 'active' in tlb_batch_add_one() and do a synchronous
           flush if it's clear.

4) Add infrastructure for synchronous TLB page flushes.

	a) Implement __flush_tlb_page and per-cpu variants, patch
	   as needed.
	b) Likewise for xcall_flush_tlb_page.
	c) Implement smp_flush_tlb_page() to invoke the cross-call.
	d) Wire up global_flush_tlb_page() to the right routine based
           upon CONFIG_SMP

5) It turns out that singleton batches are very common, 2 out of every
   3 batch flushes have only a single entry in them.

   The batch flush waiting is very expensive, both because of the poll
   on sibling cpu completeion, as well as because passing the tlb batch
   pointer to the sibling cpus invokes a shared memory dereference.

   Therefore, in flush_tlb_pending(), if there is only one entry in
   the batch perform a completely asynchronous global_flush_tlb_page()
   instead.

Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2013-04-19 17:26:26 -04:00
David S. Miller 89a77915e0 sparc64: Fix get_user_pages_fast() wrt. THP.
Mostly mirrors the s390 logic, as unlike x86 we don't need the
SetPageReferenced() bits.

On sparc64 we also lack a user/privileged bit in the huge PMDs.

In order to make this work for THP and non-THP builds, some header
file adjustments were necessary.  Namely, provide the PMD_HUGE_* bit
defines and the pmd_large() inline unconditionally rather than
protected by TRANSPARENT_HUGEPAGE.

Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-13 12:22:14 -08:00
David S. Miller 4a9d1946b0 sparc64: Define pte_accessible()
We can elide flush_tlb_*() calls when _PAGE_VALID is clear
as that is the test used to determine whether or not to
queue up a TLB flush in set_pte_at().

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-18 16:06:16 -08:00
David Miller 9e695d2ecc sparc64: Support transparent huge pages.
This is relatively easy since PMD's now cover exactly 4MB of memory.

Our PMD entries are 32-bits each, so we use a special encoding.  The
lowest bit, PMD_ISHUGE, determines the interpretation.  This is possible
because sparc64's page tables are purely software entities so we can use
whatever encoding scheme we want.  We just have to make the TLB miss
assembler page table walkers aware of the layout.

set_pmd_at() works much like set_pte_at() but it has to operate in two
page from a table of non-huge PTEs, so we have to queue up TLB flushes
based upon what mappings are valid in the PTE table.  In the second regime
we are going from huge-page to non-huge-page, and in that case we need
only queue up a single TLB flush to push out the huge page mapping.

We still have 5 bits remaining in the huge PMD encoding so we can very
likely support any new pieces of THP state tracking that might get added
in the future.

With lots of help from Johannes Weiner.

Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:06 +09:00
David Miller dbc9fdf063 sparc64: Document PGD and PMD layout.
We're going to be messing around with the PMD interpretation and layout
for the sake of transparent huge pages, so we better clearly document what
we're starting with.

Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:05 +09:00
David Miller 56a70b8c6a sparc64: Halve the size of PTE tables
The reason we want to do this is to facilitate transparent huge page
support.

Right now PMD's cover 8MB of address space, and our huge page size is 4MB.
 The current transparent hugepage support is not able to handle HPAGE_SIZE
!= PMD_SIZE.

So make PTE tables be sized to half of a page instead of a full page.

We can still map properly the whole supported virtual address range which
on sparc64 requires 44 bits.  Add a compile time CPP test which ensures
that this requirement is always met.

There is a minor inefficiency added by this change.  We only use half of
the page for PTE tables.  It's not trivial to use only half of the page
yet still get all of the pgtable_page_{ctor,dtor}() stuff working
properly.  It is doable, and that will come in a subsequent change.

Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:04 +09:00
David Miller 15b9350a17 sparc64: Only support 4MB huge pages and 8KB base pages.
Narrowing the scope of the page size configurations will make the
transparent hugepage changes much simpler.

In the end what we really want to do is have the kernel support multiple
huge page sizes and use whatever is appropriate as the context dictactes.

Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:04 +09:00
David S. Miller 679bea5e43 sparc: Kill mmu_{un,}lockarea().
These were used on sun4c during floppy data transfers since on that
chip we had to lock the cpu mappings into the TLB because we cannot
take a TLB miss during the assembler floppy interrupt handler that
does the data transfer.

That is no longer necessary since we've removed sun4c support, thus
this stuff can disappear completely.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-13 13:23:16 -07:00
Aaro Koskinen 2533e82415 sparc: pgtable_64: change include order
Fix the following build breakage in v3.4-rc1:

  CC      arch/sparc/kernel/cpu.o
In file included from /home/aaro/git/linux/arch/sparc/include/asm/pgtable_64.h:15:0,
                 from /home/aaro/git/linux/arch/sparc/include/asm/pgtable.h:4,
                 from arch/sparc/kernel/cpu.c:15:
include/asm-generic/pgtable-nopud.h:13:16: error: unknown type name 'pgd_t'
include/asm-generic/pgtable-nopud.h:25:28: error: unknown type name 'pgd_t'
include/asm-generic/pgtable-nopud.h:26:27: error: unknown type name 'pgd_t'
include/asm-generic/pgtable-nopud.h:27:31: error: unknown type name 'pgd_t'
include/asm-generic/pgtable-nopud.h:28:30: error: unknown type name 'pgd_t'
include/asm-generic/pgtable-nopud.h:38:34: error: unknown type name 'pgd_t'
In file included from /home/aaro/git/linux/arch/sparc/include/asm/pgtable_64.h:783:0,
                 from /home/aaro/git/linux/arch/sparc/include/asm/pgtable.h:4,
                 from arch/sparc/kernel/cpu.c:15:
include/asm-generic/pgtable.h: In function 'pgd_none_or_clear_bad':
include/asm-generic/pgtable.h:258:2: error: implicit declaration of function 'pgd_none' [-Werror=implicit-function-declaration]
include/asm-generic/pgtable.h:260:2: error: implicit declaration of function 'pgd_bad' [-Werror=implicit-function-declaration]
include/asm-generic/pgtable.h: In function 'pud_none_or_clear_bad':
include/asm-generic/pgtable.h:269:6: error: request for member 'pgd' in something not a structure or union

Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 14:22:23 -07:00
David Howells d550bbd40c Disintegrate asm/system.h for Sparc
Disintegrate asm/system.h for Sparc.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: sparclinux@vger.kernel.org
2012-03-28 18:30:03 +01:00
David S. Miller 3e37fd3153 sparc: Kill custom io_remap_pfn_range().
To handle the large physical addresses, just make a simple wrapper
around remap_pfn_range() like MIPS does.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-17 18:17:59 -08:00
David S. Miller 683d2fa672 sparc64: add support for _PAGE_SPECIAL
Luckily there are still a few software PTE bits remaining and they even
match up in both the sun4u and sun4v pte layouts.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 20:57:10 -07:00
Peter Zijlstra 90f08e399d sparc: mmu_gather rework
Rework the sparc mmu_gather usage to conform to the new world order :-)

Sparc mmu_gather does two things:
 - tracks vaddrs to unhash
 - tracks pages to free

Split these two things like powerpc has done and keep the vaddrs
in per-cpu data structures and flush them on context switch.

The remaining bits can then use the generic mmu_gather.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:13 -07:00
Sam Ravnborg cb1b820981 sparc: consolidate show_cpuinfo in cpu.c
We have all the cpu related info in cpu.c - so move
the remaining functions to support /proc/cpuinfo to this file.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-21 15:45:45 -07:00
Peter Zijlstra ece0e2b640 mm: remove pte_*map_nested()
Since we no longer need to provide KM_type, the whole pte_*map_nested()
API is now redundant, remove it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:08 -07:00
Russell King 4b3073e1c5 MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself
On VIVT ARM, when we have multiple shared mappings of the same file
in the same MM, we need to ensure that we have coherency across all
copies.  We do this via make_coherent() by making the pages
uncacheable.

This used to work fine, until we allowed highmem with highpte - we
now have a page table which is mapped as required, and is not available
for modification via update_mmu_cache().

Ralf Beache suggested getting rid of the PTE value passed to
update_mmu_cache():

  On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
  to construct a pointer to the pte again.  Passing a pte_t * is much
  more elegant.  Maybe we might even replace the pte argument with the
  pte_t?

Ben Herrenschmidt would also like the pte pointer for PowerPC:

  Passing the ptep in there is exactly what I want.  I want that
  -instead- of the PTE value, because I have issue on some ppc cases,
  for I$/D$ coherency, where set_pte_at() may decide to mask out the
  _PAGE_EXEC.

So, pass in the mapped page table pointer into update_mmu_cache(), and
remove the PTE value, updating all implementations and call sites to
suit.

Includes a fix from Stephen Rothwell:

  sparc: fix fallout from update_mmu_cache API change

  Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-02-20 16:41:46 +00:00
David S. Miller 1b6b9d6247 sparc64: Increase vmalloc size to fix percpu regressions.
Since we now use the embedding percpu allocator we have to make the
vmalloc area at least as large as the stretch can be between nodes.

Besides some minor asm adjustments, this turned out to be pretty
trivial.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-28 14:39:58 -07:00
David S. Miller d8ed1d43e1 sparc64: Validate linear D-TLB misses.
When page alloc debugging is not enabled, we essentially accept any
virtual address for linear kernel TLB misses.  But with kgdb, kernel
address probing, and other facilities we can try to access arbitrary
crap.

So, make sure the address we miss on will translate to physical memory
that actually exists.

In order to make this work we have to embed the valid address bitmap
into the kernel image.  And in order to make that less expensive we
make an adjustment, in that the max physical memory address is
decreased to "1 << 41", even on the chips that support a 42-bit
physical address space.  We can do this because bit 41 indicates
"I/O space" and thus covers non-memory ranges.

The result of this is that:

1) kpte_linear_bitmap shrinks from 2K to 1K in size

2) we need 64K more for the valid address bitmap

We can't let the valid address bitmap be dynamically allocated
once we start using it to validate TLB misses, otherwise we have
crazy issues to deal with wrt. recursive TLB misses and such.

If we're in a TLB miss it could be the deepest trap level that's legal
inside of the cpu.  So if we TLB miss referencing the bitmap, the cpu
will be out of trap levels and enter RED state.

To guard against out-of-range accesses to the bitmap, we have to check
to make sure no bits in the physical address above bit 40 are set.  We
could export and use last_valid_pfn for this check, but that's just an
unnecessary extra memory reference.

On the plus side of all this, since we load all of these translations
into the special 4MB mapping TSB, and we check the TSB first for TLB
misses, there should be absolutely no real cost for these new checks
in the TLB miss path.

Reported-by: heyongli@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-25 16:47:46 -07:00
David S. Miller b539c46766 sparc64: Fix sparse warnings in fault.c
1) set_brkpt() is referenced by nothing and hasn't been used by anyone
   to my knowledge for many many years.  So just delete it.

2) add extern decl for do_sparc64_fault() in asm/pgtable_64.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-12 00:10:32 -07:00
Sam Ravnborg a439fe51a1 sparc, sparc64: use arch/sparc/include
The majority of this patch was created by the following script:

***
ASM=arch/sparc/include/asm
mkdir -p $ASM
git mv include/asm-sparc64/ftrace.h $ASM
git rm include/asm-sparc64/*
git mv include/asm-sparc/* $ASM
sed -ie 's/asm-sparc64/asm/g' $ASM/*
sed -ie 's/asm-sparc/asm/g' $ASM/*
***

The rest was an update of the top-level Makefile to use sparc
for header files when sparc64 is being build.
And a small fixlet to pick up the correct unistd.h from
sparc64 code.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-07-27 23:00:59 +02:00
Renamed from include/asm-sparc/pgtable_64.h (Browse further)