Commit graph

368 commits

Author SHA1 Message Date
Gerd Hoffmann
d167a51877 [PATCH] x86_64: x86_64 version of the smp alternative patch.
Changes are largely identical to the i386 version:

 * alternative #define are moved to the new alternative.h file.
 * one new elf section with pointers to the lock prefixes which can be
   nop'ed out for non-smp.
 * two new elf sections simliar to the "classic" alternatives to
   replace SMP code with simpler UP code.
 * fixup headers to use alternative.h instead of defining their own
   LOCK / LOCK_PREFIX macros.

The patch reuses the i386 version of the alternatives code to avoid code
duplication.  The code in alternatives.c was shuffled around a bit to
reduce the number of #ifdefs needed.  It also got some tweaks needed for
x86_64 (vsyscall page handling) and new features (noreplacement option
which was x86_64 only up to now).  Debug printk's are changed from
compile-time to runtime.

Loosely based on a early version from Bastian Blank <waldi@debian.org>

Signed-off-by: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:14 -07:00
Andi Kleen
240cd6a806 [PATCH] i386/x86-64: Emulate CPUID4 on AMD
Intel systems report the cache level data from CPUID 4 in sysfs.
Add a CPUID 4 emulation for AMD CPUs to report the same
information for them. This allows programs to read this
information in a uniform way.

The AMD way to report this is less flexible so some assumptions
are hardcoded (e.g. no L3)

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:14 -07:00
Andi Kleen
79121ea9f0 [PATCH] x86_64: Use __always_inline for __inline_memcpy
Inspired from i386 changes

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:14 -07:00
Paul Mackerras
bfe5d83419 [PATCH] Define __raw_get_cpu_var and use it
There are several instances of per_cpu(foo, raw_smp_processor_id()), which
is semantically equivalent to __get_cpu_var(foo) but without the warning
that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled.  For
those architectures with optimized per-cpu implementations, namely ia64,
powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower
code than __get_cpu_var(), so it would be preferable to use __get_cpu_var
on those platforms.

This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x,
raw_smp_processor_id()) on architectures that use the generic per-cpu
implementation, and turns into __get_cpu_var(x) on the architectures that
have an optimized per-cpu implementation.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:01 -07:00
Matt Mackall
afedfd016a [PATCH] random: remove SA_SAMPLE_RANDOM from floppy driver
The floppy driver is already calling add_disk_randomness as it should, so this
was redundant.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:00 -07:00
Linus Torvalds
37224470c8 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (65 commits)
  ACPI: suppress power button event on S3 resume
  ACPI: resolve merge conflict between sem2mutex and processor_perflib.c
  ACPI: use for_each_possible_cpu() instead of for_each_cpu()
  ACPI: delete newly added debugging macros in processor_perflib.c
  ACPI: UP build fix for bugzilla-5737
  Enable P-state software coordination via _PDC
  P-state software coordination for speedstep-centrino
  P-state software coordination for acpi-cpufreq
  P-state software coordination for ACPI core
  ACPI: create acpi_thermal_resume()
  ACPI: create acpi_fan_suspend()/acpi_fan_resume()
  ACPI: pass pm_message_t from acpi_device_suspend() to root_suspend()
  ACPI: create acpi_device_suspend()/acpi_device_resume()
  ACPI: replace spin_lock_irq with mutex for ec poll mode
  ACPI: Allow a WAN module enable/disable on a Thinkpad X60.
  sem2mutex: acpi, acpi_link_lock
  ACPI: delete unused acpi_bus_drivers_lock
  sem2mutex: drivers/acpi/processor_perflib.c
  ACPI add ia64 exports to build acpi_memhotplug as a module
  ACPI: asus_acpi_init(): propagate correct return value
  ...

Manual resolve of conflicts in:

	arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
	arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
	include/acpi/processor.h
2006-06-23 07:52:36 -07:00
Christoph Lameter
b63d64a324 [PATCH] sys_move_pages: x86_64 support
sys_move_pages support for x86_64

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:53 -07:00
Ralf Baechle
d501e62bc7 [PATCH] Delete unused definitions of kvaddr_to_nid
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:52 -07:00
Yasunori Goto
762834e8bf [PATCH] Unify pxm_to_node() and node_to_pxm()
Consolidate the various arch-specific implementations of pxm_to_node() and
node_to_pxm() into a single generic version.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:48 -07:00
Linus Torvalds
6c763eb9ea Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (27 commits)
  [PATCH] PCI: nVidia quirk to make AER PCI-E extended capability visible
  [PATCH] PCI: fix issues with extended conf space when MMCONFIG disabled because of e820
  [PATCH] PCI: Bus Parity Status sysfs interface
  [PATCH] PCI: fix memory leak in MMCONFIG error path
  [PATCH] PCI: fix error with pci_get_device() call in the mpc85xx driver
  [PATCH] PCI: MSI-K8T-Neo2-Fir: run only where needed
  [PATCH] PCI: fix race with pci_walk_bus and pci_destroy_dev
  [PATCH] PCI: clean up pci documentation to be more specific
  [PATCH] PCI: remove unneeded msi code
  [PATCH] PCI: don't move ioapics below PCI bridge
  [PATCH] PCI: cleanup unused variable about msi driver
  [PATCH] PCI: disable msi mode in pci_disable_device
  [PATCH] PCI: Allow MSI to work on kexec kernel
  [PATCH] PCI: AMD 8131 MSI quirk called too late, bus_flags not inherited ?
  [PATCH] PCI: Move various PCI IDs to header file
  [PATCH] PCI Bus Parity Status-broken hardware attribute, EDAC foundation
  [PATCH] PCI: i386/x86_84: disable PCI resource decode on device disable
  [PATCH] PCI ACPI: Rename the functions to avoid multiple instances.
  [PATCH] PCI: don't enable device if already enabled
  [PATCH] PCI: Add a "enable" sysfs attribute to the pci devices to allow userspace (Xorg) to enable devices without doing foul direct access
  ...
2006-06-22 15:07:59 -07:00
Bjorn Helgaas
4f1bcaf094 [PATCH] vgacon: make VGA_MAP_MEM take size, remove extra use
VGA_MAP_MEM translates to ioremap() on some architectures.  It makes sense
to do this to vga_vram_base, because we're going to access memory between
vga_vram_base and vga_vram_end.

But it doesn't really make sense to map starting at vga_vram_end, because
we aren't going to access memory starting there.  On ia64, which always has
to be different, ioremapping vga_vram_end gives you something completely
incompatible with ioremapped vga_vram_start, so vga_vram_size ends up being
nonsense.

As a bonus, we often know the size up front, so we can use ioremap()
correctly, rather than giving it a zero size.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22 15:05:58 -07:00
bibo,mao
b209a6ee49 [PATCH] PCI: cleanup unused variable about msi driver
In IA64 platform, msi driver does not use irq_vector variable, and in
x86 platform LAST_DEVICE_VECTOR should one before FIRST_SYSTEM_VECTOR,
this patch modify this.

Signed-off-by: bibo, mao <bibo.mao@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-21 12:00:00 -07:00
Mark Maule
fd58e55fcf [PATCH] PCI: msi abstractions and support for altix
Abstract portions of the MSI core for platforms that do not use standard
APIC interrupt controllers.  This is implemented through a new arch-specific
msi setup routine, and a set of msi ops which can be set on a per platform
basis.

Signed-off-by: Mark Maule <maule@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-21 11:59:58 -07:00
Linus Torvalds
cee4cca740 Merge git://git.infradead.org/hdrcleanup-2.6
* git://git.infradead.org/hdrcleanup-2.6: (63 commits)
  [S390] __FD_foo definitions.
  Switch to __s32 types in joystick.h instead of C99 types for consistency.
  Add <sys/types.h> to headers included for userspace in <linux/input.h>
  Move inclusion of <linux/compat.h> out of user scope in asm-x86_64/mtrr.h
  Remove struct fddi_statistics from user view in <linux/if_fddi.h>
  Move user-visible parts of drivers/s390/crypto/z90crypt.h to include/asm-s390
  Revert include/media changes: Mauro says those ioctls are only used in-kernel(!)
  Include <linux/types.h> and use __uXX types in <linux/cramfs_fs.h>
  Use __uXX types in <linux/i2o_dev.h>, include <linux/ioctl.h> too
  Remove private struct dx_hash_info from public view in <linux/ext3_fs.h>
  Include <linux/types.h> and use __uXX types in <linux/affs_hardblocks.h>
  Use __uXX types in <linux/divert.h> for struct divert_blk et al.
  Use __u32 for elf_addr_t in <asm-powerpc/elf.h>, not u32. It's user-visible.
  Remove PPP_FCS from user view in <linux/ppp_defs.h>, remove __P mess entirely
  Use __uXX types in user-visible structures in <linux/nbd.h>
  Don't use 'u32' in user-visible struct ip_conntrack_old_tuple.
  Use __uXX types for S390 DASD volume label definitions which are user-visible
  S390 BIODASDREADCMB ioctl should use __u64 not u64 type.
  Remove unneeded inclusion of <linux/time.h> from <linux/ufs_fs.h>
  Fix private integer types used in V4L2 ioctls.
  ...

Manually resolve conflict in include/linux/mtd/physmap.h
2006-06-20 15:10:08 -07:00
Len Brown
3e8e7c93d7 Pull bugzilla-5653 into release branch 2006-06-15 15:41:53 -04:00
Len Brown
63518472c0 Pull trivial1 into release branch 2006-06-15 15:37:09 -04:00
Andi Kleen
6ae53cd496 [PATCH] x86_64: Fix stack/mmap randomization for compat tasks
ia32_setup_arg_pages would ignore the passed in random stack top
and use its own static value.

Now it uses the 8bit of randomness native i386 would use too.

This indirectly fixes mmap randomization for 32bit processes too,
which depends on the stack randomization.

Should also give slightly better virtual cache colouring and
possibly better performance with HyperThreading.

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-30 20:31:05 -07:00
David Woodhouse
66643de455 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	include/asm-powerpc/unistd.h
	include/asm-sparc/unistd.h
	include/asm-sparc64/unistd.h

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-24 09:22:21 +01:00
David Woodhouse
2c23d62abb Move inclusion of <linux/compat.h> out of user scope in asm-x86_64/mtrr.h
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-05-21 22:51:13 +01:00
Andi Kleen
ac71d12c99 [PATCH] x86_64: Avoid EBDA area in early boot allocator
Based on analysis&patch from Robert Hentosch

Observed on a Dell PE6850 with 16GB

The problem occurs very early on, when the kernel allocates space for the
temporary memory map called bootmap. The bootmap overlaps the EBDA region.
EBDA region is not historically reserved in the e820 mapping. When the
bootmap is freed it marks the EBDA region as usable.

If you notice in setup.c there is already code to work around the EBDA
in reserve_ebda_region(), this check however occurs after the bootmap
is allocated and doesn't prevent the bootmap from using this range.

AK: I redid the original patch. Thanks also to Jan Beulich for
spotting some mistakes.

Cc: Robert_Hentosch@dell.com
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-08 09:34:56 -07:00
Kimball Murray
e0c1e9bf81 [PATCH] x86_64: avoid IRQ0 ioapic pin collision
The patch addresses a problem with ACPI SCI interrupt entry, which gets
re-used, and the IRQ is assigned to another unrelated device.  The patch
corrects the code such that SCI IRQ is skipped and duplicate entry is
avoided.  Second issue came up with VIA chipset, the problem was caused by
original patch assigning IRQs starting 16 and up.  The VIA chipset uses
4-bit IRQ register for internal interrupt routing, and therefore cannot
handle IRQ numbers assigned to its devices.  The patch corrects this
problem by allowing PCI IRQs below 16.

Cc: len.brown@intel.com

Signed-off by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-08 09:34:56 -07:00
David Woodhouse
5614253686 Remove unneeded _syscallX macros from user view in asm-*/unistd.h
These aren't needed by glibc or klibc, and they're broken in some cases
anyway. The uClibc folks are apparently switching over to stop using
them too (now that we agreed that they should be dropped, at least).

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-29 01:51:47 +01:00
David Woodhouse
d6754b401a Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2006-04-29 01:42:26 +01:00
David Woodhouse
cd469e0cc6 Exclude asm-generic/{page,memory_model}.h from user bits of i386/x86_64 page.h
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-27 15:48:08 +01:00
David Woodhouse
62c4f0a2d5 Don't include linux/config.h from anywhere else in include/
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-04-26 12:56:16 +01:00
Jens Axboe
912d35f867 [PATCH] Add support for the sys_vmsplice syscall
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice()
moves data to a pipe, with the input being a user address range instead.

This uses an approach suggested by Linus, where we can hold partial ranges
inside the pages[] map. Hopefully this will be useful for network
receive support as well.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-26 10:59:21 +02:00
Eric Dumazet
0b699e36b2 [PATCH] x86_64: bring back __read_mostly support to linux-2.6.17-rc2
It seems latest kernel has a wrong/missing __read_mostly implementation
for x86_64

__read_mostly macro should be declared outside of #if CONFIG_X86_VSMP block

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-20 07:58:11 -07:00
Andi Kleen
18bd057b14 [PATCH] i386/x86-64: Fix x87 information leak between processes
AMD K7/K8 CPUs only save/restore the FOP/FIP/FDP x87 registers in FXSAVE
when an exception is pending.  This means the value leak through
context switches and allow processes to observe some x87 instruction
state of other processes.

This was actually documented by AMD, but nobody recognized it as
being different from Intel before.

The fix first adds an optimization: instead of unconditionally
calling FNCLEX after each FXSAVE test if ES is pending and skip
it when not needed. Then do a x87 load from a kernel variable to
clear FOP/FIP/FDP.

This means other processes always will only see a constant value
defined by the kernel in their FP state.

I took some pain to make sure to chose a variable that's already
in L1 during context switch to make the overhead of this low.

Also alternative() is used to patch away the new code on CPUs
who don't need it.

Patch for both i386/x86-64.

The problem was discovered originally by Jan Beulich. Richard
Brunner provided the basic code for the workarounds, with contribution
from Jan.

This is CVE-2006-1056

Cc: richard.brunner@amd.com
Cc: jbeulich@novell.com

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-20 07:58:11 -07:00
KAMEZAWA Hiroyuki
676ff453e5 [PATCH] for_each_possible_cpu: x86_64
for_each_cpu() actually iterates across all possible CPUs.  We've had
mistakes in the past where people were using for_each_cpu() where they
should have been iterating across only online or present CPUs.  This is
inefficient and possibly buggy.

We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this
in the future.

This patch replaces for_each_cpu with for_each_possible_cpu.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-19 09:13:49 -07:00
Andi Kleen
f1233ab2ce [PATCH] x86_64: Add tee and sync_file_range
tee was already there for some reason for native 64bit, but
sys_sync_file_range was missing. Also add it to the compat layer.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-18 10:39:20 -07:00
Andi Kleen
6fa679fdea [PATCH] x86_64: Increase NUMA hash function nodemap
Needed for some big Opteron systems to compute a numa hash function
They have more than 12 bits significant address.

TBD switch this over to dynamic allocation or use better hash

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-18 10:39:19 -07:00
Jens Axboe
70524490ee [PATCH] splice: add support for sys_tee()
Basically an in-kernel implementation of tee, which uses splice and the
pipe buffers as an intelligent way to pass data around by reference.

Where the user space tee consumes the input and produces a stdout and
file output, this syscall merely duplicates the data inside a pipe to
another pipe. No data is copied, the output just grabs a reference to the
input pipe data.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 15:51:17 +02:00
mao, bibo
cde227afe6 [PATCH] x86_64: inline function prefix with __always_inline in vsyscall
In vsyscall function do_vgettimeofday(), some functions are declared as
inlined, which is a hint for gcc to compile the function inlined but it
not forced.  Sometimes compiler does not compile the function as
inlined, so here inline is replaced by __always_inline prefix.

It does not happen in gcc compiler actually, but it possibly happens.

Signed-off-by: bibo mao <bibo.mao@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:38:57 -07:00
Siddha, Suresh B
e4cff6ac78 [PATCH] x86_64: fix sync before RDTSC on Intel cpus
Commit c818a18146 didn't do the expected
thing.  This fix will remove the additional sync(cpuid) before RDTSC on
Intel platforms..

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:38:57 -07:00
Yasunori Goto
c80d79d746 [PATCH] Configurable NODES_SHIFT
Current implementations define NODES_SHIFT in include/asm-xxx/numnodes.h for
each arch.  Its definition is sometimes configurable.  Indeed, ia64 defines 5
NODES_SHIFT values in the current git tree.  But it looks a bit messy.

SGI-SN2(ia64) system requires 1024 nodes, and the number of nodes already has
been changeable by config.  Suitable node's number may be changed in the
future even if it is other architecture.  So, I wrote configurable node's
number.

This patch set defines just default value for each arch which needs multi
nodes except ia64.  But, it is easy to change to configurable if necessary.

On ia64 the number of nodes can be already configured in generic ia64 and SN2
config.  But, NODES_SHIFT is defined for DIG64 and HP'S machine too.  So, I
changed it so that all platforms can be configured via CONFIG_NODES_SHIFT.  It
would be simpler.

See also: http://marc.theaimsgroup.com/?l=linux-kernel&m=114358010523896&w=2

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11 06:18:39 -07:00
Andi Kleen
67d53ea5a3 [PATCH] x86_64: Eliminate IA32_NR_syscalls define
Or rather compute it based on the table length automatically.

This also has the intended side effect of not warning for new system calls
anymore.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:53 -07:00
Sam Ravnborg
bbd3aff89d [PATCH] x86_64: fix CONFIG_REORDER
Fix CONFIG_REORDER.

The value of cflags-y was assined to CFLAGS before cflags-y was assigned
the value used for CONFIG_REORDER.

Use cflags-y for all CFLAGS options in the Makefile to avoid this
happening again.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:53 -07:00
Jordan Hargrave
b20367a6c2 [PATCH] x86_64: Fix drift with HPET timer enabled
If the HPET timer is enabled, the clock can drift by ~3 seconds a day.
This is due to the HPET timer not being initialized with the correct
setting (still using PIT count).

If HZ changes, this drift can become even more pronounced.

HPET patch initializes tick_nsec with correct tick_nsec settings for
HPET timer.

Vojtech comments:

  "It's not entirely correct (it assumes the HPET ticks totally
   exactly), but it's significantly better than assuming the PIT error
   there."

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:53 -07:00
Andi Kleen
553f265fe8 [PATCH] x86_64: Don't run NMI watchdog during machine checks
Machine checks can stall the machine for a long time and
it's not good to trigger the nmi watchdog during that.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:52 -07:00
Dave Hansen
be56db6186 [PATCH] x86_64: extra NODES_SHIFT definition
The generic linux/numa.h file defines NODES_SHIFT to 0 in case
the architecture did not.

Every architecture which has a NUMA config option defines
NODES_SHIFT in its asm-$ARCH headers, but only if NUMA is
enabled, except for x86_64.

This should make it like all the rest.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:51 -07:00
Arjan van de Ven
952223683e [PATCH] x86_64: Introduce e820_all_mapped
Introduce a e820_all_mapped() function which checks if the entire range
<start,end> is mapped with type.

This is done by moving the local start variable to the end of each
known-good region; if at the end of the function the start address is
still before end, there must be a part that's not of the correct type;
otherwise it's a good region.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:50 -07:00
Arjan van de Ven
eee5a9fa63 [PATCH] x86_64: Rename e820_mapped to e820_any_mapped
Rename e820_mapped to e820_any_mapped since it tests if any part of the
range is mapped according to the type.

Later steps will introduce e820_all_mapped which will check if the
entire range is mapped with the type.  Both have their merit.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:17 -07:00
Andi Kleen
68a3a7feb0 [PATCH] x86_64: Reserve SRAT hotadd memory on x86-64
From: Keith Mannthey, Andi Kleen

Implement memory hotadd without sparsemem. The memory in the SRAT
hotadd area is just preserved instead and can be activated later.

There are a few restrictions:
- Only one continuous hotadd area allowed per node

The main problem is dealing with the many buggy SRAT tables
that are out there. The strategy here is to reject anything
suspicious.

Originally from Keith Mannthey, with several hacks and changes by AK
and also contributions from Andrew Morton

[ TBD: Problems pointed out by KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:

 1) Goto's rebuild_zonelist patch will not work if CONFIG_MEMORY_HOTPLUG=n.

    Rebuilding zonelist is necessary when the system has just memory <
    4G at boot, and hot add memory > 4G.  because x86_64 has DMA32,
    ZONE_NORAML is not included into zonelist at boot time if system
    doesn't have memory >4G at boot.

    [AK: should just force the higher zones at boot time when SRAT tells us]

 2) zone and node's spanned_pages and present_pages are not incremented.
    They should be.

    For example, our server (ia64/Fujitsu PrimeQuest) can equip memory
    from 4G to 1T(maybe 2T in future), and SRAT will *always* say we have
    possible 1T +memory.  (Microsoft requires "write all possible memory
    in SRAT") When we reserve memmap for possible 1T memory, Linux will
    not work well in +minimum 4G configuraion ;)

    [AK: needs limiting to 5-10% of max memory]
 ]

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-09 11:53:16 -07:00
Ashok Raj
c12ea918ee x86_64: Remove stale lapic definition from apicdef.h
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2006-04-01 22:50:03 -05:00
Andrew Morton
2cf8d82d63 [PATCH] make local_t signed
local_t's were defined to be unsigned.  This increases confusion because
atomic_t's are signed.  The patch goes through and changes all implementations
to use signed longs throughout.

Also, x86-64 was using 32-bit quantities for the value passed into local_add()
and local_sub().  Fixed.

All (actually, both) existing users have been audited.

(Also s/__inline__/inline/ in x86_64/local.h)

Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:55 -08:00
Jens Axboe
5274f052e7 [PATCH] Introduce sys_splice() system call
This adds support for the sys_splice system call. Using a pipe as a
transport, it can connect to files or sockets (latter as output only).

From the splice.c comments:

   "splice": joining two ropes together by interweaving their strands.

   This is the "extended pipe" functionality, where a pipe is used as
   an arbitrary in-memory buffer. Think of a pipe as a small kernel
   buffer that you can use to transfer data from one end to the other.

   The traditional unix read/write is extended with a "splice()" operation
   that transfers data buffers to or from a pipe buffer.

   Named by Larry McVoy, original implementation from Linus, extended by
   Jens to support splicing to files and fixing the initial implementation
   bugs.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-30 12:28:18 -08:00
Adrian Bunk
f45e4656ac [PATCH] arch/i386/kernel/microcode.c: remove the obsolete microcode_ioctl
Nowadays, even Debian stable ships a microcode_ctl utility recent enough to no
longer use this ioctl.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Tigran Aivazian <tigran_aivazian@symantec.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:06 -08:00
Alan Stern
e041c68341 [PATCH] Notifier chain update: API changes
The kernel's implementation of notifier chains is unsafe.  There is no
protection against entries being added to or removed from a chain while the
chain is in use.  The issues were discussed in this thread:

    http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

We noticed that notifier chains in the kernel fall into two basic usage
classes:

	"Blocking" chains are always called from a process context
	and the callout routines are allowed to sleep;

	"Atomic" chains can be called from an atomic context and
	the callout routines are not allowed to sleep.

We decided to codify this distinction and make it part of the API.  Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name).  New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain.  The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.

With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed.  For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections.  (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)

There are some limitations, which should not be too hard to live with.  For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem.  Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain.  (This did happen in a couple of places and the code
had to be changed to avoid it.)

Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization.  Instead we use RCU.  The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.

Here is the list of chains that we adjusted and their classifications.  None
of them use the raw API, so for the moment it is only a placeholder.

  ATOMIC CHAINS
  -------------
arch/i386/kernel/traps.c:		i386die_chain
arch/ia64/kernel/traps.c:		ia64die_chain
arch/powerpc/kernel/traps.c:		powerpc_die_chain
arch/sparc64/kernel/traps.c:		sparc64die_chain
arch/x86_64/kernel/traps.c:		die_chain
drivers/char/ipmi/ipmi_si_intf.c:	xaction_notifier_list
kernel/panic.c:				panic_notifier_list
kernel/profile.c:			task_free_notifier
net/bluetooth/hci_core.c:		hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_expect_chain
net/ipv6/addrconf.c:			inet6addr_chain
net/netfilter/nf_conntrack_core.c:	nf_conntrack_chain
net/netfilter/nf_conntrack_core.c:	nf_conntrack_expect_chain
net/netlink/af_netlink.c:		netlink_chain

  BLOCKING CHAINS
  ---------------
arch/powerpc/platforms/pseries/reconfig.c:	pSeries_reconfig_chain
arch/s390/kernel/process.c:		idle_chain
arch/x86_64/kernel/process.c		idle_notifier
drivers/base/memory.c:			memory_chain
drivers/cpufreq/cpufreq.c		cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c		cpufreq_transition_notifier_list
drivers/macintosh/adb.c:		adb_client_list
drivers/macintosh/via-pmu.c		sleep_notifier_list
drivers/macintosh/via-pmu68k.c		sleep_notifier_list
drivers/macintosh/windfarm_core.c	wf_client_list
drivers/usb/core/notify.c		usb_notifier_list
drivers/video/fbmem.c			fb_notifier_list
kernel/cpu.c				cpu_chain
kernel/module.c				module_notify_list
kernel/profile.c			munmap_notifier
kernel/profile.c			task_exit_notifier
kernel/sys.c				reboot_notifier_list
net/core/dev.c				netdev_chain
net/decnet/dn_dev.c:			dnaddr_chain
net/ipv4/devinet.c:			inetaddr_chain

It's possible that some of these classifications are wrong.  If they are,
please let us know or submit a patch to fix them.  Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)

The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.

[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:50 -08:00
Ingo Molnar
8f17d3a504 [PATCH] lightweight robust futexes updates
- fix: initialize the robust list(s) to NULL in copy_process.

- doc update

- cleanup: rename _inuser to _inatomic

- __user cleanups and other small cleanups

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
Ingo Molnar
8fdd6c6df7 [PATCH] lightweight robust futexes: x86_64
x86_64: add the futex_atomic_cmpxchg_inuser() assembly implementation, and
wire up the new syscalls.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00