Commit graph

3534 commits

Author SHA1 Message Date
Randy Dunlap
c9cf55285e [PATCH] add poison.h and patch primary users
Localize poison values into one header file for better documentation and
easier/quicker debugging and so that the same values won't be used for
multiple purposes.

Use these constants in core arch., mm, driver, and fs code.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:38 -07:00
Ingo Molnar
e6e5494cb2 [PATCH] vdso: randomize the i386 vDSO by moving it into a vma
Move the i386 VDSO down into a vma and thus randomize it.

Besides the security implications, this feature also helps debuggers, which
can COW a vma-backed VDSO just like a normal DSO and can thus do
single-stepping and other debugging features.

It's good for hypervisors (Xen, VMWare) too, which typically live in the same
high-mapped address space as the VDSO, hence whenever the VDSO is used, they
get lots of guest pagefaults and have to fix such guest accesses up - which
slows things down instead of speeding things up (the primary purpose of the
VDSO).

There's a new CONFIG_COMPAT_VDSO (default=y) option, which provides support
for older glibcs that still rely on a prelinked high-mapped VDSO.  Newer
distributions (using glibc 2.3.3 or later) can turn this option off.  Turning
it off is also recommended for security reasons: attackers cannot use the
predictable high-mapped VDSO page as syscall trampoline anymore.

There is a new vdso=[0|1] boot option as well, and a runtime
/proc/sys/vm/vdso_enabled sysctl switch, that allows the VDSO to be turned
on/off.

(This version of the VDSO-randomization patch also has working ELF
coredumping, the previous patch crashed in the coredumping code.)

This code is a combined work of the exec-shield VDSO randomization
code and Gerd Hoffmann's hypervisor-centric VDSO patch. Rusty Russell
started this patch and i completed it.

[akpm@osdl.org: cleanups]
[akpm@osdl.org: compile fix]
[akpm@osdl.org: compile fix 2]
[akpm@osdl.org: compile fix 3]
[akpm@osdl.org: revernt MAXMEM change]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Cc: Gerd Hoffmann <kraxel@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:38 -07:00
KAMEZAWA Hiroyuki
76b67ed9dc [PATCH] node hotplug: register cpu: remove node struct
With Goto-san's patch, we can add new pgdat/node at runtime.  I'm now
considering node-hot-add with cpu + memory on ACPI.

I found acpi container, which describes node, could evaluate cpu before
memory. This means cpu-hot-add occurs before memory hot add.

In most part, cpu-hot-add doesn't depend on node hot add.  But register_cpu(),
which creates symbolic link from node to cpu, requires that node should be
onlined before register_cpu().  When a node is onlined, its pgdat should be
there.

This patch-set holds off creating symbolic link from node to cpu
until node is onlined.

This removes node arguments from register_cpu().

Now, register_cpu() requires 'struct node' as its argument.  But the array of
struct node is now unified in driver/base/node.c now (By Goto's node hotplug
patch).  We can get struct node in generic way.  So, this argument is not
necessary now.

This patch also guarantees add cpu under node only when node is onlined.  It
is necessary for node-hot-add vs.  cpu-hot-add patch following this.

Moreover, register_cpu calculates cpu->node_id by cpu_to_node() without regard
to its 'struct node *root' argument.  This patch removes it.

Also modify callers of register_cpu()/unregister_cpu, whose args are changed
by register-cpu-remove-node-struct patch.

[Brice.Goglin@ens-lyon.org: fix it]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:37 -07:00
Yasunori Goto
dd0932d9d4 [PATCH] pgdat allocation and update for ia64 of memory hotplug: allocate pgdat and per node data
This is a patch to allocate pgdat and per node data area for ia64.  The size
for them can be calculated by compute_pernodesize().

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:37 -07:00
Yasunori Goto
7049027c6f [PATCH] pgdat allocation and update for ia64 of memory hotplug: update pgdat address array
This is to refresh node_data[] array for ia64.  As I mentioned previous
patches, ia64 has copies of information of pgdat address array on each node as
per node data.

At v2 of node_add, this function used stop_machine_run() to update them.  (I
wished that they were copied safety as much as possible.) But, in this patch,
this arrays are just copied simply, and set node_online_map bit after
completion of pgdat initialization.

So, kernel must touch NODE_DATA() macro after checking node_online_map().
(Current code has already done it.) This is more simple way for just
hot-add.....

Note : It will be problem when hot-remove will occur,
       because, even if online_map bit is set, kernel may
       touch NODE_DATA() due to race condition. :-(

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:37 -07:00
Yasunori Goto
0fc44159bf [PATCH] Register sysfs file for hotplugged new node
When new node becomes enable by hot-add, new sysfs file must be created for
new node.  So, if new node is enabled by add_memory(), register_one_node() is
called to create it.  In addition, I386's arch_register_node() and a part of
register_nodes() of powerpc are consolidated to register_one_node() as a
generic_code().

This is tested by Tiger4(IPF) with node hot-plug emulation.

Signed-off-by: Keiichiro Tokunaga <tokuanga.keiich@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:36 -07:00
KAMEZAWA Hiroyuki
2842f11419 [PATCH] catch valid mem range at onlining memory
This patch allows hot-add memory which is not aligned to section.

Now, hot-added memory has to be aligned to section size.  Considering big
section sized archs, this is not useful.

When hot-added memory is registerd as iomem resoruce by iomem resource
patch, we can make use of that information to detect valid memory range.

Note: With this, not-aligned memory can be registerd. To allow hot-add
      memory with holes, we have to do more work around add_memory().
      (It doesn't allows add memory to already existing mem section.)

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:36 -07:00
Yasunori Goto
3218ae14b1 [PATCH] pgdat allocation for new node add (export kswapd start func)
When node is hot-added, kswapd for the node should start.  This export kswapd
start function as kswapd_run() to use at add_memory().

[akpm@osdl.org: daemonize() isn't needed when using the kthread API]
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:36 -07:00
Yasunori Goto
10ad400b49 [PATCH] pgdat allocation for new node add (refresh node_data[])
Refresh NODE_DATA() for generic archs.  In this case, NODE_DATA(nid) ==
node_data[nid].  node_data[] is array of address of pgdat.  So, refresh is
quite simple.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:36 -07:00
Yasunori Goto
306d6cbe86 [PATCH] pgdat allocation for new node add (generic alloc node_data)
For node hotplug, basically we have to allocate new pgdat.  But, there are
several types of implementations of pgdat.

1. Allocate only pgdat.
   This style allocate only pgdat area.
   And its address is recorded in node_data[].
   It is most popular style.

2. Static array of pgdat
   In this case, all of pgdats are static array.
   Some archs use this style.

3. Allocate not only pgdat, but also per node data.
   To increase performance, each node has copy of some data as
   a per node data. So, this area must be allocated too.

   Ia64 is this style. Ia64 has the copies of node_data[] array
   on each per node data to increase performance.

In this series of patches, treat (1) as generic arch.

generic archs can use generic function. (2) and (3) should have
its own if necessary.

This patch defines pgdat allocator.
Updating NODE_DATA() macro function is in other patch.

Signed-off-by: Yasonori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:36 -07:00
Yasunori Goto
1e3590e2e4 [PATCH] pgdat allocation for new node add (get node id by acpi)
This is to find node id from acpi's handle of memory_device in DSDT.  _PXM for
the new node can be found by acpi_get_pxm() by using new memory's handle.  So,
node id can be found by pxm_to_nid_map[].

  This patch becomes simpler than v2 of node hot-add patch.
  Because old add_memory() function doesn't have node id parameter.
  So, kernel must find its handle by physical address via DSDT again.
  But, v3 just give node id to add_memory() now.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:36 -07:00
Yasunori Goto
bc02af93dd [PATCH] pgdat allocation for new node add (specify node id)
Change the name of old add_memory() to arch_add_memory.  And use node id to
get pgdat for the node at NODE_DATA().

Note: Powerpc's old add_memory() is defined as __devinit. However,
      add_memory() is usually called only after bootup.
      I suppose it may be redundant. But, I'm not well known about powerpc.
      So, I keep it. (But, __meminit is better at least.)

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27 17:32:35 -07:00
Linus Torvalds
da206c9e68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
  typo fixes
  Clean up 'inline is not at beginning' warnings for usb storage
  Storage class should be first
  i386: Trivial typo fixes
  ixj: make ixj_set_tone_off() static
  spelling fixes
  fix paniced->panicked typos
  Spelling fixes for Documentation/atomic_ops.txt
  move acknowledgment for Mark Adler to CREDITS
  remove the bouncing email address of David Campbell
2006-06-26 13:33:14 -07:00
Linus Torvalds
09c0dc6862 Revert "[PATCH] kthread: update loop.c to use kthread"
This reverts commit c7b2eff059.

Hugh Dickins explains:

 "It seems too little tested: "losetup -d /dev/loop0" fails with
  EINVAL because nothing sets lo_thread; but even when you patch
  loop_thread() to set lo->lo_thread = current, it can't survive
  more than a few dozen iterations of the loop below (with a tmpfs
  mounted on /tst):

	j=0
	cp /dev/zero /tst
	while :
	do
	    let j=j+1
	    echo "Doing pass $j"
	    losetup /dev/loop0 /tst/zero
	    mkfs -t ext2 -b 1024 /dev/loop0 >/dev/null 2>&1
	    mount -t ext2 /dev/loop0 /mnt
	    umount /mnt
	    losetup -d /dev/loop0
	done

  it collapses with failed ioctl then BUG_ON(!bio).

  I think the original lo_done completion was more subtle and safe
  than the kthread conversion has allowed for."

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 11:55:42 -07:00
Linus Torvalds
2a2ed2db35 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (40 commits)
  kbuild: trivial fixes in Makefile
  kbuild: adding symbols in Kconfig and defconfig to TAGS
  kbuild: replace abort() with exit(1)
  kbuild: support for %.symtypes files
  kbuild: fix silentoldconfig recursion
  kbuild: add option for stripping modules while installing them
  kbuild: kill some false positives from modpost
  kbuild: export-symbol usage report generator
  kbuild: fix make -rR breakage
  kbuild: append -dirty for updated but uncommited changes
  kbuild: append git revision for all untagged commits
  kbuild: fix module.symvers parsing in modpost
  kbuild: ignore make's built-in rules & variables
  kbuild: bugfix with initramfs
  kbuild: modpost build fix
  kbuild: check license compatibility when building modules
  kbuild: export-type enhancement to modpost.c
  kbuild: add dependency on kernel.release to the package targets
  kbuild: `make kernelrelease' speedup
  kconfig: KCONFIG_OVERWRITECONFIG
  ...
2006-06-26 11:05:15 -07:00
Linus Torvalds
972d19e837 Merge master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  [CRYPTO] tcrypt: Forbid tcrypt from being built-in
  [CRYPTO] aes: Add wrappers for assembly routines
  [CRYPTO] tcrypt: Speed benchmark support for digest algorithms
  [CRYPTO] tcrypt: Return -EAGAIN from module_init()
  [CRYPTO] api: Allow replacement when registering new algorithms
  [CRYPTO] api: Removed const from cra_name/cra_driver_name
  [CRYPTO] api: Added cra_init/cra_exit
  [CRYPTO] api: Fixed incorrect passing of context instead of tfm
  [CRYPTO] padlock: Rearrange context structure to reduce code size
  [CRYPTO] all: Pass tfm instead of ctx to algorithms
  [CRYPTO] digest: Remove unnecessary zeroing during init
  [CRYPTO] aes-i586: Get rid of useless function wrappers
  [CRYPTO] digest: Add alignment handling
  [CRYPTO] khazad: Use 32-bit reads on key
2006-06-26 11:03:29 -07:00
Linus Torvalds
cdf4f383a4 Merge master.kernel.org:/pub/scm/linux/kernel/git/dtor/input
* master.kernel.org:/pub/scm/linux/kernel/git/dtor/input:
  Input: iforce - remove some pointless casts
  Input: psmouse - add support for Intellimouse 4.0
  Input: atkbd - fix HANGEUL/HANJA keys
  Input: fix misspelling of Hangeul key
  Input: via-pmu - add input device support
  Input: rearrange exports
  Input: fix formatting to better follow CodingStyle
  Input: reset name, phys and uniq when unregistering
  Input: return correct size when reading modalias attribute
  Input: change my e-mail address in MAINTAINERS file
  Input: fix potential overflows in driver/input/keyboard
  Input: fix potential overflows in driver/input/touchscreen
  Input: fix potential overflows in driver/input/joystick
  Input: fix potential overflows in driver/input/mouse
  Input: fix accuracy of fixp-arith.h
  Input: iforce - use ENOSPC instead of ENOMEM
  Input: constify drivers/char/keyboard.c
2006-06-26 11:01:58 -07:00
Linus Torvalds
5f2f444136 Merge master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb
* master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
  V4L/DVB (4227): Update this driver for recent header file movement.
  V4L/DVB (4223): Add V4L2_CID_MPEG_STREAM_VBI_FMT control
  V4L/DVB (4222): Always switch tuner mode when calling VIDIOC_S_FREQUENCY.
  V4L/DVB (4221): Add HM12 YUV format define.
  V4L/DVB (4219): Av7110: analog sound output of DVB-C rev 2.3
  V4L/DVB (4217): Fix a misplaced closing bracket/else, which caused swzigzag not to be called
  V4L/DVB (4215): Make VIDEO_CX88_BLACKBIRD a separate build option
  V4L/DVB (4214): Make VIDEO_CX2341X a selectable build option
  V4L/DVB (4213): Cx88: cleanups
  V4L/DVB (4211): Fix an Oops for all fe that have get_frontend_algo == NULL
2006-06-26 10:54:02 -07:00
Linus Torvalds
81a07d7588 Merge branch 'x86-64'
* x86-64: (83 commits)
  [PATCH] x86_64: x86_64 stack usage debugging
  [PATCH] x86_64: (resend) x86_64 stack overflow debugging
  [PATCH] x86_64: msi_apic.c build fix
  [PATCH] x86_64: i386/x86-64 Add nmi watchdog support for new Intel CPUs
  [PATCH] x86_64: Avoid broadcasting NMI IPIs
  [PATCH] x86_64: fix apic error on bootup
  [PATCH] x86_64: enlarge window for stack growth
  [PATCH] x86_64: Minor string functions optimizations
  [PATCH] x86_64: Move export symbols to their C functions
  [PATCH] x86_64: Standardize i386/x86_64 handling of NMI_VECTOR
  [PATCH] x86_64: Fix modular pc speaker
  [PATCH] x86_64: remove sys32_ni_syscall()
  [PATCH] x86_64: Do not use -ffunction-sections for modules
  [PATCH] x86_64: Add cpu_relax to apic_wait_icr_idle
  [PATCH] x86_64: adjust kstack_depth_to_print default
  [PATCH] i386/x86-64: adjust /proc/interrupts column headings
  [PATCH] x86_64: Fix race in cpu_local_* on preemptible kernels
  [PATCH] x86_64: Fix fast check in safe_smp_processor_id
  [PATCH] x86_64: x86_64 setup.c - printing cmp related boottime information
  [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
  ...

Manual resolve of trivial conflict in arch/i386/kernel/Makefile
2006-06-26 10:51:09 -07:00
Vojtech Pavlik
05ebb76109 [PATCH] x86_64: Add useful constants to time.h
In timekeeping code, one often does need to use conversion constants. Naming
these leads to code that's easier to understand, showing the reader between
which units the conversion is made.

Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:19 -07:00
Jan Beulich
83f4fcce7f [PATCH] x86_64: allow unwinder to build without module support
Add proper conditionals to be able to build with CONFIG_MODULES=n.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:18 -07:00
Jan Beulich
c33bd9aac0 [PATCH] i386/x86-64: fall back to old-style call trace if no unwinding
If no unwinding is possible at all for a certain exception instance,
fall back to the old style call trace instead of not showing any trace
at all.

Also, allow setting the stack trace mode at the command line.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:18 -07:00
Jan Beulich
4552d5dc08 [PATCH] x86_64: reliable stack trace support
These are the generic bits needed to enable reliable stack traces based
on Dwarf2-like (.eh_frame) unwind information. Subsequent patches will
enable x86-64 and i386 to make use of this.

Thanks to Andi Kleen and Ingo Molnar, who pointed out several possibilities
for improvement.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:17 -07:00
Andi Kleen
08cd36570e [PATCH] x86_64: Optimize bitmap_weight for small bitmaps
Use inline code bitmaps <= BITS_PER_LONG in bitmap_weight. This
gives _much_ better code.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:16 -07:00
Andi Kleen
bebfa1013e [PATCH] x86_64: Add compat_printk and sysctl to turn off compat layer warnings
Sometimes e.g. with crashme the compat layer warnings can be noisy.
Add a way to turn them off by gating all output through compat_printk
that checks a global sysctl. The default is not changed.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 10:48:16 -07:00
Linus Torvalds
61a46dc9d1 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
  [IOAT]: Do not dereference THIS_MODULE directly to set unsafe.
  [NETROM]: Fix possible null pointer dereference.
  [NET] netpoll: break recursive loop in netpoll rx path
  [NET] netpoll: don't spin forever sending to stopped queues
  [IRDA]: add some IBM think pads
  [ATM]: atm/mpc.c warning fix
  [NET]: skb_find_text ignores to argument
  [NET]: make net/core/dev.c:netdev_nit static
  [NET]: Fix GSO problems in dev_hard_start_xmit()
  [NET]: Fix CHECKSUM_HW GSO problems.
  [TIPC]: Fix incorrect correction to discovery timer frequency computation.
  [TIPC]: Get rid of dynamically allocated arrays in broadcast code.
  [TIPC]: Fixed link switchover bugs
  [TIPC]: Enhanced & cleaned up system messages; fixed 2 obscure memory leaks.
  [TIPC]: First phase of assert() cleanup
  [TIPC]: Disallow config operations that aren't supported in certain modes.
  [TIPC]: Fixed memory leak in tipc_link_send() when destination is unreachable
  [TIPC]: Added missing warning for out-of-memory condition
  [TIPC]: Withdrawing all names from nameless port now returns success, not error
  [TIPC]: Optimized argument validation done by connect().
  ...
2006-06-26 10:08:13 -07:00
NeilBrown
4254376914 [PATCH] md: Don't write dirty/clean update to spares - leave them alone
- record the 'event' count on each individual device (they
  might sometimes be slightly different now)
- add a new value for 'sb_dirty': '3' means that the super
  block only needs to be updated to record a clean<->dirty
  transition.
- Prefer odd event numbers for dirty states and even numbers
  for clean states
- Using all the above, don't update the superblock on
  a spare device if the update is just doing a clean-dirty
  transition.  To accomodate this, a transition from
  dirty back to clean might now decrement the events counter
  if nothing else has changed.

The net effect of this is that spare drives will not see any IO requests
during normal running of the array, so they can go to sleep if that is what
they want to do.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:39 -07:00
NeilBrown
d785a06a0b [PATCH] md/bitmap: change md/bitmap file handling to use bmap to file blocks
If md is asked to store a bitmap in a file, it tries to hold onto the page
cache pages for that file, manipulate them directly, and call a cocktail of
operations to write the file out.  I don't believe this is a supportable
approach.

This patch changes the approach to use the same approach as swap files.  i.e.
bmap is used to enumerate all the block address of parts of the file and we
write directly to those blocks of the device.

swapfile only uses parts of the file that provide a full pages at contiguous
addresses.  We don't have that luxury so we have to cope with pages that are
non-contiguous in storage.  To handle this we attach buffers to each page, and
store the addresses in those buffers.

With this approach the pagecache may contain data which is inconsistent with
what is on disk.  To alleviate the problems this can cause, md invalidates the
pagecache when releasing the file.  If the file is to be examined while the
array is active (a non-critical but occasionally useful function), O_DIRECT io
must be used.  And new version of mdadm will have support for this.

This approach simplifies a lot of code:
 - we no longer need to keep a list of pages which we need to wait for,
   as the b_endio function can keep track of how many outstanding
   writes there are.  This saves a mempool.
 - -EAGAIN returns from write_page are no longer possible (not sure if
    they ever were actually).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:38 -07:00
NeilBrown
0b79ccf0cd [PATCH] md/bitmap: remove bitmap writeback daemon
md/bitmap currently has a separate thread to wait for writes to the bitmap
file to complete (as we cannot get a callback on that action).

However this isn't needed as bitmap_unplug is called from process context and
waits for the writeback thread to do it's work.  The same result can be
achieved by doing the waiting directly in bitmap_unplug.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:38 -07:00
Adrian Bunk
5e56341d02 [PATCH] md: make md_print_devices() static
This patch makes the needlessly global md_print_devices() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown
c93983bf51 [PATCH] md: support stripe/offset mode in raid10
The "industry standard" DDF format allows for a stripe/offset layout where
data is duplicated on different stripes.  e.g.

  A  B  C  D
  D  A  B  C
  E  F  G  H
  H  E  F  G

(columns are drives, rows are stripes, LETTERS are chunks of data).

This is similar to raid10's 'far' mode, but not quite the same.  So enhance
'far' mode with a 'far/offset' option which follows the layout of DDFs
stripe/offset.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown
7c7546ccf6 [PATCH] md: allow a linear array to have drives added while active
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown
5fd6c1dce0 [PATCH] md: allow checkpoint of recovery with version-1 superblock
For a while we have had checkpointing of resync.  The version-1 superblock
allows recovery to be checkpointed as well, and this patch implements that.

Due to early carelessness we need to add a feature flag to signal that the
recovery_offset field is in use, otherwise older kernels would assume that a
partially recovered array is in fact fully recovered.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown
16a53ecc35 [PATCH] md: merge raid5 and raid6 code
There is a lot of commonality between raid5.c and raid6main.c.  This patches
merges both into one module called raid456.  This saves a lot of code, and
paves the way for online raid5->raid6 migrations.

There is still duplication, e.g.  between handle_stripe5 and handle_stripe6.
This will probably be cleaned up later.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:37 -07:00
NeilBrown
8932c2e0dc [PATCH] md: remove arbitrary limit on chunk size
The largest chunk size the code can support without substantial surgery is
2^30 bytes, so make that the limit instead of an arbitrary 4Meg.  Some day,
the 'chunksize' should change to a sector-shift instead of a byte-count.  Then
no limit would be needed.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
Alasdair G Kergon
72d9486169 [PATCH] dm: improve error message consistency
Tidy device-mapper error messages to include context information
automatically.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
Alasdair G Kergon
5c6bd75d06 [PATCH] dm: prevent removal if open
If you misuse the device-mapper interface (or there's a bug in your userspace
tools) it's possible to end up with 'unlinked' mapped devices that cannot be
removed until you reboot (along with uninterruptible processes).

This patch prevents you from removing a device that is still open.

It introduces dm_lock_for_deletion() which is called when a device is about to
be removed to ensure that nothing has it open and nothing further can open it.
 It uses a private open_count for this which also lets us remove one of the
problematic bdget_disk() calls elsewhere.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
David Teigland
c2ade42dd3 [PATCH] dm: create error table
Add a library function dm_create_error_table() to create a table that rejects
any I/O sent to a device with EIO.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
Alasdair G Kergon
17b2f66f2a [PATCH] dm: add exports
Move definitions of core device-mapper functions for manipulating mapped
devices and their tables to <linux/device-mapper.h> advertising their
availability for use elsewhere in the kernel.

Protect the contents of device-mapper.h with ifdef __KERNEL__.  And throw
in a few formatting clean-ups and extra comments.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:36 -07:00
Jeff Mahoney
5806f07cd2 [PATCH] lib: add idr_replace
This patch adds idr_replace() to replace an existing pointer in a single
operation.

Device-mapper will use this to update the pointer it stored against a given
id.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:34 -07:00
Antonino A. Daplas
e614b18dce [PATCH] VT binding: Update fbcon to support binding
The control for binding/unbinding is moved from fbcon to the console layer.
Thus the fbcon sysfs attributes, attach and detach, are also gone.

    1. Add a notifier event that tells fbcon if a framebuffer driver has been
       unregistered.  If no registered driver remains, fbcon will unregister
       itself from the console layer.

    2. Replaced calls to give_up_console() with unregister_con_driver().

    3. Still use take_over_console() instead of register_con_driver() to
       maintain compatibility

    4. Respect the parameter first_fb_vc and last_fb_vc instead of using 0 and
       MAX_NR_CONSOLES - 1. These parameters are settable by the user.

    5. When fbcon is completely unbound from the console layer, fbcon will
       also release (iow, decrement module reference counts to zero) all fbdev
       drivers. In other words, a bind or unbind request from the console layer
       will propagate down to the framebuffer drivers.

    6. If fbcon is not bound to the console, it will ignore all notifier
       events (except driver registration and unregistration) and all sysfs
       requests.

Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:33 -07:00
Antonino A. Daplas
3e795de763 [PATCH] VT binding: Add binding/unbinding support for the VT console
The framebuffer console is now able to dynamically bind and unbind from the VT
console layer.  Due to the way the VT console layer works, the drivers
themselves decide when to bind or unbind.  However, it was decided that
binding must be controlled, not by the drivers themselves, but by the VT
console layer.  With this, dynamic binding is possible for all VT console
drivers, not just fbcon.

Thus, the VT console layer will impose the following to all VT console
drivers:

- all registered VT console drivers will be entered in a private list
- drivers can register themselves to the VT console layer, but they cannot
  decide when to bind or unbind. (Exception: To maintain backwards
  compatibility, take_over_console() will automatically bind the driver after
  registration.)
- drivers can remove themselves from the list by unregistering from the VT
  console layer. A prerequisite for unregistration is that the driver must not
  be bound.

The following functions are new in the vt.c:

register_con_driver() - public function, this function adds the VT console
driver to an internal list maintained by the VT console

bind_con_driver() - private function, it binds the driver to the console

take_over_console() is changed to call register_con_driver() followed by a
bind_con_driver().  This is the only time drivers can decide when to bind to
the VT layer.  This is to maintain backwards compatibility.

unbind_con_driver() - private function, it unbinds the driver from its
console.  The vacated consoles will be taken over by the default boot console
driver.

unregister_con_driver() - public function, removes the driver from the
internal list maintained by the VT console.  It will only succeed if the
driver is currently unbound.

con_is_bound() checks if the driver is currently bound or not

give_up_console() is just a wrapper to unregister_con_driver().

There are also 3 additional functions meant to be called only by the tty layer
for sysfs control:

	vt_bind() - calls bind_con_driver()
	vt_unbind() - calls unbind_con_driver()
	vt_show_drivers() - shows the list of registered drivers

Most VT console drivers will continue to work as is, but might have problems
when unbinding or binding which should be fixable with minimal changes.

Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:33 -07:00
Antonino A. Daplas
9a17917671 [PATCH] Detaching fbcon: sdd sysfs class device entry for fbcon
In order for this feature to work, an interface will be needed.  The most
appropriate is sysfs.  However, the framebuffer console has no sysfs entry
yet.  This will create a sysfs class device entry for fbcon under
/sys/class/graphics.

Add a class_device entry 'fbcon' under class 'graphics'.  Console-specific
attributes which where previously under class/graphics/fb[x] are moved to
class/graphics/fbcon.  These attributes, 'con_rotate' and 'con_rotate_all',
are also renamed to 'rotate' and 'rotate_all' respectively.

Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:32 -07:00
Oleg Nesterov
d5f70c00ad [PATCH] coredump: kill ptrace related stuff
With this patch zap_process() sets SIGNAL_GROUP_EXIT while sending SIGKILL to
the thread group.  This means that a TASK_TRACED task

	1. Will be awakened by signal_wake_up(1)

	2. Can't sleep again via ptrace_notify()

	3. Can't go to do_signal_stop() after return
	   from ptrace_stop() in get_signal_to_deliver()

So we can remove all ptrace related stuff from coredump path.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:27 -07:00
Eric W. Biederman
13b41b0949 [PATCH] proc: Use struct pid not struct task_ref
Incrementally update my proc-dont-lock-task_structs-indefinitely patches so
that they work with struct pid instead of struct task_ref.

Mostly this is a straight 1-1 substitution.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:26 -07:00
Eric W. Biederman
99f8955183 [PATCH] proc: don't lock task_structs indefinitely
Every inode in /proc holds a reference to a struct task_struct.  If a
directory or file is opened and remains open after the the task exits this
pinning continues.  With 8K stacks on a 32bit machine the amount pinned per
file descriptor is about 10K.

Normally I would figure a reasonable per user process limit is about 100
processes.  With 80 processes, with a 1000 file descriptors each I can trigger
the 00M killer on a 32bit kernel, because I have pinned about 800MB of useless
data.

This patch replaces the struct task_struct pointer with a pointer to a struct
task_ref which has a struct task_struct pointer.  The so the pinning of dead
tasks does not happen.

The code now has to contend with the fact that the task may now exit at any
time.  Which is a little but not muh more complicated.

With this change it takes about 1000 processes each opening up 1000 file
descriptors before I can trigger the OOM killer.  Much better.

[mlp@google.com: task_mmu small fixes]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Paul Jackson <pj@sgi.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Albert Cahalan <acahalan@gmail.com>
Signed-off-by: Prasanna Meda <mlp@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:25 -07:00
Eric W. Biederman
48e6484d49 [PATCH] proc: Rewrite the proc dentry flush on exit optimization
To keep the dcache from filling up with dead /proc entries we flush them on
process exit.  However over the years that code has gotten hairy with a
dentry_pointer and a lock in task_struct and misdocumented as a correctness
feature.

I have rewritten this code to look and see if we have a corresponding entry in
the dcache and if so flush it on process exit.  This removes the extra fields
in the task_struct and allows me to trivially handle the case of a
/proc/<tgid>/task/<pid> entry as well as the current /proc/<pid> entries.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Eric W. Biederman
aed7a6c476 [PATCH] proc: Replace proc_inode.type with proc_inode.fd
The sole renaming use of proc_inode.type is to discover the file descriptor
number, so just store the file descriptor number and don't wory about
processing this field.  This removes any /proc limits on the maximum number of
file descriptors, and clears the path to make the hard coded /proc inode
numbers go away.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:24 -07:00
Hansjoerg Lipp
5024ad4af6 [PATCH] i4l: Gigaset drivers: add IOCTLs to compat_ioctl.h
Add the IOCTLs of the Gigaset drivers to compat_ioctl.h in order to make
them available for 32 bit programs on 64 bit platforms.  Please merge.

Signed-off-by: Hansjoerg Lipp <hjlipp@web.de>
Acked-by: Tilman Schmidt <tilman@imap.cc>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:23 -07:00
Roman Zippel
19923c190e [PATCH] fix and optimize clock source update
This fixes the clock source updates in update_wall_time() to correctly
track the time coming in via current_tick_length().  Optimize the fast
paths to be as short as possible to keep the overhead low.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:21 -07:00