Commit graph

6390 commits

Author SHA1 Message Date
Eugene Surovegin
bab70a4af7 [PATCH] lock PTE before updating it in 440/BookE page fault handler
Fix 44x and BookE page fault handler to correctly lock PTE before
trying to pte_update() it, otherwise this PTE might be swapped out
after pte_present() check but before pte_uptdate() call, resulting in
corrupted PTE. This can happen with enabled preemption and low memory
condition.

Signed-off-by: Eugene Surovegin <ebs@ebshome.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-29 13:44:15 +11:00
Oleg Nesterov
a7e5328a06 [PATCH] cleanup __exit_signal->cleanup_sighand path
Move 'tsk->sighand = NULL' from cleanup_sighand() to __exit_signal().  This
makes the exit path more understandable and allows us to do
cleanup_sighand() outside of ->siglock protected section.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:44 -08:00
Oleg Nesterov
47e65328a7 [PATCH] pids: kill PIDTYPE_TGID
This patch kills PIDTYPE_TGID pid_type thus saving one hash table in
kernel/pid.c and speeding up subthreads create/destroy a bit.  It is also a
preparation for the further tref/pids rework.

This patch adds 'struct list_head thread_group' to 'struct task_struct'
instead.

We don't detach group leader from PIDTYPE_PID namespace until another
thread inherits it's ->pid == ->tgid, so we are safe wrt premature
free_pidmap(->tgid) call.

Currently there are no users of find_task_by_pid_type(PIDTYPE_TGID).
Should the need arise, we can use find_task_by_pid()->group_leader.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-By: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:44 -08:00
Oleg Nesterov
6a14c5c9da [PATCH] move __exit_signal() to kernel/exit.c
__exit_signal() is private to release_task() now.  I think it is better to
make it static in kernel/exit.c and export flush_sigqueue() instead - this
function is much more simple and straightforward.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:43 -08:00
Oleg Nesterov
c81addc9d3 [PATCH] rename __exit_sighand to cleanup_sighand
Cosmetic, rename __exit_sighand to cleanup_sighand and move it close to
copy_sighand().

This matches copy_signal/cleanup_signal naming, and I think it is easier to
follow.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:43 -08:00
Oleg Nesterov
6b3934ef52 [PATCH] copy_process: cleanup bad_fork_cleanup_signal
__exit_signal() does important cleanups atomically under ->siglock.  It is
also called from copy_process's error path.  This is not good, for example we
can't move __unhash_process() under ->siglock for that reason.

We should not mix these 2 paths, just look at ugly 'if (p->sighand)' under
'bad_fork_cleanup_sighand:' label.  For copy_process() case it is sufficient
to just backout copy_signal(), nothing more.

Again, nobody can see this task yet.  For CLONE_THREAD case we just decrement
signal->count, otherwise nobody can see this ->signal and we can free it
lockless.

This patch assumes it is safe to do exit_thread_group_keys() without
tasklist_lock.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:42 -08:00
Oleg Nesterov
7001510d0c [PATCH] copy_process: cleanup bad_fork_cleanup_sighand
The only caller of exit_sighand(tsk) is copy_process's error path.  We can
call __exit_sighand() directly and kill exit_sighand().

This 'tsk' was not yet registered in pid_hash[] or init_task.tasks, it has no
external references, nobody can see it, and

	IF (clone_flags & CLONE_SIGHAND)
		At least 'current' has a reference to ->sighand, this
		means atomic_dec_and_test(sighand->count) can't be true.

	ELSE
		Nobody can see this ->sighand, this means we can free it
		without any locking.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:42 -08:00
Oleg Nesterov
f63ee72e0f [PATCH] introduce lock_task_sighand() helper
Add lock_task_sighand() helper and converts group_send_sig_info() to use
it.  Hopefully we will have more users soon.

This patch also removes '!sighand->count' and '!p->usage' checks, I think
they both are bogus, racy and unneeded (but probably it makes sense to
restore them as BUG_ON()s).

->sighand is cleared and it's ->count is decremented in release_task() with
sighand->siglock held, so it is a bug to have '!p->usage || !->count' after
we already locked and verified it is the same.  On the other hand, an
already dead task without ->sighand can have a non-zero ->usage due to
ptrace, for example.

If we read the stale value of ->sighand we must see the change after
spin_lock(), because that change was done while holding that same old
->sighand.siglock.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:42 -08:00
Oleg Nesterov
aa1757f90b [PATCH] convert sighand_cache to use SLAB_DESTROY_BY_RCU
This patch borrows a clever Hugh's 'struct anon_vma' trick.

Without tasklist_lock held we can't trust task->sighand until we locked it
and re-checked that it is still the same.

But this means we don't need to defer 'kmem_cache_free(sighand)'.  We can
return the memory to slab immediately, all we need is to be sure that
sighand->siglock can't dissapear inside rcu protected section.

To do so we need to initialize ->siglock inside ctor function,
SLAB_DESTROY_BY_RCU does the rest.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:42 -08:00
Oleg Nesterov
c7c6464117 [PATCH] pidhash: don't use zero pids
daemonize() calls set_special_pids(1,1), while init and kernel threads spawned
from init/main.c:init() run with 0,0 special pids.  This patch changes
INIT_SIGNALS() so that that they run with ->pgrp == ->session == 1 also.  This
patch relies on fact that swapper's pid == 1.

Now we have no hashed zero pids in pid_hash[].

User-space visibible change is that now /sbin/init runs with (1,1) special
pids and becomes a session leader.

Quoting Eric W. Biederman:
>
> daemonize consuming pids (1,1) then consumes pgrp 1.  So that when
> /sbin/init calls setsid() it thinks /sbin/init is a process group
> leader and setsid() fails.  So /sbin/init wants pgrp 1 session 1
> but doesn't get it.  I am pretty certain daemonize did not exist so
> /sbin/init got pgrp 1 session 1 in 2.4.
>
> That is the bug that is being fixed.
>
> This patch takes things one step farther and essentially calls
> setsid() for pid == 1 before init is execed.  That is new behavior
> but it cleans up the kernel as we now do not need to support the
> case of a process without a process group or a session.
>
> The only process that could have possibly cared was /sbin/init
> and it already calls setsid() because it doesn't want that.
>
> If this was going to break anything noticeable the change in behavior
> from 2.4 to 2.6 would have already done that.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:41 -08:00
Oleg Nesterov
73b9ebfe12 [PATCH] pidhash: don't count idle threads
fork_idle() does unhash_process() just after copy_process().  Contrary,
boot_cpu's idle thread explicitely registers itself for each pid_type with nr
= 0.

copy_process() already checks p->pid != 0 before process_counts++, I think we
can just skip attach_pid() calls and job control inits for idle threads and
kill unhash_process().  We don't need to cleanup ->proc_dentry in fork_idle()
because with this patch idle threads are never hashed in
kernel/pid.c:pid_hash[].

We don't need to hash pid == 0 in pidmap_init().  free_pidmap() is never
called with pid == 0 arg, so it will never be reused.  So it is still possible
to use pid == 0 in any PIDTYPE_xxx namespace from kernel/pid.c's POV.

However with this patch we don't hash pid == 0 for PIDTYPE_PID case.  We still
have have PIDTYPE_PGID/PIDTYPE_SID entries with pid == 0: /sbin/init and
kernel threads which don't call daemonize().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:41 -08:00
Oleg Nesterov
c97d98931a [PATCH] kill SET_LINKS/REMOVE_LINKS
Both SET_LINKS() and SET_LINKS/REMOVE_LINKS() have exactly one caller, and
these callers already check thread_group_leader().

This patch kills theese macros, they mix two different things: setting
process's parent and registering it in init_task.tasks list.  Callers are
updated to do these actions by hand.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:41 -08:00
Oleg Nesterov
8fafabd86f [PATCH] remove add_parent()'s parent argument
add_parent(p, parent) is always called with parent == p->parent, and it makes
no sense to do it differently.  This patch removes this argument.

No changes in affected .o files.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:41 -08:00
Eric W. Biederman
d73d65293e [PATCH] pidhash: kill switch_exec_pids
switch_exec_pids is only called from de_thread by way of exec, and it is
only called when we are exec'ing from a non thread group leader.

Currently switch_exec_pids gives the leader the pid of the thread and
unhashes and rehashes all of the process groups.  The leader is already in
the EXIT_DEAD state so no one cares about it's pids.  The only concern for
the leader is that __unhash_process called from release_task will function
correctly.  If we don't touch the leader at all we know that
__unhash_process will work fine so there is no need to touch the leader.

For the task becomming the thread group leader, we just need to give it the
pid of the old thread group leader, add it to the task list, and attach it
to the session and the process group of the thread group.

Currently de_thread is also adding the task to the task list which is just
silly.

Currently the only leader of __detach_pid besides detach_pid is
switch_exec_pids because of the ugly extra work that was being
performed.

So this patch removes switch_exec_pids because it is doing too much, it is
creating an unnecessary special case in pid.c, duing work duplicated in
de_thread, and generally obscuring what it is going on.

The necessary work is added to de_thread, and it seems to be a little
clearer there what is going on.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:40 -08:00
Eric W. Biederman
6c99c5cb94 [PATCH] Remove dead kill_sl prototype from sched.h
The kill_sl function doesn't exist in the kernel so a prototype is completely
unnecessary.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:40 -08:00
Paul Mackerras
bac30d1a78 Merge ../linux-2.6 2006-03-29 13:24:50 +11:00
Herbert Xu
d2acc3479c [INET]: Introduce tunnel4/tunnel6
Basically this patch moves the generic tunnel protocol stuff out of
xfrm4_tunnel/xfrm6_tunnel and moves it into the new files of tunnel4.c
and tunnel6 respectively.

The reason for this is that the problem that Hugo uncovered is only
the tip of the iceberg.  The real problem is that when we removed the
dependency of ipip on xfrm4_tunnel we didn't really consider the module
case at all.

For instance, as it is it's possible to build both ipip and xfrm4_tunnel
as modules and if the latter is loaded then ipip simply won't load.

After considering the alternatives I've decided that the best way out of
this is to restore the dependency of ipip on the non-xfrm-specific part
of xfrm4_tunnel.  This is acceptable IMHO because the intention of the
removal was really to be able to use ipip without the xfrm subsystem.
This is still preserved by this patch.

So now both ipip/xfrm4_tunnel depend on the new tunnel4.c which handles
the arbitration between the two.  The order of processing is determined
by a simple integer which ensures that ipip gets processed before
xfrm4_tunnel.

The situation for ICMP handling is a little bit more complicated since
we may not have enough information to determine who it's for.  It's not
a big deal at the moment since the xfrm ICMP handlers are basically
no-ops.  In future we can deal with this when we look at ICMP caching
in general.

The user-visible change to this is the removal of the TUNNEL Kconfig
prompts.  This makes sense because it can only be used through IPCOMP
as it stands.

The addition of the new modules shouldn't introduce any problems since
module dependency will cause them to be loaded.

Oh and I also turned some unnecessary pskb's in IPv6 related to this
patch to skb's.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-28 17:02:46 -08:00
Denis Vlasenko
f0088a50e7 [NET]: deinline 200+ byte inlines in sock.h
Sizes in bytes (allyesconfig, i386) and files where those inlines
are used:

238 sock_queue_rcv_skb 2.6.16/net/x25/x25_in.o
238 sock_queue_rcv_skb 2.6.16/net/rose/rose_in.o
238 sock_queue_rcv_skb 2.6.16/net/packet/af_packet.o
238 sock_queue_rcv_skb 2.6.16/net/netrom/nr_in.o
238 sock_queue_rcv_skb 2.6.16/net/llc/llc_sap.o
238 sock_queue_rcv_skb 2.6.16/net/llc/llc_conn.o
238 sock_queue_rcv_skb 2.6.16/net/irda/af_irda.o
238 sock_queue_rcv_skb 2.6.16/net/ipx/af_ipx.o
238 sock_queue_rcv_skb 2.6.16/net/ipv6/udp.o
238 sock_queue_rcv_skb 2.6.16/net/ipv6/raw.o
238 sock_queue_rcv_skb 2.6.16/net/ipv4/udp.o
238 sock_queue_rcv_skb 2.6.16/net/ipv4/raw.o
238 sock_queue_rcv_skb 2.6.16/net/ipv4/ipmr.o
238 sock_queue_rcv_skb 2.6.16/net/econet/econet.o
238 sock_queue_rcv_skb 2.6.16/net/econet/af_econet.o
238 sock_queue_rcv_skb 2.6.16/net/bluetooth/sco.o
238 sock_queue_rcv_skb 2.6.16/net/bluetooth/l2cap.o
238 sock_queue_rcv_skb 2.6.16/net/bluetooth/hci_sock.o
238 sock_queue_rcv_skb 2.6.16/net/ax25/ax25_in.o
238 sock_queue_rcv_skb 2.6.16/net/ax25/af_ax25.o
238 sock_queue_rcv_skb 2.6.16/net/appletalk/ddp.o
238 sock_queue_rcv_skb 2.6.16/drivers/net/pppoe.o

276 sk_receive_skb 2.6.16/net/decnet/dn_nsp_in.o
276 sk_receive_skb 2.6.16/net/dccp/ipv6.o
276 sk_receive_skb 2.6.16/net/dccp/ipv4.o
276 sk_receive_skb 2.6.16/net/dccp/dccp_ipv6.o
276 sk_receive_skb 2.6.16/drivers/net/pppoe.o

209 sk_dst_check 2.6.16/net/ipv6/ip6_output.o
209 sk_dst_check 2.6.16/net/ipv4/udp.o
209 sk_dst_check 2.6.16/net/decnet/dn_nsp_out.o

Large inlines with multiple callers:
Size  Uses Wasted Name and definition
===== ==== ====== ================================================
  238   21   4360 sock_queue_rcv_skb    include/net/sock.h
  109   10    801 sock_recv_timestamp   include/net/sock.h
  276    4    768 sk_receive_skb        include/net/sock.h
   94    8    518 __sk_dst_check        include/net/sock.h
  209    3    378 sk_dst_check  include/net/sock.h
  131    4    333 sk_setup_caps include/net/sock.h
  152    2    132 sk_stream_alloc_pskb  include/net/sock.h
  125    2    105 sk_stream_writequeue_purge    include/net/sock.h

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-28 17:02:45 -08:00
Linus Torvalds
ca9ba4471c Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] 3388/1: ixp23xx: add core ixp23xx support
  [ARM] 3417/1: add support for logicpd pxa270 card engine
  [ARM] 3387/1: ixp23xx: add defconfig
  [ARM] 3377/2: add support for intel xsc3 core
  [ARM] Move ice-dcc code into misc.c
  [ARM] Fix decompressor serial IO to give CRLF not LFCR
  [ARM] proc-v6: mark page table walks outer-cacheable, shared.  Enable NX.
  [ARM] nommu: trivial patch for arch/arm/lib/Makefile
  [ARM] 3416/1: Update LART site URL
  [ARM] 3415/1: Akita: Add missing EXPORT_SYMBOL
  [ARM] 3414/1: ep93xx: reset ethernet controller before uncompressing
2006-03-28 13:53:03 -08:00
Linus Torvalds
d4965b3e2f Merge master.kernel.org:/home/rmk/linux-2.6-serial
* master.kernel.org:/home/rmk/linux-2.6-serial:
  [SERIAL] Provide Cirrus EP93xx AMBA PL010 serial support.
  [SERIAL] amba-pl010: allow platforms to specify modem control method
  [SERIAL] Remove obsoleted au1x00_uart driver
  [SERIAL] Small time UART configuration fix for AU1100 processor
2006-03-28 13:52:37 -08:00
Lennert Buytenhek
c471307437 [ARM] 3388/1: ixp23xx: add core ixp23xx support
Patch from Lennert Buytenhek

This patch adds support for the Intel ixp23xx series of CPUs.  The
ixp23xx is an XSC3 based CPU with 512K of L2 cache, a 64bit 66MHz PCI
interface, two DDR RAM interfaces, QDR RAM interfaces, two gigabit
MACs, two 10/100 MACs, expansion bus, four microengines, a Media and
Switch Fabric unit almost identical to the one on the ixp2400, two
xscale (8250ish) UARTs and a bunch of other stuff.

This patch adds the core ixp23xx support code, and support for the
ADI Engineering Roadrunner, Intel IXDP2351, and IP Fabrics Double
Espresso platforms.

Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-28 21:18:54 +01:00
Lennert Buytenhek
e9937d4b0a [ARM] 3417/1: add support for logicpd pxa270 card engine
Patch from Lennert Buytenhek

Add support for the LogicPD PXA270 Card Engine.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-28 21:08:13 +01:00
Lennert Buytenhek
23bdf86aa0 [ARM] 3377/2: add support for intel xsc3 core
Patch from Lennert Buytenhek

This patch adds support for the new XScale v3 core.  This is an
ARMv5 ISA core with the following additions:

- L2 cache
- I/O coherency support (on select chipsets)
- Low-Locality Reference cache attributes (replaces mini-cache)
- Supersections (v6 compatible)
- 36-bit addressing (v6 compatible)
- Single instruction cache line clean/invalidate
- LRU cache replacement (vs round-robin)

I attempted to merge the XSC3 support into proc-xscale.S, but XSC3
cores have separate errata and have to handle things like L2, so it
is simpler to keep it separate.

L2 cache support is currently a build option because the L2 enable
bit must be set before we enable the MMU and there is no easy way to
capture command line parameters at this point.

There are still optimizations that can be done such as using LLR for
copypage (in theory using the exisiting mini-cache code) but those
can be addressed down the road.

Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-28 21:00:40 +01:00
Linus Torvalds
7baf398f12 Merge branch 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [BLOCK] cfq-iosched: seek and async performance fixes
  [PATCH] ll_rw_blk: fix 80-col offender in put_io_context()
  [PATCH] cfq-iosched: small cfq_choose_req() optimization
  [PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree
2006-03-28 09:25:44 -08:00
Linus Torvalds
78cd9e04e0 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: Implement futex_atomic_cmpxchg_inatomic().
2006-03-28 09:25:22 -08:00
Alexey Dobriyan
7f927fcc2f [PATCH] Typo fixes
Fix a lot of typos.  Eyeballed by jmc@ in OpenBSD.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:08 -08:00
Matthias Gehre
910638ae7e [PATCH] Replace 0xff.. with correct DMA_xBIT_MASK
Replace all occurences of 0xff..  in calls to function pci_set_dma_mask()
and pci_set_consistant_dma_mask() with the corresponding DMA_xBIT_MASK from
linux/dma-mapping.h.

Signed-off-by: Matthias Gehre <M.Gehre@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:07 -08:00
Adrian Bunk
f45e4656ac [PATCH] arch/i386/kernel/microcode.c: remove the obsolete microcode_ioctl
Nowadays, even Debian stable ships a microcode_ctl utility recent enough to no
longer use this ioctl.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Tigran Aivazian <tigran_aivazian@symantec.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:06 -08:00
Arjan van de Ven
4b6f5d20b0 [PATCH] Make most file operations structs in fs/ const
This is a conversion to make the various file_operations structs in fs/
const.  Basically a regexp job, with a few manual fixups

The goal is both to increase correctness (harder to accidentally write to
shared datastructures) and reducing the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read only and thus
cache clean)

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:06 -08:00
Arjan van de Ven
99ac48f54a [PATCH] mark f_ops const in the inode
Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:05 -08:00
KAMEZAWA Hiroyuki
0a94502277 [PATCH] for_each_possible_cpu: fixes for generic part
replaces for_each_cpu with for_each_possible_cpu().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:05 -08:00
KAMEZAWA Hiroyuki
631d6747e1 [PATCH] for_each_possible_cpu: defines for_each_possible_cpu
for_each_cpu() is a for-loop over cpu_possible_map.  for_each_online_cpu is
for-loop cpu over cpu_online_map.  .....for_each_cpu() is not sufficiently
explicit and can lead to mistakes.

This patch adds for_each_possible_cpu() in preparation for the removal of
for_each_cpu().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:04 -08:00
Andi Kleen
70674f95c0 [PATCH] Optimize select/poll by putting small data sets on the stack
Optimize select and poll by a using stack space for small fd sets

This brings back an old optimization from Linux 2.0.  Using the stack is
faster than kmalloc.  On a Intel P4 system it speeds up a select of a
single pty fd by about 13% (~4000 cycles -> ~3500)

It also saves memory because a daemon hanging in select or poll will
usually save one or two less pages.  This can add up - e.g.  if you have 10
daemons blocking in poll/select you save 40KB of memory.

I did a patch for this long ago, but it was never applied.  This version is
a reimplementation of the old patch that tries to be less intrusive.  I
only did the minimal changes needed for the stack allocation.

The cut off point before external memory is allocated is currently at
832bytes.  The system calls always allocate this much memory on the stack.

These 832 bytes are divided into 256 bytes frontend data (for the select
bitmaps of the pollfds) and the rest of the space for the wait queues used
by the low level drivers.  There are some extreme cases where this won't
work out for select and it falls back to allocating memory too early -
especially with very sparse large select bitmaps - but the majority of
processes who only have a small number of file descriptors should be ok.
[TBD: 832/256 might not be the best split for select or poll]

I suspect more optimizations might be possible, but they would be more
complicated.  One way would be to cache the select/poll context over
multiple system calls because typically the input values should be similar.
 Problem is when to flush the file descriptors out though.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:04 -08:00
Alan Cox
d266ab8893 [PATCH] Small fixes backported to old IDE SiS driver
Some quick backport bits from the libata PATA work to fix things found in
the sis driver.  The piix driver needs some fixes too but those are way to
large and need someone working on old IDE with time to do them.

This patch fixes the case where random bits get loaded into SIS timing
registers according to the description of the correct behaviour from
Vojtech Pavlik.  It also adds the SiS5517 ATA16 chipset which is not
currently supported by the driver.  Thanks to Conrad Harriss for loaning me
the machine with the 5517 chipset.

Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:04 -08:00
Andrew Morton
e51236092d [PATCH] remove relayfs_fs.h
This is obsolete.

Cc: Tom Zanussi <zanussi@us.ibm.com>
Cc: Jens Axboe <axboe@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:03 -08:00
Adrian Bunk
a28af471b8 [PATCH] fs/fat/: proper prototypes for two functions
Add proper prototypes for fat_cache_init() and fat_cache_destroy() in
msdos_fs.h.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:03 -08:00
Andrew Morton
7a1e524a5f [PATCH] alpha: make poll flags the same as other architectures
Renumber the recently-added POLLREMOVE and POLLRDHUP to line up with the other
architectures.

Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:02 -08:00
Brian Rogan
273577165c [PATCH] Add oprofile_add_ext_sample
On ppc64 we look at a profiling register to work out the sample address and
if it was in userspace or kernel.

The backtrace interface oprofile_add_sample does not allow this.  Create
oprofile_add_ext_sample and make oprofile_add_sample use it too.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: John Levon <levon@movementarian.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:02 -08:00
Paul Fulghum
0080b7aae8 [PATCH] synclink_gt add gpio feature
Add driver support for general purpose I/O feature of the Synclink GT
adapters.

Signed-off-by: Paul Fulghum <paulkf@micrgate.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:02 -08:00
Kyle McMartin
f5f5370da4 [PATCH] Decrapify asm-generic/local.h
Now that Christoph Lameter's atomic_long_t support is merged in mainline,
might as well convert asm-generic/local.h to use it, so the same code can
be used for both sizes of 32 and 64-bit unsigned longs.

akpm sayeth:

Q:

  Is there any particular reason why these routines weren't simply
  implemented with local_save/restore_flags, if they are only meant to
  guarantee atomicity to the local cpu?  I'm sure on most platforms this
  would be more efficient than using an atomic...

A:

  The whole _point_ of local_t is to avoid local_irq_disable().  It's
  designed to exploit the fact that many CPUs can do incs and decs in a way
  which is atomic wrt local interrupts, but not atomic wrt SMP.

But this patch makes sense, because asm-generic/local.h is just a fallback
implementation for architectures which either cannot perform these
local-irq-atomic operations, or its maintainers haven't yet got around to
implementing them.

We need more work done on local_t in the 2.6.17 timeframe - they're defined as
unsigned long, but some architectures implement them as signed long.

Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:01 -08:00
Matt Mackall
41623b064f [PATCH] RTC: Fix up some RTC whitespace and style
Fix up some RTC whitespace and style

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:01 -08:00
Matt Mackall
da2468b6a8 [PATCH] RTC: Remove RTC UIP synchronization on MIPS MC146818
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:00 -08:00
Matt Mackall
63732c2f37 [PATCH] RTC: Remove RTC UIP synchronization on x86
Reading the CMOS clock on x86 and some other arches currently takes up to one
second because it synchronizes with the CMOS second tick-over.  This delay
shows up at boot time as well a resume time.

This is the currently the most substantial boot time delay for machines that
are working towards instant-on capability.  Also, a quick back of the envelope
calculation (.5sec * 2M users * 1 boot a day * 10 years) suggests it has cost
Linux users in the neighborhood of a million man-hours.

An earlier thread on this topic is here:

http://groups.google.com/group/linux.kernel/browse_frm/thread/8a24255215ff6151/2aa97e66a977653d?hl=en&lr=&ie=UTF-8&rnum=1&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D1To2R-2S7-11%40gated-at.bofh.it#2aa97e66a977653d

..from which the consensus seems to be that it's no longer desirable.

In my view, there are basically four cases to consider:

1) networked, need precise walltime: use NTP
2) networked, don't need precise walltime: use NTP anyway
3) not networked, don't need sub-second precision walltime: don't care
4) not networked, need sub-second precision walltime:
   get a network or a radio time source because RTC isn't good enough anyway

So this patch series simply removes the synchronization in favor of a simple
seqlock-like approach using the seconds value.

Note that for purposes of timer accuracy on wakeup, this patch will cause us
to fire timers up to one second late.  But as the current timer resume code
will already sync once (or more!), it's no worse for short timers.

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:00 -08:00
Benjamin Herrenschmidt
e8222502ee [PATCH] powerpc: Kill _machine and hard-coded platform numbers
This removes statically assigned platform numbers and reworks the
powerpc platform probe code to use a better mechanism.  With this,
board support files can simply declare a new machine type with a
macro, and implement a probe() function that uses the flattened
device-tree to detect if they apply for a given machine.

We now have a machine_is() macro that replaces the comparisons of
_machine with the various PLATFORM_* constants.  This commit also
changes various drivers to use the new macro instead of looking at
_machine.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 23:15:54 +11:00
Jens Axboe
206dc69b31 [BLOCK] cfq-iosched: seek and async performance fixes
Detect whether a given process is seeky and if so disable (mostly) the
idle window if it is. We still allow just a little idle time, just enough
to allow that process to submit a new request. That is needed to maintain
fairness across priority groups.

In some cases, we could setup several async queues. This is not optimal
from a performance POV, since we want all async io in one queue to perform
good sorting on it. It also impacted sync queues, as async io got too much
slice time.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28 13:03:44 +02:00
Andrew Morton
872345b715 [PATCH] git-powerpc: WARN was a dumb idea
There are at least 14 different implementations of WARN() in the tree already.
The build fails all over the place.

Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 20:48:54 +11:00
Russell King
a081568d70 [ARM] Fix decompressor serial IO to give CRLF not LFCR
As per the corresponding change to the serial drivers, arrange
for ARM decompressors to give CRLF.  Move the common putstr code
into misc.c such that machines only need to supply "putc" and
"flush" functions.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-28 10:24:33 +01:00
David S. Miller
6e57a3a897 [SPARC64]: Implement futex_atomic_cmpxchg_inatomic().
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-28 01:00:08 -08:00
Jens Axboe
e2d74ac066 [PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree
On setups with many disks, we spend a considerable amount of time
looking up the process-disk mapping on each queue of io. Testing with
a NULL based block driver, this costs 40-50% reduction in throughput
for 1000 disks.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28 08:59:01 +02:00
Stephen Rothwell
b239cbe957 [PATCH] powerpc: make ISA floppies work again
We used to assume that a DMA mapping request with a NULL dev was for
ISA DMA.  This assumption was broken at some point.  Now we explicitly
pass the detected ISA PCI device in the floppy setup.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 16:45:36 +11:00
Ryan S. Arnold
45d607ed92 [PATCH] powerpc: hvc_console updates
These are some updates from both Ryan and Arnd for the hvc_console
driver:

The main point is to enable the inclusion of a console driver
for rtas, which is currrently needed for the cell platform.

Also shuffle around some data-type declarations and moves some
functions out of include/asm-ppc64/hvconsole.h and into a new
drivers/char/hvc_console.h file.

Signed-off-by: "Ryan S. Arnold" <rsa@us.ibm.com>
Signed-off-by: Arnd Bergmann <abergman@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 16:45:26 +11:00
Michael Ellerman
d0160bf0b3 [PATCH] powerpc: Rename and export ppc64_firmware_features
We need to export ppc64_firmware_features for modules. Before we do that
I think we should probably rename it to powerpc_firmware_features.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 16:45:20 +11:00
Anton Blanchard
2f25194dbe [PATCH] powerpc: export validate_sp for oprofile calltrace
Export validate_sp so we can use it in the oprofile calltrace code.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 16:19:52 +11:00
Anton Blanchard
72533db012 [PATCH] powerpc: Remove some ifdefs in oprofile_impl.h
- No one uses op_counter_config.valid, so remove it
- No need to ifdef around function protypes.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 16:19:49 +11:00
Paul Mackerras
0a26b1364f ppc: Remove CHRP, POWER3 and POWER4 support from arch/ppc
32-bit CHRP machines are now supported only in arch/powerpc, as are
all 64-bit PowerPC processors.  This means that we don't use
Open Firmware on any platform in arch/ppc any more.

This makes PReP support a single-platform option like every other
platform support option in arch/ppc now, thus CONFIG_PPC_MULTIPLATFORM
is gone from arch/ppc.  CONFIG_PPC_PREP is the option that selects
PReP support and is generally what has replaced
CONFIG_PPC_MULTIPLATFORM within arch/ppc.

_machine is all but dead now, being #defined to 0.

Updated Makefiles, comments and Kconfig options generally to reflect
these changes.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-28 10:22:10 +11:00
Linus Torvalds
fdccffc6b7 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [NET]: drop duplicate assignment in request_sock
  [IPSEC]: Fix tunnel error handling in ipcomp6
2006-03-27 08:47:29 -08:00
Linus Torvalds
4fa639123d Merge branch 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block
* 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block:
  [PATCH] Don't make debugfs depend on DEBUG_KERNEL
  [PATCH] Fix blktrace compile with sysfs not defined
  [PATCH] unused label in drivers/block/cciss.
  [BLOCK] increase size of disk stat counters
  [PATCH] blk_execute_rq_nowait-speedup
  [PATCH] ide-cd: quiet down GPCMD_READ_CDVD_CAPACITY failure
  [BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion
  [PATCH] kzalloc() conversion in drivers/block
  [PATCH] update max_sectors documentation
2006-03-27 08:46:49 -08:00
NeilBrown
df5b89b323 [PATCH] md: Convert reconfig_sem to reconfig_mutex
... being careful that mutex_trylock is inverted wrt down_trylock

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:03 -08:00
NeilBrown
e464eafdb4 [PATCH] md: Support suspending of IO to regions of an md array
This allows user-space to access data safely.  This is needed for raid5
reshape as user-space needs to take a backup of the first few stripes before
allowing reshape to commence.

It will also be useful in cluster-aware raid1 configurations so that all
cluster members can leave a section of the array untouched while a
resync/recovery happens.

A 'start' and 'end' of the suspended range are written to 2 sysfs attributes.
Note that only one range can be suspended at a time.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:02 -08:00
NeilBrown
63c70c4f3a [PATCH] md: Split reshape handler in check_reshape and start_reshape
check_reshape checks validity and does things that can be done instantly -
like adding devices to raid1.  start_reshape initiates a restriping process to
convert the whole array.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:02 -08:00
NeilBrown
b578d55fdd [PATCH] md: Only checkpoint expansion progress occasionally
Instead of checkpointing at each stripe, only checkpoint when a new write
would overwrite uncheckpointed data.  Block any write to the uncheckpointed
area.  Arbitrarily checkpoint at least every 3Meg.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:02 -08:00
NeilBrown
f67055780c [PATCH] md: Checkpoint and allow restart of raid5 reshape
We allow the superblock to record an 'old' and a 'new' geometry, and a
position where any conversion is up to.  The geometry allows for changing
chunksize, layout and level as well as number of devices.

When using verion-0.90 superblock, we convert the version to 0.91 while the
conversion is happening so that an old kernel will refuse the assemble the
array.  For version-1, we use a feature bit for the same effect.

When starting an array we check for an incomplete reshape and restart the
reshape process if needed.  If the reshape stopped at an awkward time (like
when updating the first stripe) we refuse to assemble the array, and let
user-space worry about it.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:01 -08:00
NeilBrown
292695531a [PATCH] md: Final stages of raid5 expand code
This patch adds raid5_reshape and end_reshape which will start and finish the
reshape processes.

raid5_reshape is only enabled in CONFIG_MD_RAID5_RESHAPE is set, to discourage
accidental use.

Read the 'help' for the CONFIG_MD_RAID5_RESHAPE entry.

and Make sure that you have backups, just in case.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:01 -08:00
NeilBrown
ccfcc3c10b [PATCH] md: Core of raid5 resize process
This patch provides the core of the resize/expand process.

sync_request notices if a 'reshape' is happening and acts accordingly.

It allocated new stripe_heads for the next chunk-wide-stripe in the target
geometry, marking them STRIPE_EXPANDING.

Then it finds which stripe heads in the old geometry can provide data needed
by these and marks them STRIPE_EXPAND_SOURCE.  This causes stripe_handle to
read all blocks on those stripes.

Once all blocks on a STRIPE_EXPAND_SOURCE stripe_head are read, any that are
needed are copied into the corresponding STRIPE_EXPANDING stripe_head.  Once a
STRIPE_EXPANDING stripe_head is full, it is marks STRIPE_EXPAND_READY and then
is written out and released.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:01 -08:00
NeilBrown
7ecaa1e6a1 [PATCH] md: Infrastructure to allow normal IO to continue while array is expanding
We need to allow that different stripes are of different effective sizes, and
use the appropriate size.  Also, when a stripe is being expanded, we must
block any IO attempts until the stripe is stable again.

Key elements in this change are:
 - each stripe_head gets a 'disk' field which is part of the key,
   thus there can sometimes be two stripe heads of the same area of
   the array, but covering different numbers of devices.  One of these
   will be marked STRIPE_EXPANDING and so won't accept new requests.
 - conf->expand_progress tracks how the expansion is progressing and
   is used to determine whether the target part of the array has been
   expanded yet or not.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:01 -08:00
NeilBrown
ad01c9e375 [PATCH] md: Allow stripes to be expanded in preparation for expanding an array
Before a RAID-5 can be expanded, we need to be able to expand the stripe-cache
data structure.

This requires allocating new stripes in a new kmem_cache.  If this succeeds,
we copy cache pages over and release the old stripes and kmem_cache.

We then allocate new pages.  If that fails, we leave the stripe cache at it's
new size.  It isn't worth the effort to shrink it back again.

Unfortuanately this means we need two kmem_cache names as we, for a short
period of time, we have two kmem_caches.  So they are raid5/%s and
raid5/%s-alt

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:01 -08:00
NeilBrown
b55e6bfcd2 [PATCH] md: Split disks array out of raid5 conf structure so it is easier to grow
The remainder of this batch implements raid5 reshaping.  Currently the only
shape change that is supported is added a device, but it is envisioned that
changing the chunksize and layout will also be supported, as well as changing
the level (e.g.  1->5, 5->6).

The reshape process naturally has to move all of the data in the array, and so
should be used with caution.  It is believed to work, and some testing does
support this, but wider testing would be great for increasing my confidence.

You will need a version of mdadm newer than 2.3.1 to make use of raid5 growth.
 This is because mdadm need to take a copy of a 'critical section' at the
start of the array incase there is a crash at an awkward moment.  On restart,
mdadm will restore the critical section and allow reshape to continue.

I hope to release a 2.4-pre by early next week - it still needs a little more
polishing.

This patch:

Previously the array of disk information was included in the raid5 'conf'
structure which was allocated to an appropriate size.  This makes it awkward
to change the size of that array.  So we split it off into a separate
kmalloced array which will require a little extra indexing, but is much easier
to grow.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:01 -08:00
Jun'ichi Nomura
641dc636b0 [PATCH] dm/md dependency tree in sysfs: bd_claim_by_kobject
Adding bd_claim_by_kobject() function which takes kobject as additional
signature of holder device and creates sysfs symlinks between holder device
and claimed device.  bd_release_from_kobject() is a counterpart of
bd_claim_by_kobject.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:45:00 -08:00
Andrew Morton
100873687d [PATCH] dm-md-dependency-tree-in-sysfs-holders-slaves-subdirectory-tidy
Remove all the CONFIG_SYSFS stuff.  That's supposed to all be implemented up
in header files.

Yes, the CONFIG_SYSFS=n data structures will be a little larger than
necessary, but that's a tradeoff we can decide to make.

Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:59 -08:00
Jun'ichi Nomura
6a4d44c1f1 [PATCH] dm/md dependency tree in sysfs: holders/slaves subdirectory
Creating "slaves" and "holders" directories in /sys/block/<disk> and
creating "holders" directory under /sys/block/<disk>/<partition>

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:59 -08:00
Darrick J. Wong
3ac51e741a [PATCH] dm store geometry
Allow drive geometry to be stored with a new DM_DEV_SET_GEOMETRY ioctl.
Device-mapper will now respond to HDIO_GETGEO.  If the geometry information is
not available, zero will be returned for all of the parameters.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:59 -08:00
NeilBrown
969429b504 [PATCH] dm: make sure QUEUE_FLAG_CLUSTER is set properly
This flag should be set for a virtual device iff it is set for all
underlying devices.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:59 -08:00
Pavel Roskin
ed49843b89 [PATCH] Add ID for Quadro NVS280
Quadro NVS280 is a dual-head PCIe card with PCI ID 10de:00fd and subsystem ID
10de:0215.

Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:57 -08:00
Alessandro Zummo
1d98af8727 [PATCH] RTC subsystem: M48T86 driver
Add a driver for the ST M48T86 / Dallas DS12887 RTC.

This is a platform driver.  The platform device must provide I/O routines to
access the RTC.

Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:52 -08:00
Alessandro Zummo
f7f3682fb2 [PATCH] RTC subsystem: I2C driver ids
This patch adds the I2C driver ids to i2c-id.h in preparation of the I2C
direct probing method.

This is kept separate so that it can be integrated to

Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:51 -08:00
Alessandro Zummo
6fc7f10cee [PATCH] RTC subsystem: I2C cleanup
This patch, completely optional, removes from drivers/i2c/chips all the
drivers that are implemented in the new RTC subsystem.

It should be noted that none of the current driver is actually integrated,
i.e.  usable without further patches.

Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:51 -08:00
Alessandro Zummo
0c86edc0d4 [PATCH] RTC subsystem: class
Add the basic RTC subsystem infrastructure to the kernel.

rtc/class.c - registration facilities for RTC drivers
rtc/interface.c - kernel/rtc interface functions
rtc/hctosys.c - snippet of code that copies hw clock to sw clock
		at bootup, if configured to do so.

Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:51 -08:00
Alessandro Zummo
12b824fb15 [PATCH] RTC subsystem: ARM cleanup
This patch removes from the ARM subsytem some of the rtc-related functions
that have been included in the RTC subsystem.  It also fixes some naming
collisions.

Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:50 -08:00
Alessandro Zummo
c58411e95d [PATCH] RTC Subsystem: library functions
RTC and date/time related functions.

Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:50 -08:00
Yoichi Yuasa
d23ee8fe6e [PATCH] mips: fixed collision of rtc function name
Fix the collision of rtc function name.

Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:50 -08:00
Alan Stern
e041c68341 [PATCH] Notifier chain update: API changes
The kernel's implementation of notifier chains is unsafe.  There is no
protection against entries being added to or removed from a chain while the
chain is in use.  The issues were discussed in this thread:

    http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

We noticed that notifier chains in the kernel fall into two basic usage
classes:

	"Blocking" chains are always called from a process context
	and the callout routines are allowed to sleep;

	"Atomic" chains can be called from an atomic context and
	the callout routines are not allowed to sleep.

We decided to codify this distinction and make it part of the API.  Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name).  New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain.  The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.

With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed.  For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections.  (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)

There are some limitations, which should not be too hard to live with.  For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem.  Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain.  (This did happen in a couple of places and the code
had to be changed to avoid it.)

Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization.  Instead we use RCU.  The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.

Here is the list of chains that we adjusted and their classifications.  None
of them use the raw API, so for the moment it is only a placeholder.

  ATOMIC CHAINS
  -------------
arch/i386/kernel/traps.c:		i386die_chain
arch/ia64/kernel/traps.c:		ia64die_chain
arch/powerpc/kernel/traps.c:		powerpc_die_chain
arch/sparc64/kernel/traps.c:		sparc64die_chain
arch/x86_64/kernel/traps.c:		die_chain
drivers/char/ipmi/ipmi_si_intf.c:	xaction_notifier_list
kernel/panic.c:				panic_notifier_list
kernel/profile.c:			task_free_notifier
net/bluetooth/hci_core.c:		hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_expect_chain
net/ipv6/addrconf.c:			inet6addr_chain
net/netfilter/nf_conntrack_core.c:	nf_conntrack_chain
net/netfilter/nf_conntrack_core.c:	nf_conntrack_expect_chain
net/netlink/af_netlink.c:		netlink_chain

  BLOCKING CHAINS
  ---------------
arch/powerpc/platforms/pseries/reconfig.c:	pSeries_reconfig_chain
arch/s390/kernel/process.c:		idle_chain
arch/x86_64/kernel/process.c		idle_notifier
drivers/base/memory.c:			memory_chain
drivers/cpufreq/cpufreq.c		cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c		cpufreq_transition_notifier_list
drivers/macintosh/adb.c:		adb_client_list
drivers/macintosh/via-pmu.c		sleep_notifier_list
drivers/macintosh/via-pmu68k.c		sleep_notifier_list
drivers/macintosh/windfarm_core.c	wf_client_list
drivers/usb/core/notify.c		usb_notifier_list
drivers/video/fbmem.c			fb_notifier_list
kernel/cpu.c				cpu_chain
kernel/module.c				module_notify_list
kernel/profile.c			munmap_notifier
kernel/profile.c			task_exit_notifier
kernel/sys.c				reboot_notifier_list
net/core/dev.c				netdev_chain
net/decnet/dn_dev.c:			dnaddr_chain
net/ipv4/devinet.c:			inetaddr_chain

It's possible that some of these classifications are wrong.  If they are,
please let us know or submit a patch to fix them.  Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)

The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.

[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:50 -08:00
Ingo Molnar
76b81e2b0e [PATCH] lightweight robust futexes updates 2
futex.h updates:

- get rid of FUTEX_OWNER_PENDING - it's not used
- reduce ROBUST_LIST_LIMIT to a saner value

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:50 -08:00
Ingo Molnar
8f17d3a504 [PATCH] lightweight robust futexes updates
- fix: initialize the robust list(s) to NULL in copy_process.

- doc update

- cleanup: rename _inuser to _inatomic

- __user cleanups and other small cleanups

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
Ingo Molnar
8fdd6c6df7 [PATCH] lightweight robust futexes: x86_64
x86_64: add the futex_atomic_cmpxchg_inuser() assembly implementation, and
wire up the new syscalls.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
Ingo Molnar
dfd4e3ec24 [PATCH] lightweight robust futexes: i386
i386: add the futex_atomic_cmpxchg_inuser() assembly implementation, and wire
up the new syscalls.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
Ingo Molnar
34f192c652 [PATCH] lightweight robust futexes: compat
32-bit syscall compatibility support.  (This patch also moves all futex
related compat functionality into kernel/futex_compat.c.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
Ingo Molnar
0771dfefc9 [PATCH] lightweight robust futexes: core
Add the core infrastructure for robust futexes: structure definitions, the new
syscalls and the do_exit() based cleanup mechanism.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
Ingo Molnar
e9056f13bf [PATCH] lightweight robust futexes: arch defaults
This patchset provides a new (written from scratch) implementation of robust
futexes, called "lightweight robust futexes".  We believe this new
implementation is faster and simpler than the vma-based robust futex solutions
presented before, and we'd like this patchset to be adopted in the upstream
kernel.  This is version 1 of the patchset.

  Background
  ----------

What are robust futexes?  To answer that, we first need to understand what
futexes are: normal futexes are special types of locks that in the
noncontended case can be acquired/released from userspace without having to
enter the kernel.

A futex is in essence a user-space address, e.g.  a 32-bit lock variable
field.  If userspace notices contention (the lock is already owned and someone
else wants to grab it too) then the lock is marked with a value that says
"there's a waiter pending", and the sys_futex(FUTEX_WAIT) syscall is used to
wait for the other guy to release it.  The kernel creates a 'futex queue'
internally, so that it can later on match up the waiter with the waker -
without them having to know about each other.  When the owner thread releases
the futex, it notices (via the variable value) that there were waiter(s)
pending, and does the sys_futex(FUTEX_WAKE) syscall to wake them up.  Once all
waiters have taken and released the lock, the futex is again back to
'uncontended' state, and there's no in-kernel state associated with it.  The
kernel completely forgets that there ever was a futex at that address.  This
method makes futexes very lightweight and scalable.

"Robustness" is about dealing with crashes while holding a lock: if a process
exits prematurely while holding a pthread_mutex_t lock that is also shared
with some other process (e.g.  yum segfaults while holding a pthread_mutex_t,
or yum is kill -9-ed), then waiters for that lock need to be notified that the
last owner of the lock exited in some irregular way.

To solve such types of problems, "robust mutex" userspace APIs were created:
pthread_mutex_lock() returns an error value if the owner exits prematurely -
and the new owner can decide whether the data protected by the lock can be
recovered safely.

There is a big conceptual problem with futex based mutexes though: it is the
kernel that destroys the owner task (e.g.  due to a SEGFAULT), but the kernel
cannot help with the cleanup: if there is no 'futex queue' (and in most cases
there is none, futexes being fast lightweight locks) then the kernel has no
information to clean up after the held lock!  Userspace has no chance to clean
up after the lock either - userspace is the one that crashes, so it has no
opportunity to clean up.  Catch-22.

In practice, when e.g.  yum is kill -9-ed (or segfaults), a system reboot is
needed to release that futex based lock.  This is one of the leading
bugreports against yum.

To solve this problem, 'Robust Futex' patches were created and presented on
lkml: the one written by Todd Kneisel and David Singleton is the most advanced
at the moment.  These patches all tried to extend the futex abstraction by
registering futex-based locks in the kernel - and thus give the kernel a
chance to clean up.

E.g.  in David Singleton's robust-futex-6.patch, there are 3 new syscall
variants to sys_futex(): FUTEX_REGISTER, FUTEX_DEREGISTER and FUTEX_RECOVER.
The kernel attaches such robust futexes to vmas (via
vma->vm_file->f_mapping->robust_head), and at do_exit() time, all vmas are
searched to see whether they have a robust_head set.

Lots of work went into the vma-based robust-futex patch, and recently it has
improved significantly, but unfortunately it still has two fundamental
problems left:

 - they have quite complex locking and race scenarios.  The vma-based
   patches had been pending for years, but they are still not completely
   reliable.

 - they have to scan _every_ vma at sys_exit() time, per thread!

The second disadvantage is a real killer: pthread_exit() takes around 1
microsecond on Linux, but with thousands (or tens of thousands) of vmas every
pthread_exit() takes a millisecond or more, also totally destroying the CPU's
L1 and L2 caches!

This is very much noticeable even for normal process sys_exit_group() calls:
the kernel has to do the vma scanning unconditionally!  (this is because the
kernel has no knowledge about how many robust futexes there are to be cleaned
up, because a robust futex might have been registered in another task, and the
futex variable might have been simply mmap()-ed into this process's address
space).

This huge overhead forced the creation of CONFIG_FUTEX_ROBUST, but worse than
that: the overhead makes robust futexes impractical for any type of generic
Linux distribution.

So it became clear to us, something had to be done.  Last week, when Thomas
Gleixner tried to fix up the vma-based robust futex patch in the -rt tree, he
found a handful of new races and we were talking about it and were analyzing
the situation.  At that point a fundamentally different solution occured to
me.  This patchset (written in the past couple of days) implements that new
solution.  Be warned though - the patchset does things we normally dont do in
Linux, so some might find the approach disturbing.  Parental advice
recommended ;-)

  New approach to robust futexes
  ------------------------------

At the heart of this new approach there is a per-thread private list of robust
locks that userspace is holding (maintained by glibc) - which userspace list
is registered with the kernel via a new syscall [this registration happens at
most once per thread lifetime].  At do_exit() time, the kernel checks this
user-space list: are there any robust futex locks to be cleaned up?

In the common case, at do_exit() time, there is no list registered, so the
cost of robust futexes is just a simple current->robust_list != NULL
comparison.  If the thread has registered a list, then normally the list is
empty.  If the thread/process crashed or terminated in some incorrect way then
the list might be non-empty: in this case the kernel carefully walks the list
[not trusting it], and marks all locks that are owned by this thread with the
FUTEX_OWNER_DEAD bit, and wakes up one waiter (if any).

The list is guaranteed to be private and per-thread, so it's lockless.  There
is one race possible though: since adding to and removing from the list is
done after the futex is acquired by glibc, there is a few instructions window
for the thread (or process) to die there, leaving the futex hung.  To protect
against this possibility, userspace (glibc) also maintains a simple per-thread
'list_op_pending' field, to allow the kernel to clean up if the thread dies
after acquiring the lock, but just before it could have added itself to the
list.  Glibc sets this list_op_pending field before it tries to acquire the
futex, and clears it after the list-add (or list-remove) has finished.

That's all that is needed - all the rest of robust-futex cleanup is done in
userspace [just like with the previous patches].

Ulrich Drepper has implemented the necessary glibc support for this new
mechanism, which fully enables robust mutexes.  (Ulrich plans to commit these
changes to glibc-HEAD later today.)

Key differences of this userspace-list based approach, compared to the vma
based method:

 - it's much, much faster: at thread exit time, there's no need to loop
   over every vma (!), which the VM-based method has to do.  Only a very
   simple 'is the list empty' op is done.

 - no VM changes are needed - 'struct address_space' is left alone.

 - no registration of individual locks is needed: robust mutexes dont need
   any extra per-lock syscalls.  Robust mutexes thus become a very lightweight
   primitive - so they dont force the application designer to do a hard choice
   between performance and robustness - robust mutexes are just as fast.

 - no per-lock kernel allocation happens.

 - no resource limits are needed.

 - no kernel-space recovery call (FUTEX_RECOVER) is needed.

 - the implementation and the locking is "obvious", and there are no
   interactions with the VM.

  Performance
  -----------

I have benchmarked the time needed for the kernel to process a list of 1
million (!) held locks, using the new method [on a 2GHz CPU]:

 - with FUTEX_WAIT set [contended mutex]: 130 msecs
 - without FUTEX_WAIT set [uncontended mutex]: 30 msecs

I have also measured an approach where glibc does the lock notification [which
it currently does for !pshared robust mutexes], and that took 256 msecs -
clearly slower, due to the 1 million FUTEX_WAKE syscalls userspace had to do.

(1 million held locks are unheard of - we expect at most a handful of locks to
be held at a time.  Nevertheless it's nice to know that this approach scales
nicely.)

  Implementation details
  ----------------------

The patch adds two new syscalls: one to register the userspace list, and one
to query the registered list pointer:

 asmlinkage long
 sys_set_robust_list(struct robust_list_head __user *head,
                     size_t len);

 asmlinkage long
 sys_get_robust_list(int pid, struct robust_list_head __user **head_ptr,
                     size_t __user *len_ptr);

List registration is very fast: the pointer is simply stored in
current->robust_list.  [Note that in the future, if robust futexes become
widespread, we could extend sys_clone() to register a robust-list head for new
threads, without the need of another syscall.]

So there is virtually zero overhead for tasks not using robust futexes, and
even for robust futex users, there is only one extra syscall per thread
lifetime, and the cleanup operation, if it happens, is fast and
straightforward.  The kernel doesnt have any internal distinction between
robust and normal futexes.

If a futex is found to be held at exit time, the kernel sets the highest bit
of the futex word:

	#define FUTEX_OWNER_DIED        0x40000000

and wakes up the next futex waiter (if any). User-space does the rest of
the cleanup.

Otherwise, robust futexes are acquired by glibc by putting the TID into the
futex field atomically.  Waiters set the FUTEX_WAITERS bit:

	#define FUTEX_WAITERS           0x80000000

and the remaining bits are for the TID.

  Testing, architecture support
  -----------------------------

I've tested the new syscalls on x86 and x86_64, and have made sure the parsing
of the userspace list is robust [ ;-) ] even if the list is deliberately
corrupted.

i386 and x86_64 syscalls are wired up at the moment, and Ulrich has tested the
new glibc code (on x86_64 and i386), and it works for his robust-mutex
testcases.

All other architectures should build just fine too - but they wont have the
new syscalls yet.

Architectures need to implement the new futex_atomic_cmpxchg_inuser() inline
function before writing up the syscalls (that function returns -ENOSYS right
now).

This patch:

Add placeholder futex_atomic_cmpxchg_inuser() implementations to every
architecture that supports futexes.  It returns -ENOSYS.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
Ingo Molnar
62ac285f3c [PATCH] mips: add ptr_to_compat()
Add ptr_to_compat() - needed by the new robust futex code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
Ingo Molnar
213b63b76a [PATCH] parisc: add ptr_to_compat()
Add ptr_to_compat() to parisc - needed by the new robust futex code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Grant Grundler <iod00d@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:48 -08:00
Ingo Molnar
f267fa9f5b [PATCH] s390: add ptr_to_compat()
Add ptr_to_compat() to s390 - needed by the new robust-futex code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>

untested. CHECKME: am i right about the 0x7fffffffUL masking?

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:48 -08:00
Ingo Molnar
66e863acd7 [PATCH] ia64: add ptr_to_compat()
Add ptr_to_compat() to ia64 - needed by the robust-futex code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:48 -08:00
Dave Hansen
22a9835c35 [PATCH] unify PFN_* macros
Just about every architecture defines some macros to do operations on pfns.
 They're all virtually identical.  This patch consolidates all of them.

One minor glitch is that at least i386 uses them in a very skeletal header
file.  To keep away from #include dependency hell, I stuck the new
definitions in a new, isolated header.

Of all of the implementations, sh64 is the only one that varied by a bit.
It used some masks to ensure that any sign-extension got ripped away before
the arithmetic is done.  This has been posted to that sh64 maintainers and
the development list.

Compiles on x86, x86_64, ia64 and ppc64.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:48 -08:00
KAMEZAWA Hiroyuki
95144c788d [PATCH] uninline zone helpers
Helper functions for for_each_online_pgdat/for_each_zone look too big to be
inlined.  Speed of these helper macro itself is not very important.  (inner
loops are tend to do more work than this)

This patch make helper function to be out-of-lined.

	inline		out-of-line
.text   005c0680        005bf6a0

005c0680 - 005bf6a0 = FE0 = 4Kbytes.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:48 -08:00
KAMEZAWA Hiroyuki
ae0f15fb91 [PATCH] for_each_online_pgdat: remove pgdat_list
By using for_each_online_pgdat(), pgdat_list is not necessary now.  This patch
removes it.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:48 -08:00
KAMEZAWA Hiroyuki
679bc9fbb5 [PATCH] for_each_online_pgdat: for_each_bootmem
Add a list_head to bootmem_data_t and make bootmems use it.  bootmem list is
sorted by node_boot_start.

Only nodes against which init_bootmem() is called are linked to the list.
(i386 allocates bootmem only from one node(0) not from all online nodes.)

A summary:
 1. for_each_online_pgdat() traverses all *online* nodes.
 2. alloc_bootmem() allocates memory only from initialized-for-bootmem nodes.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:47 -08:00
KAMEZAWA Hiroyuki
8357f8695d [PATCH] define for_each_online_pgdat
This patch defines for_each_online_pgdat() as a replacement of
for_each_pgdat()

Now, online nodes are managed by node_online_map.  But for_each_pgdat()
uses pgdat_link to iterate over all nodes(pgdat).  This means management
structure for online pgdat is duplicated.

I think using node_online_map for for_each_pgdat() is simple and sane
rather ather than pgdat_link.  New macro is named as
for_each_online_pgdat().  Following patch will fix callers of
for_each_pgdat().

The bootmem allocater uses for_each_pgdat() before pgdat initialization.  I
don't think it's sane.  Following patch will fix it.

Signed-off-by: Yasunori Goto     <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:47 -08:00
KAMEZAWA Hiroyuki
a0140c1d85 [PATCH] remove zone_mem_map
This patch removes zone_mem_map.

pfn_to_page uses pgdat, page_to_pfn uses zone.  page_to_pfn can use pgdat
instead of zone, which is only one user of zone_mem_map.  By modifing it,
we can remove zone_mem_map.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:47 -08:00
KAMEZAWA Hiroyuki
0ecd702bcb [PATCH] unify pfn_to_page: ia64 pfn_to_page
ia64 has special config CONFIG_VIRTUAL_MEM_MAP.
CONFIG_DISCONTIGMEM=y && CONFIG_VIRTUAL_MEM_MAP!=y is bug ?

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:47 -08:00
KAMEZAWA Hiroyuki
655a044347 [PATCH] unify pfn_to_page: xtensa pfn_to_page
xtensa can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:47 -08:00
KAMEZAWA Hiroyuki
e6009f1ba5 [PATCH] unify pfn_to_page: v850 pfn_to_page
v850 can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:47 -08:00
KAMEZAWA Hiroyuki
9828c185db [PATCH] unify pfn_to_page: uml pfn_to_page
UML can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:46 -08:00
KAMEZAWA Hiroyuki
064b2a0762 [PATCH] unify pfn_to_page: sparc pfn_to_page
sparc can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:46 -08:00
KAMEZAWA Hiroyuki
309d34bbe5 [PATCH] unify pfn_to_page: sh64 pfn_to_page
sh64 can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Curnow <rc@rc0.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:46 -08:00
KAMEZAWA Hiroyuki
104b8deaa5 [PATCH] unify pfn_to_page: sh pfn_to_page
sh can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:46 -08:00
KAMEZAWA Hiroyuki
aed630434c [PATCH] unify pfn_to_page: s390 pfn_to_page
s390 can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:46 -08:00
KAMEZAWA Hiroyuki
f68d4c996d [PATCH] unify pfn_to_page: ppc pfn_to_page
PPC can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:46 -08:00
KAMEZAWA Hiroyuki
0d833b4109 [PATCH] unify pfn_to_page: parisc pfn_to_page
PARISC can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:46 -08:00
KAMEZAWA Hiroyuki
a02036e796 [PATCH] unify pfn_to_page: mips pfn_to_page
MIPS can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:45 -08:00
KAMEZAWA Hiroyuki
7126cffe74 [PATCH] unify pfn_to_page: m32r pfn_to_page
m32r can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:45 -08:00
KAMEZAWA Hiroyuki
dd6cc7631c [PATCH] unify pfn_to_page: h8300 pfn_to_page
H8300 can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:45 -08:00
KAMEZAWA Hiroyuki
5cdac7ca1b [PATCH] unify pfn_to_page: FRV pfn_to_page
FRV can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:45 -08:00
KAMEZAWA Hiroyuki
bb872f787a [PATCH] unify pfn_to_page: cris pfn_to_page
cris can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mikael Starvik <starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:45 -08:00
KAMEZAWA Hiroyuki
89fccaf2c4 [PATCH] unify pfn_to_page: arm26 pfn_to_page
arm26 can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:44 -08:00
KAMEZAWA Hiroyuki
7eb98a2f3b [PATCH] unify pfn_to_page: arm pfn_to_page
ARM can use generic funcs.
PFN_TO_NID, LOCAL_MAP_NR are defined by sub-archs.

Signed-off-by: KAMEZAWA Hirotuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:44 -08:00
KAMEZAWA Hiroyuki
1c05dda2b6 [PATCH] unify pfn_to_page: alpha pfn_to_page
Alpha can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:44 -08:00
KAMEZAWA Hiroyuki
659e35051b [PATCH] unify pfn_to_page: powerpc pfn_to_page
PowerPC can use generic ones.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:44 -08:00
KAMEZAWA Hiroyuki
dc8ecb4370 [PATCH] unify pfn_to_page: x86_64 pfn_to_page
x86_64 can use generic funcs.
For DISCONTIGMEM, CONFIG_OUT_OF_LINE_PFN_TO_PAGE is selected.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:44 -08:00
KAMEZAWA Hiroyuki
ad658b385e [PATCH] unify pfn_to_page: i386 pfn_to_page
i386 can use generic funcs.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:44 -08:00
KAMEZAWA Hiroyuki
a117e66ed4 [PATCH] unify pfn_to_page: generic functions
There are 3 memory models, FLATMEM, DISCONTIGMEM, SPARSEMEM.
Each arch has its own page_to_pfn(), pfn_to_page() for each models.
But most of them can use the same arithmetic.

This patch adds asm-generic/memory_model.h, which includes generic
page_to_pfn(), pfn_to_page() definitions for each memory model.

When CONFIG_OUT_OF_LINE_PFN_TO_PAGE=y, out-of-line functions are
used instead of macro. This is enabled by some archs and  reduces
text size.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:44 -08:00
Siddha, Suresh B
1e9f28fa1e [PATCH] sched: new sched domain for representing multi-core
Add a new sched domain for representing multi-core with shared caches
between cores.  Consider a dual package system, each package containing two
cores and with last level cache shared between cores with in a package.  If
there are two runnable processes, with this appended patch those two
processes will be scheduled on different packages.

On such systems, with this patch we have observed 8% perf improvement with
specJBB(2 warehouse) benchmark and 35% improvement with CFP2000 rate(with 2
users).

This new domain will come into play only on multi-core systems with shared
caches.  On other systems, this sched domain will be removed by domain
degeneration code.  This new domain can be also used for implementing power
savings policy (see OLS 2005 CMP kernel scheduler paper for more details..
I will post another patch for power savings policy soon)

Most of the arch/* file changes are for cpu_coregroup_map() implementation.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:43 -08:00
Adrian Bunk
74cae61ab4 [PATCH] fs/nfsd/export.c,net/sunrpc/cache.c: make needlessly global code static
We can now make some code static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:43 -08:00
NeilBrown
baab935ff3 [PATCH] knfsd: Convert sunrpc_cache to use krefs
.. it makes some of the code nicer.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:43 -08:00
NeilBrown
ebd0cb1af3 [PATCH] knfsd: Unexport cache_fresh and fix a small race
Cache_fresh is now only used in cache.c, so unexport it.

Part of cache_fresh (setting CACHE_VALID) should really be done under the
lock, while part (calling cache_revisit_request etc) must be done outside the
lock.  So we split it up appropriately.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:43 -08:00
NeilBrown
4d90452cb2 [PATCH] knfsd: Remove DefineCacheLookup
This has been replaced by more traditional code.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:42 -08:00
NeilBrown
15a5f6bd23 [PATCH] knfsd: Create cache_lookup function instead of using a macro to declare one
The C++-like 'template' approach proves to be too ugly and hard to work with.

The old 'template' won't go away until all users are updated.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:41 -08:00
NeilBrown
7d317f2c9f [PATCH] knfsd: Get rid of 'inplace' sunrpc caches
These were an unnecessary wart.  Also only have one 'DefineSimpleCache..'
instead of two.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:41 -08:00
NeilBrown
eab7e2e647 [PATCH] knfsd: Break the hard linkage from svc_expkey to svc_export
Current svc_expkey holds a pointer to the svc_export structure, so updates to
that structure have to be in-place, which is a wart on the whole cache
infrastruct.  So we break that linkage and just do a second lookup.

If this became a performance issue, it would be possible to put a direct link
back in which was only used conditionally.  i.e.  when an object is replaced
in the cache, we set a flag in the old object.  When dereferencing the link
from svc_expkey, if the flag is set, we drop the reference and do a fresh
lookup.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:41 -08:00
NeilBrown
efc36aa560 [PATCH] knfsd: Change the store of auth_domains to not be a 'cache'
The 'auth_domain's are simply handles on internal data structures.  They do
not cache information from user-space, and forcing them into the mold of a
'cache' misrepresents their true nature and causes confusion.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:41 -08:00
Ian Kent
5c0a32fc2c [PATCH] autofs4: add new packet type for v5 communications
This patch define a new autofs packet for autofs v5 and updates the waitq.c
functions to handle the additional packet type.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:40 -08:00
Ian Kent
f75ba3ade8 [PATCH] autofs4: increase module version
Update autofs4 version.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:40 -08:00
Jeff Dike
bb83da0533 [PATCH] uml: fix build warnings in __get_user
Fix a gcc warning about losing qualifiers to the first argument of
copy_from_user.  The typeof change for correctness, and fixes a lot of the
warnings, but there are some cases where x has some extra qualifiers, like
volatile, which copy_from_user can't know about.  For these, the void * cast
seems to be necessary.

Also cleaned up some of the whitespace and got rid of the emacs comment at the
bottom.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:37 -08:00
Paul Mackerras
9618edab82 powerpc: Fix event-scan code for 32-bit CHRP
On CHRP machines we are supposed to call into firmware (RTAS)
periodically, to give it a chance to check for errors and other
events.  Under ppc we had some special code in timer_interrupt
to do this, but that didn't get transferred over to arch/powerpc.
Instead, we use an array of timer_list structs, one per CPU,
and use add_timer_on to make sure each one gets called on the
appropriate CPU.

With this we can remove the heartbeat_* elements of the ppc_md
struct.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 21:48:57 +11:00
Ben Woodard
837c787877 [BLOCK] increase size of disk stat counters
The kernel's representation of the disk statistics uses the type unsigned
which is 32b on both 32b and 64b platforms.  Unfortunately, most system
tools that work with these numbers that are exported in /proc/diskstats
including iostat read these numbers into unsigned longs.  This works fine
on 32b platforms and when the number of IO transactions are small on 64b
platforms.  However, when the numbers wrap on 64b platforms & you read the
numbers into unsigned longs, and compare the numbers to previous readings,
then you get an unsigned representation of a negative number.  This looks
like a very large 64b number & gives you bizarre readouts in iostat:

ilc4: Device:    rrqm/s wrqm/s r/s    w/s  rsec/s  wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz   await  svctm  %util
ilc4: sda        5.50   0.00   143.96 0.00 307496983987862656.00 0.00 153748491993931328.00     0.00 2136028725038430.00     7.94   55.12    5.59  80.42

Though fixing iostat in user space is possible, and a quick survey
indicates that several other similar tools also use unsigned longs when
processing /proc/diskstats.  Therefore, it seems like a better approach
would be to extend the length of the disk_stats structure on 64b
architectures to 64b.  The following patch does that.  It should not affect
the operation on 32b platforms.

Signed-off-by: Ben Woodard <woodard@redhat.com>
Cc: Rick Lindsley <ricklind@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27 09:29:02 +02:00
Paul Mackerras
fbd7740fdf powerpc: Simplify pSeries idle loop
Since pSeries only wants to do something different in the idle loop when
there is no work to do, we can simplify the code by implementing
ppc_md.power_save functions instead of complete idle loops.  There are
two versions: one for shared-processor partitions and one for dedicated-
processor partitions.

With this we also do a cede_processor() call on dedicated processor
partitions if the poll_pending() call indicates that the hypervisor
has work it wants to do.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 15:06:20 +11:00
Paul Mackerras
a0652fc9a2 powerpc: Unify the 32 and 64 bit idle loops
This unifies the 32-bit (ARCH=ppc and ARCH=powerpc) and 64-bit idle
loops.  It brings over the concept of having a ppc_md.power_save
function from 32-bit to ARCH=powerpc, which lets us get rid of
native_idle().  With this we will also be able to simplify the idle
handling for pSeries and cell.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 15:03:03 +11:00
Sylvain Munaut
1f5e3b028c [PATCH] ppc32: Reorganize and complete MPC52xx initial cpu setup
ppc32: Reorganize and complete MPC52xx initial cpu setup

This patch splits up the CPU setup into a generic part and a
platform specific part. We also add a few missing init at the
same time.

Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 14:49:00 +11:00
Sylvain Munaut
5931c43500 [PATCH] ppc32: Adds support for the PCI hostbridge in MPC5200B
ppc32: Adds support for the PCI hostbridge in MPC5200B

Signed-off-by: John Rigby <jrigby@freescale.com>
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 14:48:56 +11:00
Anton Blanchard
4df20460a3 [PATCH] powerpc: Allow non zero boot cpuids
We currently have a hack to flip the boot cpu and its secondary thread
to logical cpuid 0 and 1. This means the logical - physical mapping will
differ depending on which cpu is boot cpu. This is most apparent on
kexec, where we might kexec on any cpu and therefore change the mapping
from boot to boot.

The patch below does a first pass early on to work out the logical cpuid
of the boot thread. We then fix up some paca structures to match.

Ive also removed the boot_cpuid_phys variable for ppc64, to be
consistent we use get_hard_smp_processor_id(boot_cpuid) everywhere.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 14:48:48 +11:00
Mark Nutter
6df10a82f8 [PATCH] spufs: enable SPE problem state MMIO access.
This patch is layered on top of CONFIG_SPARSEMEM
and is patterned after direct mapping of LS.

This patch allows mmap() of the following regions:
"mfc", which represents the area from [0x3000 - 0x3fff];
"cntl", which represents the area from [0x4000 - 0x4fff];
"signal1" which begins at offset 0x14000; "signal2" which
begins at offset 0x1c000.

The signal1 & signal2 files may be mmap()'d by regular user
processes.  The cntl and mfc file, on the other hand, may
only be accessed if the owning process has CAP_SYS_RAWIO,
because they have the potential to confuse the kernel
with regard to parallel access to the same files with
regular file operations: the kernel always holds a spinlock
when accessing registers in these areas to serialize them,
which can not be guaranteed with user mmaps,

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 14:48:28 +11:00
Arnd Bergmann
a33a7d7309 [PATCH] spufs: implement mfc access for PPE-side DMA
This patch adds a new file called 'mfc' to each spufs directory.
The file accepts DMA commands that are a subset of what would
be legal DMA commands for problem state register access. Upon
reading the file, a bitmask is returned with the completed
tag groups set.

The file is meant to be used from an abstraction in libspe
that is added by a different patch.

From the kernel perspective, this means a process can now
offload a memory copy from or into an SPE local store
without having to run code on the SPE itself.

The transfer will only be performed while the SPE is owned
by one thread that is waiting in the spu_run system call
and the data will be transferred into that thread's
address space, independent of which thread started the
transfer.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 14:48:26 +11:00
Arnd Bergmann
2dd14934c9 [PATCH] spufs: allow SPU code to do syscalls
An SPU does not have a way to implement system calls
itself, but it can create intercepts to the kernel.

This patch uses the method defined by the JSRE interface
for C99 host library calls from an SPU to implement
Linux system calls. It uses the reserved SPU stop code
0x2104 for this, using the structure layout and syscall
numbers for ppc64-linux.

I'm still undecided wether it is better to have a list
of allowed syscalls or a list of forbidden syscalls,
since we can't allow an SPU to call all syscalls that
are defined for ppc64-linux.

This patch implements the easier choice of them, with a
blacklist that only prevents an SPU from calling anything
that interacts with its own execution, e.g fork, execve,
clone, vfork, exit, spu_run and spu_create and everything
that deals with signals.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 14:48:24 +11:00
Arnd Bergmann
a7f31841a4 [PATCH] powerpc: declare arch syscalls in <asm/syscalls.h>
powerpc currently declares some of its own system calls
in <asm/unistd.h>, but not all of them. That place also
contains remainders of the now almost unused kernel syscall
hack.

 - Add a new <asm/syscalls.h> with clean declarations
 - Include that file from every source that implements one
   of these
 - Get rid of old declarations in <asm/unistd.h>

This patch is required as a base for implementing system
calls from an SPU, but also makes sense as a general
cleanup.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 14:48:22 +11:00
Michael Ellerman
dd4d7bfad6 [PATCH] powerpc: Change firmware_has_feature() to a macro
So that we can use firmware_has_feature() in a BUG_ON() and have the compiler
elide the code entirely if the feature can never be set, change
firmware_has_feature to a macro. Unfortunate, but necessary at least until
GCC bug #26724 is fixed.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 14:48:12 +11:00
Michael Ellerman
e3f94b85f9 [PATCH] powerpc: Make BUG_ON & WARN_ON play nice with compile-time optimisations
Change BUG_ON and WARN_ON to give the compiler a chance to perform
compile-time optimsations. Depending on the complexity of the condition,
the compiler may not do this very well, so if it's important check the
object code.

Current GCC's (4.x) produce good code as long as the condition does not
include a function call, including a static inline.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 14:48:10 +11:00
Stephen Rothwell
7c92943c7b [PATCH] powerpc: work around sparse warnings in cputable.h
Christoph noticed that sparse warned about all the enum tags in cuptable.h
that had values that required them to be type log. (enum tags are ints
according to the standard.)

This patch attempts to fix them in the least intrusive way possible by
turning them all into #defines except for the 32 bit CPU_FTRS_POSSIBLE and
CPU_FTRS_ALWAYS which are hard to construct that way.  This works because
these last two contain no bits above 2^31.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-03-27 14:48:06 +11:00
Norbert Kiesel
3eb4801d7b [NET]: drop duplicate assignment in request_sock
Just noticed that request_sock.[ch] contain a useless assignment of
rskq_accept_head to itself.  I assume this is a typo and the 2nd one
was supposed to be _tail.  However, setting _tail to NULL is not
needed, so the patch below just drops the 2nd assignment.

Signed-off-By: Norbert Kiesel <nkiesel@tbdnetworks.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-26 17:39:55 -08:00
Russell King
fbb18a277a [SERIAL] amba-pl010: allow platforms to specify modem control method
The amba-pl010 hardware does not provide RTS and DTR control lines; it
is expected that these will be implemented using GPIO.  Allow platforms
to supply a function to implement manipulation of modem control lines.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-26 23:13:39 +01:00
Lennert Buytenhek
c9b4470e2d [ARM] 3414/1: ep93xx: reset ethernet controller before uncompressing
Patch from Lennert Buytenhek

The Redboot version that cirrus supplies for the cirrus ep93xx doesn't
turn off DMA from the ethernet MAC before jumping to linux, which means
that we might end up with bits of RX status and packet data scribbled
over the uncompressed kernel image.

Work around this by resetting the ethernet MAC before we uncompress.

We don't usually work around bootloader bugs, but considering that the
large majority of ep93xx boards out there have this problem, I figured
this it was justified in this case.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-26 21:40:27 +01:00
Ralf Baechle
335bd9dff3 [SERIAL] Remove obsoleted au1x00_uart driver
As announced in feature-removal-schedule.txt.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-26 21:25:57 +01:00
Akinobu Mita
e9bebd6f3a [PATCH] bitops: remove unused generic bitops in include/linux/bitops.h
generic_{ffs,fls,fls64,hweight{64,32,16,8}}() were moved into
include/asm-generic/bitops.h.  So all architectures don't use them.

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:15 -08:00
Akinobu Mita
6ecf66ae49 [PATCH] bitops: sh: make thread_info.flags an unsigned long
The test_bit() routines are defined to work on a pointer to unsigned long.
But thread_info.flags is __u32 (unsigned int) on sh and it is passed to flag
set/clear/test wrappers in include/linux/thread_info.h.  So the compiler will
print warnings.

This patch changes to unsigned long instead.

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:15 -08:00
Akinobu Mita
f51a05c16d [PATCH] bitops: update include/asm-generic/bitops.h
Currently include/asm-generic/bitops.h is not referenced from anywhere.  But
it will be the benefit of those who are trying to port Linux to another
architecture.

So update it by same manner

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:15 -08:00
Akinobu Mita
d4337aa528 [PATCH] bitops: xtensa: use generic bitops
- remove {,test_and_}{set,clear,change}_bit()
- remove __{,test_and_}{set,clear,change}_bit() and test_bit()
- remove generic_fls64()
- remove find_{next,first}{,_zero}_bit()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove generic_hweight{32,16,8}()
- remove sched_find_first_bit()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:15 -08:00
Akinobu Mita
f33e2fbacc [PATCH] bitops: x86_64: use generic bitops
- remove sched_find_first_bit()
- remove generic_hweight{64,32,16,8}()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:14 -08:00
Akinobu Mita
a58259cddf [PATCH] bitops: v850: use generic bitops
- remove ffz()
- remove find_{next,first}{,_zero}_bit()
- remove generic_ffs()
- remove generic_fls()
- remove generic_fls64()
- remove __ffs()
- remove sched_find_first_bit()
- remove generic_hweight{32,16,8}()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:14 -08:00
Akinobu Mita
2d78d4beb6 [PATCH] bitops: sparc64: use generic bitops
- remove __{,test_and_}{set,clear,change}_bit() and test_bit()
- remove ffz()
- remove __ffs()
- remove generic_fls()
- remove generic_fls64()
- remove sched_find_first_bit()
- remove ffs()

- unless defined(ULTRA_HAS_POPULATION_COUNT)

  - remove generic_hweight{64,32,16,8}()

- remove find_{next,first}{,_zero}_bit()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:14 -08:00
Akinobu Mita
d59288b757 [PATCH] bitops: sparc: use generic bitops
- remove __{,test_and_}{set,clear,change}_bit() and test_bit()
- remove ffz()
- remove __ffs()
- remove sched_find_first_bit()
- remove ffs()
- remove generic_fls()
- remove generic_fls64()
- remove generic_hweight{32,16,8}()
- remove find_{next,first}{,_zero}_bit()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove ext2_{set,clear}_bit_atomic()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:14 -08:00
Akinobu Mita
62f1b2465b [PATCH] bitops: sh64: use generic bitops
- remove __{,test_and_}{set,clear,change}_bit() and test_bit()
- remove __ffs()
- remove find_{next,first}{,_zero}_bit()
- remove generic_hweight{32,16,8}()
- remove sched_find_first_bit()
- remove generic_ffs()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove ext2_{set,clear}_bit_atomic()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
- remove generic_fls()
- remove generic_fls64()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Richard Curnow <rc@rc0.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:14 -08:00
Akinobu Mita
e2268c7129 [PATCH] bitops: sh: use generic bitops
- remove __{,test_and_}{set,clear,change}_bit() and test_bit()
- remove find_{next,first}{,_zero}_bit()
- remove generic_ffs()
- remove generic_hweight{32,16,8}()
- remove sched_find_first_bit()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove ext2_{set,clear}_bit_atomic()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
- remove generic_fls()
- remove generic_fls64()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:14 -08:00
Akinobu Mita
7e33db4e2e [PATCH] bitops: s390: use generic bitops
- remove generic_ffs()
- remove generic_fls()
- remove generic_fls64()
- remove generic_hweight{64,32,16,8}()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:14 -08:00
Akinobu Mita
e779b2f95f [PATCH] bitops: powerpc: use generic bitops
- remove __{,test_and_}{set,clear,change}_bit() and test_bit()
- remove generic_fls64()
- remove generic_hweight{64,32,16,8}()
- remove sched_find_first_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:14 -08:00
Akinobu Mita
59e18a2e1c [PATCH] bitops: parisc: use generic bitops
- remove __{,test_and_}{set,clear,change}_bit() and test_bit()
- remove ffz()
- remove generic_fls64()
- remove generic_hweight{32,16,8}()
- remove generic_hweight64()
- remove sched_find_first_bit()
- remove find_{next,first}{,_zero}_bit()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:14 -08:00
Akinobu Mita
3c9ee7ef87 [PATCH] bitops: mips: use generic bitops
- remove __{,test_and_}{set,clear,change}_bit() and test_bit()

- unless defined(CONFIG_CPU_MIPS32) or defined(CONFIG_CPU_MIPS64)

  - remove __ffs()
  - remove ffs()
  - remove ffz()
  - remove fls()

- remove fls64()
- remove find_{next,first}{,_zero}_bit()
- remove sched_find_first_bit()
- remove generic_hweight64()
- remove generic_hweight{32,16,8}()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove ext2_{set,clear}_bit_atomic()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:13 -08:00
Akinobu Mita
d2d7cdcf6e [PATCH] bitops: m68knommu: use generic bitops
- remove ffs()
- remove __ffs()
- remove sched_find_first_bit()
- remove ffz()
- remove find_{next,first}{,_zero}_bit()
- remove generic_hweight()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
- remove generic_fls()
- remove generic_fls64()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:13 -08:00
Akinobu Mita
d5728b45da [PATCH] m68k: fix undefined reference to generic_find_next_zero_le_bit
This patch reverts ext2 bitmap functions.

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:13 -08:00
Akinobu Mita
ba1a5b32ba [PATCH] bitops: m68k: use generic bitops
- remove generic_fls64()
- remove sched_find_first_bit()
- remove generic_hweight()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:12 -08:00
Akinobu Mita
6d9f937b55 [PATCH] bitops: m32r: use generic bitops
- remove __{,test_and_}{set,clear,change}_bit() and test_bit()
- remove ffz()
- remove find_{next,first}{,_zero}_bit()
- remove __ffs()
- remove generic_fls()
- remove generic_fls64()
- remove sched_find_first_bit()
- remove generic_ffs()
- remove generic_hweight{32,16,8}()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove ext2_{set,clear}_bit_atomic()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:12 -08:00
Akinobu Mita
2875aef8bd [PATCH] bitops: ia64: use generic bitops
- remove generic_fls64()
- remove find_{next,first}{,_zero}_bit()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
- remove sched_find_first_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:12 -08:00
Akinobu Mita
1cc2b9943b [PATCH] bitops: i386: use generic bitops
- remove generic_fls64()
- remove sched_find_first_bit()
- remove generic_hweight{32,16,8}()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:12 -08:00
Akinobu Mita
f6e0213f7c [PATCH] bitops: h8300: use generic bitops
- remove generic_ffs()
- remove find_{next,first}{,_zero}_bit()
- remove sched_find_first_bit()
- remove generic_hweight{32,16,8}()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove ext2_{set,clear}_bit_atomic()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
- remove generic_fls()
- remove generic_fls64()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:12 -08:00
Akinobu Mita
1f6d7a93c1 [PATCH] bitops: frv: use generic bitops
- remove ffz()
- remove find_{next,first}{,_zero}_bit()
- remove generic_ffs()
- remove __ffs()
- remove generic_fls64()
- remove sched_find_first_bit()
- remove generic_hweight{32,16,8}()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:12 -08:00
Akinobu Mita
e9f26df172 [PATCH] bitops: cris: use generic bitops
- remove __{,test_and_}{set,clear,change}_bit() and test_bit()
- remove generic_fls()
- remove generic_fls64()
- remove generic_hweight{32,16,8}()
- remove find_{next,first}{,_zero}_bit()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
- remove sched_find_first_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Acked-by: Mikael Starvik <starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:12 -08:00
Akinobu Mita
d142d86019 [PATCH] bitops: arm26: use generic bitops
- remove __{,test_and_}{set,clear,change}_bit() and test_bit()
- remove ffz()
- remove __ffs()
- remove generic_fls()
- remove generic_fls64()
- remove generic_ffs()
- remove sched_find_first_bit()
- remove generic_hweight{32,16,8}()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Ian Molton <spyro@f2s.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:12 -08:00
Akinobu Mita
b89c3b165f [PATCH] bitops: arm: use generic bitops
- remove __{,test_and_}{set,clear,change}_bit() and test_bit()

- if __LINUX_ARM_ARCH__ < 5

  - remove ffz()
  - remove __ffs()
  - remove generic_fls()
  - remove generic_ffs()

- remove generic_fls64()
- remove sched_find_first_bit()
- remove generic_hweight{32,16,8}()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:12 -08:00
Akinobu Mita
f7c2967873 [PATCH] bitops: alpha: use generic bitops
- unless defined(__alpha_cix__) and defined(__alpha_fix__)

  - remove generic_fls()
  - remove generic_hweight{64,32,16,8}()

- remove generic_fls64()
- remove find_{next,first}{,_zero}_bit()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
- remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:11 -08:00
Akinobu Mita
b1bb9522da [PATCH] bitops: generic minix_{test,set,test_and_clear,test,find_first_zero}_bit()
This patch introduces the C-language equivalents of the functions below:

int minix_test_and_set_bit(int nr, volatile unsigned long *addr);
int minix_set_bit(int nr, volatile unsigned long *addr);
int minix_test_and_clear_bit(int nr, volatile unsigned long *addr);
int minix_test_bit(int nr, const volatile unsigned long *addr);
unsigned long minix_find_first_zero_bit(const unsigned long *addr,
                                        unsigned long size);

In include/asm-generic/bitops/minix.h
   and include/asm-generic/bitops/minix-le.h

This code largely copied from: include/asm-sparc/bitops.h

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:11 -08:00
Akinobu Mita
765f34fe32 [PATCH] bitops: generic ext2_{set,clear}_bit_atomic()
This patch introduces the C-language equivalents of the functions below:

int ext2_set_bit_atomic(int nr, volatile unsigned long *addr);
int ext2_clear_bit_atomic(int nr, volatile unsigned long *addr);

In include/asm-generic/bitops/ext2-atomic.h

This code largely copied from: include/asm-sparc/bitops.h

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:11 -08:00
Akinobu Mita
930ae745f5 [PATCH] bitops: generic ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
This patch introduces the C-language equivalents of the functions below:

int ext2_set_bit(int nr, volatile unsigned long *addr);
int ext2_clear_bit(int nr, volatile unsigned long *addr);
int ext2_test_bit(int nr, const volatile unsigned long *addr);
unsigned long ext2_find_first_zero_bit(const unsigned long *addr,
                                       unsigned long size);
unsinged long ext2_find_next_zero_bit(const unsigned long *addr,
                                      unsigned long size);

In include/asm-generic/bitops/ext2-non-atomic.h

This code largely copied from:

include/asm-powerpc/bitops.h
include/asm-parisc/bitops.h

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:11 -08:00
Akinobu Mita
a54baa1487 [PATCH] fix error: __u32 undeclared
Build fix for s390 declare __u32 and __u64.

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:11 -08:00
Akinobu Mita
3b9ed1a5d2 [PATCH] bitops: generic hweight{64,32,16,8}()
This patch introduces the C-language equivalents of the functions below:

unsigned int hweight32(unsigned int w);
unsigned int hweight16(unsigned int w);
unsigned int hweight8(unsigned int w);
unsigned long hweight64(__u64 w);

In include/asm-generic/bitops/hweight.h

This code largely copied from: include/linux/bitops.h

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:11 -08:00
Akinobu Mita
09020adb61 [PATCH] bitops: generic ffs()
This patch introduces the C-language equivalent of the function: int ffs(int
x);

In include/asm-generic/bitops/ffs.h

This code largely copied from: include/linux/bitops.h

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:11 -08:00
Akinobu Mita
6d29ea23da [PATCH] bitops: generic sched_find_first_bit()
This patch introduces the C-language equivalent of the function: int
sched_find_first_bit(const unsigned long *b);

In include/asm-generic/bitops/sched.h

This code largely copied from: include/asm-powerpc/bitops.h

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:11 -08:00
Akinobu Mita
c7f612cdf0 [PATCH] bitops: generic find_{next,first}{,_zero}_bit()
This patch introduces the C-language equivalents of the functions below:

unsigned logn find_next_bit(const unsigned long *addr, unsigned long size,
                            unsigned long offset);
unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
                                 unsigned long offset);
unsigned long find_first_zero_bit(const unsigned long *addr,
                                  unsigned long size);
unsigned long find_first_bit(const unsigned long *addr, unsigned long size);

In include/asm-generic/bitops/find.h

This code largely copied from: arch/powerpc/lib/bitops.c

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:11 -08:00
Akinobu Mita
2dfc383ad5 [PATCH] bitops: generic fls64()
This patch introduces the C-language equivalent of the function: int
fls64(__u64 x);

In include/asm-generic/bitops/fls64.h

This code largely copied from: include/linux/bitops.h

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:10 -08:00
Akinobu Mita
136abb32d6 [PATCH] bitops: generic fls()
This patch introduces the C-language equivalent of the function: int fls(int
x);

In include/asm-generic/bitops/fls.h

This code largely copied from: include/linux/bitops.h

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:10 -08:00
Akinobu Mita
176d8b0c27 [PATCH] bitops: generic ffz()
This patch introduces the C-language equivalent of the function: unsigned long
ffz(unsigned long word);

In include/asm-generic/bitops/ffz.h

This code largely copied from: include/asm-parisc/bitops.h

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:10 -08:00
Akinobu Mita
c1226a005e [PATCH] bitops: generic __ffs()
This patch introduces the C-language equivalent of the function: unsigned long
__ffs(unsigned long word);

In include/asm-generic/bitops/__ffs.h

This code largely copied from: include/asm-sparc/bitops.h

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:10 -08:00
Akinobu Mita
4117b02132 [PATCH] bitops: generic __{,test_and_}{set,clear,change}_bit() and test_bit()
This patch introduces the C-language equivalents of the functions below:

void __set_bit(int nr, volatile unsigned long *addr);
void __clear_bit(int nr, volatile unsigned long *addr);
void __change_bit(int nr, volatile unsigned long *addr);
int __test_and_set_bit(int nr, volatile unsigned long *addr);
int __test_and_clear_bit(int nr, volatile unsigned long *addr);
int __test_and_change_bit(int nr, volatile unsigned long *addr);
int test_bit(int nr, const volatile unsigned long *addr);

In include/asm-generic/bitops/non-atomic.h

This code largely copied from: asm-powerpc/bitops.h

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:10 -08:00
Akinobu Mita
7a8a242995 [PATCH] bitops: generic {,test_and_}{set,clear,change}_bit()
This patch introduces the C-language equivalents of the functions below:

void set_bit(int nr, volatile unsigned long *addr);
void clear_bit(int nr, volatile unsigned long *addr);
void change_bit(int nr, volatile unsigned long *addr);
int test_and_set_bit(int nr, volatile unsigned long *addr);
int test_and_clear_bit(int nr, volatile unsigned long *addr);
int test_and_change_bit(int nr, volatile unsigned long *addr);

In include/asm-generic/bitops/atomic.h

This code largely copied from:

include/asm-powerpc/bitops.h
include/asm-parisc/bitops.h
include/asm-parisc/atomic.h

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:10 -08:00
Akinobu Mita
67b0ad574b [PATCH] bitops: use non atomic operations for minix_*_bit() and ext2_*_bit()
Bitmap functions for the minix filesystem and the ext2 filesystem except
ext2_set_bit_atomic() and ext2_clear_bit_atomic() do not require the atomic
guarantees.

But these are defined by using atomic bit operations on several architectures.
 (cris, frv, h8300, ia64, m32r, m68k, m68knommu, mips, s390, sh, sh64, sparc,
sparc64, v850, and xtensa)

This patch switches to non atomic bit operation.

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:10 -08:00
Akinobu Mita
72b61a3cfd [PATCH] bitops: cris: remove unnecessary local_irq_restore()
remove unnecessary local_irq_restore() after cris_atomic_restore() in
test_and_set_bit().

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Mikael Starvik <starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:10 -08:00
Akinobu Mita
3156054b02 [PATCH] bitops: parisc: add ()-pair in __ffz() macro
Noticed by Michael Tokarev

add missing ()-pair in __ffz() macro for parisc

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:10 -08:00
Akinobu Mita
4b417d0c7c [PATCH] bitops: alpha: use config options instead of __alpha_fix__ and __alpha_cix__
Use config options instead of gcc builtin definition to tell the use of
instruction set extensions (CIX and FIX).

This is introduced to tell the kbuild system the use of opmized hweight*()
routines on alpha architecture.

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:09 -08:00
Akinobu Mita
9363513366 [PATCH] arm: fix undefined reference to generic_fls
This patch defines constant_fls() instead of removed generic_fls().

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:09 -08:00
Akinobu Mita
0b28002fdf [PATCH] more s/fucn/func/ typo fixes
s/fucntion/function/ typo fixes

Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:09 -08:00
Hansjoerg Lipp
ee8a4b7f85 [PATCH] isdn4linux: Siemens Gigaset drivers - tty interface
And: Tilman Schmidt <tilman@imap.cc>

This patch adds the tty interface to the gigaset module.  The tty interface
provides direct access to the AT command set of the Gigaset devices.

Signed-off-by: Hansjoerg Lipp <hjlipp@web.de>
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:05 -08:00
Masami Hiramatsu
311ac88fd2 [PATCH] x86: kprobes-booster
Current kprobe copies the original instruction at the probe point and replaces
it with a breakpoint instruction (int3).  When the kernel hits the probe
point, kprobe handler is invoked.  And the copied instruction is single-step
executed on the copied buffer (not on the original address) by kprobe.  After
that, the kprobe checks registers and modify it (if need) as if the
instructions was executed on the original address.

My proposal is based on the fact there are many instructions which do NOT
require the register modification after the single-step execution.  When the
copied instruction is a kind of them, kprobe just jumps back to the next
instruction after single-step execution.  If so, why don't we execute those
instructions directly?

With kprobe-booster patch, kprobes will execute a copied instruction directly
and (if need) jump back to original code.  This direct execution is executed
when the kprobe don't have both post_handler and break_handler, and the copied
instruction can be executed directly.

I sorted instructions which can be executed directly or not;

- Call instructions are NG(can not be executed directly).
  We should correct the return address pushed into top of stack.
- Indirect instructions except for absolute indirect-jumps
  are NG. Those instructions changes EIP randomly. We should
  check EIP and correct it.
- Instructions that change EIP beyond the range of the
  instruction buffer are NG.
- Instructions that change EIP to tail 5 bytes of the
  instruction buffer (it is the size of a jump instruction).
  We must write a jump instruction which backs to original
  kernel code in the instruction buffer.
- Break point instruction is NG. We should not touch EIP and
  pass to other handlers.
- Absolute direct/indirect jumps are OK.- Conditional Jumps are NG.
- Halt and software-interruptions are NG. Because it will stay on
  the instruction buffer of kprobes.
- Prefixes are NG.
- Unknown/reserved opcode is NG.
- Other 1 byte instructions are OK. But those instructions need a
  jump back code.
- 2 bytes instructions are mapped sparsely. So, in this release,
  this patch don't boost those instructions.

>From Intel's IA-32 opcode map described in IA-32 Intel Architecture Software
Developer's Manual Vol.2 B, I determined that following opcodes are not
boostable.

- 0FH (2byte escape)
- 70H - 7FH (Jump on condition)
- 9AH (Call) and 9CH (Pushf)
- C0H-C1H (Grp 2: includes reserved opcode)
- C6H-C7H (Grp11: includes reserved opcode)
- CCH-CEH (Software-interrupt)
- D0H-D3H (Grp2: includes reserved opcode)
- D6H (Reserved)
- D8H-DFH (Coprocessor)
- E0H-E3H (loop/conditional jump)
- E8H (Call)
- F0H-F3H (Prefixes and reserved)
- F4H (Halt)
- F6H-F7H (Grp3: includes reserved opcode)
- FEH-FFH(Grp4,5: includes reserved opcode)

Kprobe-booster checks whether target instruction can be boosted (can be
executed directly) at arch_copy_kprobe() function.  If the target instruction
can be boosted, it clears "boostable" flag.  If not, it sets "boostable" flag
-1.  This is disabled status.  In resume_execution() function, If "boostable"
flag is cleared, kprobe-booster measures the size of the target instruction
and sets "boostable" flag 1.

In kprobe_handler(), kprobe checks the "boostable" flag.  If the flag is 1, it
resets current kprobe and executes instruction buffer directly instead of
single stepping.

When unregistering a boosted kprobe, it calls synchronize_sched()
after "int3" is removed. So we can ensure followings after
the synchronize_sched() called.
- interrupt handlers are finished on all CPUs.
- instruction buffer is not executed on all CPUs.
And we can release the boosted kprobe safely.

And also, on preemptible kernel, the booster is not enabled where the kernel
preemption is enabled.  So, there are no preempted threads on the instruction
buffer.

The description of kretprobe-booster:
====================================

In the normal operation, kretprobe make a target function return to trampoline
code.  And a kprobe (called trampoline_probe) have been inserted at the
trampoline code.  When the kernel hits this kprobe, it calls kretprobe's
handler and it returns to original return address.

Kretprobe-booster patch removes the trampoline_probe.  It allows the
trampoline code to call kretprobe's handler directly instead of invoking
kprobe.  And tranpoline code returns to original return address.

This new trampoline code stores and restores registers, so the kretprobe
handler is still able to access those registers.

Current kprobe has about 1.3 usec/probe(*) overhead, and kprobe-booster patch
reduces it to 0.6 usec/probe(*).  Also current kretprobe has about 2.0
usec/probe(*) overhead.  Kprobe-booster patch reduces it to 1.3 usec/probe(*),
and the combination of both kprobe-booster patch and kretprobe-booster patch
reduces it to 0.9 usec/probe(*).

I expect the combination of both patches can reduce half of a probing
overhead.

Performance numbers strongly depend on the processor model.

Andrew Morton wrote:
> These preempt tricks look rather nasty.  Can you please describe what the
> problem is, precisely?  And how this code avoids it?  Perhaps we can find
> something cleaner.

The problem is how to remove the copied instructions of the
kprobe *safely* on the preemptable kernel (CONFIG_PREEMPT=y).

Kprobes basically executes the following actions;

(1)int3
(2)preempt_disable()
(3)kprobe_prehandler()
(4)copied instructioin(single step)
(5)kprobe_posthandler()
(6)preempt_enable()
(7)return to the original code

During the execution of copied instruction, preemption is
disabled (from step (2) to (6)).
When unregistering the probes, Kprobe waits for RCU
quiescent state by using synchronize_sched() after removing
int3 instruction.
Thus we can ensure the copied instruction is not executed.

On the other hand, kprobe-booster executes the following actions;

(1)int3
(2)preempt_disable()
(3)kprobe_prehandler()
(4)preempt_enable()             <-- this one is added by my patch
(5)copied instruction(direct execution)
(6)jmp back to the original code

The problem is that we have no way to prevent preemption on
step (5) or (6). We cannot call preempt_disable() after step (6),
because there are no rooms to do that. Thus, some other
processes may be preempted at step(5) or (6) on preemptable kernel.
And I couldn't find the easy way to ensure that other processes'
stack do *not* have the address of them. (I thought some way
to do that, but those are very costly.)

So currently, I simply boost the kprobe only when the probe
point is already preemption disabled.

> Also, the patch adds a preempt_enable() but I don't see a corresponding
> preempt_disable().  Am I missing something?

It is corresponding to the preempt_disable() in the top of
kprobe_handler().
I copied the code of kprobe_handler() here:

static int __kprobes kprobe_handler(struct pt_regs *regs)
{
        struct kprobe *p;
        int ret = 0;
        kprobe_opcode_t *addr = NULL;
        unsigned long *lp;
        struct kprobe_ctlblk *kcb;

        /*
         * We don't want to be preempted for the entire
         * duration of kprobe processing
         */
        preempt_disable();             <-- HERE
        kcb = get_kprobe_ctlblk();

Signed-off-by: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:04 -08:00
Roman Zippel
05cfb614dd [PATCH] hrtimers: remove data field
The nanosleep cleanup allows to remove the data field of hrtimer.  The
callback function can use container_of() to get it's own data.  Since the
hrtimer structure is anyway embedded in other structures, this adds no
overhead.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:03 -08:00
Roman Zippel
df869b630d [PATCH] hrtimers: remove nsec_t typedef
nsec_t predates ktime_t and has mostly been superseded by it.  In the few
places that are left it's better to make it explicit that we're dealing with
64 bit values here.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:03 -08:00
Roman Zippel
272705c597 [PATCH] hrtimers: remove DEFINE_KTIME and ktime_to_clock_t()
Now that it_real_value is gone, the last user of DEFINE_KTIME and
ktime_to_clock_t are also gone, so remove it before someone starts using it
again.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:03 -08:00
Roman Zippel
b75f7a51ca [PATCH] hrtimers: remove state field
Remove the state field and encode this information in the rb_node similiar to
normal timer.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:02 -08:00
Roman Zippel
432569bb9d [PATCH] hrtimers: simplify nanosleep
nanosleep is the only user of the expired state, so let it manage this itself,
which makes the hrtimer code a bit simpler.  The remaining time is also only
calculated if requested.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:02 -08:00
Roman Zippel
44f2147551 [PATCH] hrtimers: pass current time to hrtimer_forward()
Pass current time to hrtimer_forward().  This allows to use the softirq time
in the timer base when the forward function is called from the timer callback.
 Other places pass current time with a call to timer->base->get_time().

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:02 -08:00
Thomas Gleixner
92127c7a45 [PATCH] hrtimers: optimize softirq runqueues
The hrtimer softirq is called from the timer softirq every tick.  Retrieve the
current time from xtime and wall_to_monotonic instead of calling
base->get_time() for each timer base.  Store the time in the base structure
and provide a hook once clock source abstractions are in place and to keep the
code open for new base clocks.

Based on a patch from: Roman Zippel <zippel@linux-m68k.org>

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:02 -08:00
Badari Pulavarty
1d8fa7a2b9 [PATCH] remove ->get_blocks() support
Now that get_block() can handle mapping multiple disk blocks, no need to have
->get_blocks().  This patch removes fs specific ->get_blocks() added for DIO
and makes it users use get_block() instead.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:01 -08:00
Badari Pulavarty
b0cf2321c6 [PATCH] pass b_size to ->get_block()
Pass amount of disk needs to be mapped to get_block().  This way one can
modify the fs ->get_block() functions to map multiple blocks at the same time.

[akpm@osdl.org: performance tweak]
[akpm@osdl.org: remove unneeded assignments]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:01 -08:00
Badari Pulavarty
205f87f6b3 [PATCH] change buffer_head.b_size to size_t
Increase the size of the buffer_head b_size field (only) for 64 bit platforms.
Update some old and moldy comments in and around the structure as well.

The b_size increase allows us to perform larger mappings and allocations for
large I/O requests from userspace, which tie in with other changes allowing
the get_block_t() interface to map multiple blocks at once.

Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:01 -08:00
Mingming Cao
b54e41ec17 [PATCH] ext3_get_blocks: support multiple blocks allocation in ext3_new_block()
Change ext3_try_to_allocate() (called via ext3_new_blocks()) to try to
allocate the requested number of blocks on a best effort basis: After
allocated the first block, it will always attempt to allocate the next few(up
to the requested size and not beyond the reservation window) adjacent blocks
at the same time.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:01 -08:00
Mingming Cao
89747d369d [PATCH] ext3_get_blocks: Mapping multiple blocks at a once
Currently ext3_get_block() only maps or allocates one block at a time.  This
is quite inefficient for sequential IO workload.

I have posted a early implements a simply multiple block map and allocation
with current ext3.  The basic idea is allocating the 1st block in the existing
way, and attempting to allocate the next adjacent blocks on a best effort
basis.  More description about the implementation could be found here:
http://marc.theaimsgroup.com/?l=ext2-devel&m=112162230003522&w=2

The following the latest version of the patch: break the original patch into 5
patches, re-worked some logicals, and fixed some bugs.  The break ups are:

 [patch 1] Adding map multiple blocks at a time in ext3_get_blocks()
 [patch 2] Extend ext3_get_blocks() to support multiple block allocation
 [patch 3] Implement multiple block allocation in ext3-try-to-allocate
 (called via ext3_new_block()).
 [patch 4] Proper accounting updates in ext3_new_blocks()
 [patch 5] Adjust reservation window size properly (by the given number
 of blocks to allocate) before block allocation to increase the
 possibility of allocating multiple blocks in a single call.

Tests done so far includes fsx,tiobench and dbench.  The following numbers
collected from Direct IO tests (1G file creation/read) shows the system time
have been greatly reduced (more than 50% on my 8 cpu system) with the patches.

 1G file DIO write:
 	2.6.15		2.6.15+patches
 real    0m31.275s	0m31.161s
 user    0m0.000s	0m0.000s
 sys     0m3.384s	0m0.564s

 1G file DIO read:
 	2.6.15		2.6.15+patches
 real    0m30.733s	0m30.624s
 user    0m0.000s	0m0.004s
 sys     0m0.748s	0m0.380s

Some previous test we did on buffered IO with using multiple blocks allocation
and delayed allocation shows noticeable improvement on throughput and system
time.

This patch:

Add support of mapping multiple blocks in one call.

This is useful for DIO reads and re-writes (where blocks are already
allocated), also is in line with Christoph's proposal of using getblocks() in
mpage_readpage() or mpage_readpages().

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Takashi Sato
e2d53f9525 [PATCH] 2TB files: change type of kstatfs entries
This fix was proposed by Trond Myklebust.  He says: The type "sector_t" is
heavily tied in to the block layer interface as an offset/handle to a block,
and is subject to a supposedly block-specific configuration option:
CONFIG_LBD.  Despite this, it is used in struct kstatfs to save a couple of
bytes on the stack whenever we call the filesystems' ->statfs().

So kstatfs's entries related to blocks are invalid on statfs64 for a network
filesystem which has more than 2^32-1 blocks when CONFIG_LBD is disabled.

- struct kstatfs
  Change the type of following entries from sector_t to u64.
  f_blocks
  f_bfree
  f_bavail
  f_files
  f_ffree

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Takashi Sato
a0f62ac636 [PATCH] 2TB files: add blkcnt_t
Add blkcnt_t as the type of inode.i_blocks.  This enables you to make the size
of blkcnt_t either 4 bytes or 8 bytes on 32 bits architecture with CONFIG_LSF.

- CONFIG_LSF
  Add new configuration parameter.
- blkcnt_t
  On h8300, i386, mips, powerpc, s390 and sh that define sector_t,
  blkcnt_t is defined as u64 if CONFIG_LSF is enabled; otherwise it is
  defined as unsigned long.
  On other architectures, it is defined as unsigned long.
- inode.i_blocks
  Change the type from sector_t to blkcnt_t.

Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Takashi Sato
abcb6c9fd1 [PATCH] 2TB files: st_blocks is invalid when calling stat64
This patch series fixes the following problems on 32 bits architecture.

o stat64 returns the lower 32 bits of blocks, although userland st_blocks
  has 64 bits, because i_blocks has only 32 bits.  The ioctl with FIOQSIZE has
  the same problem.

o As Dave Kleikamp said, making >2TB file on JFS results in writing an
  invalid block number to disk inode.  The cause is the same as above too.

o In generic quota code dquot_transfer(), the file usage is calculated from
  i_blocks via inode_get_bytes().  If the file is over 2TB, the change of
  usage is less than expected.  The cause is the same as above too.

o As Trond Myklebust said, statfs64's entries related to blocks are invalid
  on statfs64 for a network filesystem which has more than 2^32-1 blocks with
  CONFIG_LBD disabled.  [PATCH 3/3]

We made patches to fix problems that occur when handling a large filesystem
and a large file.  It was discussed on the mails titled "stat64 for over 2TB
file returned invalid st_blocks".

Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Matthew Dobson
93d2341c75 [PATCH] mempool: use mempool_create_slab_pool()
Modify well over a dozen mempool users to call mempool_create_slab_pool()
rather than calling mempool_create() with extra arguments, saving about 30
lines of code and increasing readability.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Matthew Dobson
fec433aaaa [PATCH] mempool: add mempool_create_slab_pool()
Create a simple wrapper function for the common case of creating a slab-based
mempool.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Matthew Dobson
f183323d38 [PATCH] mempool: add kzalloc allocator
Add another allocator to the common mempool code: a kzalloc/kfree allocator

This will be used by the next patch in the series to replace a mempool-backed
kzalloc allocator.  It is also very likely that there will be more users in
the future.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:59 -08:00
Matthew Dobson
53184082b0 [PATCH] mempool: add kmalloc allocator
Add another allocator to the common mempool code: a kmalloc/kfree allocator

This will be used by the next patch in the series to replace duplicate
mempool-backed kmalloc allocators in several places in the kernel.  It is also
very likely that there will be more users in the future.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:59 -08:00
Matthew Dobson
6e0678f394 [PATCH] mempool: add page allocator
This will be used by the next patch in the series to replace duplicate
mempool-backed page allocators in 2 places in the kernel.  It is also likely
that there will be more users in the future.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:59 -08:00
Maneesh Soni
22e6c1b39c [PATCH] Use loff_t for size in struct proc_dir_entry
Change proc_dir_entry->size to be loff_t to represent files like
/proc/vmcore for 32bit systems with more than 4G memory.

Needed for seeing correct size for /proc/vmcore for 32-bit systems with >
4G RAM.

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:57 -08:00
Stephen Rothwell
3158e9411a [PATCH] consolidate sys32/compat_adjtimex
Create compat_sys_adjtimex and use it an all appropriate places.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:57 -08:00
Stephen Rothwell
88959ea968 [PATCH] create struct compat_timex and use it everywhere
We had a copy of the compatibility version of struct timex in each 64 bit
architecture.  This patch just creates a global one and replaces all the
usages of the old ones.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:57 -08:00
Andy Adamson
5842add2f3 [PATCH] VFS,fs/locks.c,NFSD4: add race_free posix_lock_file_conf() interface
Lockd and the NFSv4 server both exercise a race condition where
posix_test_lock() is called either before or after posix_lock_file() to
deal with a denied lock request due to a conflicting lock.

Remove the race condition for the NFSv4 server by adding a new conflicting
lock parameter to __posix_lock_file() , changing the name to
__posix_lock_file_conf().

Keep posix_lock_file() interface, add posix_lock_conf() interface, both
call __posix_lock_file_conf().

[akpm@osdl.org: Put the EXPORT_SYMBOL() where it belongs]
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:56 -08:00
Randy Dunlap
878a9f30d7 [PATCH] hpet header sanitization
Add __KERNEL__ block.
Use __KERNEL__ to allow ioctl interface to be usable.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:56 -08:00
Corey Minyard
50c812b2b9 [PATCH] ipmi: add full sysfs support
Add full driver model support for the IPMI driver.  It links in the proper
bus and device support.

It adds an "ipmi" driver interface that has each BMC discovered by the
driver (as a device).  These BMCs appear in the devices/platform directory.
 If there are multiple interfaces to the same BMC, the driver should
discover this and will only have one BMC entry.  The BMC entry will have
pointers to each interface device that connects to it.

The device information (statistics and config information) has not yet been
ported over to the driver model from proc, that will come later.

This work was based on work by Yani Ioannou.  I basically rewrote it using
that code as a guide, but he still deserves credit :).

[bunk@stusta.de: make ipmi_find_bmc_guid() static]
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:56 -08:00
Con Kolivas
3c30b06df4 [PATCH] cleanup smp_call_function UP build
net/core/flow.c: In function 'flow_cache_flush':
net/core/flow.c:299: warning: statement with no effect

Signed-off-by: Con Kolivas <kernel@kolivas.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:55 -08:00
NeilBrown
2ff28e22bd [PATCH] Make address_space_operations->invalidatepage return void
The return value of this function is never used, so let's be honest and
declare it as void.

Some places where invalidatepage returned 0, I have inserted comments
suggesting a BUG_ON.

[akpm@osdl.org: JBD BUG fix]
[akpm@osdl.org: rework for git-nfs]
[akpm@osdl.org: don't go BUG in block_invalidate_page()]
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:55 -08:00
NeilBrown
3978d7179d [PATCH] Make address_space_operations->sync_page return void
The only user ignores the return value, and the only instanace
(block_sync_page) always returns 0...

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:55 -08:00
Bjorn Helgaas
b2c99e3c70 [PATCH] EFI: keep physical table addresses in efi structure
Almost all users of the table addresses from the EFI system table want
physical addresses.  So rather than doing the pa->va->pa conversion, just keep
physical addresses in struct efi.

This fixes a DMI bug: the efi structure contained the physical SMBIOS address
on x86 but the virtual address on ia64, so dmi_scan_machine() used ioremap()
on a virtual address on ia64.

This is essentially the same as an earlier patch by Matt Tolentino:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112130292316281&w=2
except that this changes all table addresses, not just ACPI addresses.

Matt's original patch was backed out because it caused MCAs on HP sx1000
systems.  That problem is resolved by the ioremap() attribute checking added
for ia64.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
Bjorn Helgaas
e9b0a07121 [PATCH] ia64: ioremap: check EFI for valid memory attributes
Check the EFI memory map so we can use the correct memory attributes for
ioremap().  Previously, we always used uncacheable access, which blows up on
some machines for regular system memory.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
Bjorn Helgaas
136939a2b5 [PATCH] EFI, /dev/mem: simplify efi_mem_attribute_range()
Pass the size, not a pointer to the size, to efi_mem_attribute_range().

This function validates memory regions for the /dev/mem read/write/mmap paths.
The pointer allows arches to reduce the size of the range, but I think that's
unnecessary complexity.  Simplifying it will let me use
efi_mem_attribute_range() to improve the ia64 ioremap() implementation.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
Matt Domsch
3ed3bce846 [PATCH] ia64: use i386 dmi_scan.c
Enable DMI table parsing on ia64.

Andi Kleen has a patch in his x86_64 tree which enables the use of i386
dmi_scan.c on x86_64.  dmi_scan.c functions are being used by the
drivers/char/ipmi/ipmi_si_intf.c driver for autodetecting the ports or
memory spaces where the IPMI controllers may be found.

This patch adds equivalent changes for ia64 as to what is in the x86_64
tree.  In addition, I reworked the DMI detection, such that on EFI-capable
systems, it uses the efi.smbios pointer to find the table, rather than
brute-force searching from 0xF0000.  On non-EFI systems, it continues the
brute-force search.

My test system, an Intel S870BN4 'Tiger4', aka Dell PowerEdge 7250, with
latest BIOS, does not list the IPMI controller in the ACPI namespace, nor
does it have an ACPI SPMI table.  Also note, currently shipping Dell x8xx
EM64T servers don't have these either, so DMI is the only method for
obtaining the address of the IPMI controller.

Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
James Bottomley
5a3a5a98b6 [PATCH] Add flush_kernel_dcache_page() API
We have a problem in a lot of emulated storage in that it takes a page from
get_user_pages() and does something like

kmap_atomic(page)
modify page
kunmap_atomic(page)

However, nothing has flushed the kernel cache view of the page before the
kunmap.  We need a lightweight API to do this, so this new API would
specifically be for flushing the kernel cache view of a user page which the
kernel has modified.  The driver would need to add
flush_kernel_dcache_page(page) before the final kunmap.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:53 -08:00
James Bottomley
03beb07664 [PATCH] Add API for flushing Anon pages
Currently, get_user_pages() returns fully coherent pages to the kernel for
anything other than anonymous pages.  This is a problem for things like
fuse and the SCSI generic ioctl SG_IO which can potentially wish to do DMA
to anonymous pages passed in by users.

The fix is to add a new memory management API: flush_anon_page() which
is used in get_user_pages() to make anonymous pages coherent.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:53 -08:00
Steven Rostedt
64a07bd82e [PATCH] protect remove_proc_entry
It has been discovered that the remove_proc_entry has a race in the removing
of entries in the proc file system that are siblings.  There's no protection
around the traversing and removing of elements that belong in the same
subdirectory.

This subdirectory list is protected in other areas by the BKL.  So the BKL was
at first used to protect this area too, but unfortunately, remove_proc_entry
may be called with spinlocks held.  The BKL may schedule, so this was not a
solution.

The final solution was to add a new global spin lock to protect this list,
called proc_subdir_lock.  This lock now protects the list in
remove_proc_entry, and I also went around looking for other areas that this
list is modified and added this protection there too.  Care must be taken
since these locations call several functions that may also schedule.

Since I don't see any location that these functions that modify the
subdirectory list are called by interrupts, the irqsave/restore versions of
the spin lock was _not_ used.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:53 -08:00
Linus Torvalds
36ddf5bbde Merge master.kernel.org:/home/rmk/linux-2.6-serial
* master.kernel.org:/home/rmk/linux-2.6-serial:
  [ARM] 3383/3: ixp2000: ixdp2x01 platform serial conversion
  [SERIAL] amba-pl010: Remove accessor macros
  [SERIAL] remove 8250_acpi (replaced by 8250_pnp and PNPACPI)
  [SERIAL] icom: select FW_LOADER
2006-03-25 20:31:32 -08:00
Linus Torvalds
a41622eaa9 Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] 3030/2: fix permission check in the obscur cmpxchg syscall
  [ARM] nommu: rename compressed/head.S symbols to a new style
  [ARM] select TLS_REG_EMUL and NEEDS_SYSCALL_FOR_CMPXCHG
  [ARM] nommu: Move hardware page table definitions to pgtable-hwdef.h
  [ARM] Move read of processor ID out of lookup_processor_type()
  [ARM] Fix typo in tlbflush.h
  [ARM] noMMU: removes TLB codes in nommu mode
  [ARM] noMMU: block sys_fork in nommu mode
  [ARM] 3399/1: Fix link problem when CONFIG_PRINTK is disabled
  [ARM] 3398/1: Fix the VFP registers loading/storing base address
  [ARM] 3397/1: AT91RM9200 Header update
  [ARM] 3385/1: Battery support for sharp zaurus sl-5500 (collie)
  [ARM] SMP: don't set cpu_*_map in smp_prepare_boot_cpu
  include/linux/clk.h is betraying its ARM origins
  [ARM] Move enable_irq and disable_irq to assembler.h
  [ARM] 3391/1: use PLAT8250_DEV_PLATFORM{,1} for platform device id instead of 0/1
2006-03-25 20:29:54 -08:00
Lennert Buytenhek
104c7b03ea [ARM] 3383/3: ixp2000: ixdp2x01 platform serial conversion
Patch from Lennert Buytenhek

Add a PLAT8250_DEV_PLATFORM2, and convert the two ixdp2x01 CPLD serial
ports to use platform serial devices with ids PLAT8250_DEV_PLATFORM[12].
(The on-chip xscale UART is PLAT8250_DEV_PLATFORM, id #0.)

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-25 23:03:13 +00:00
Russell King
0003cedfc5 Merge nommu tree
Fix merge conflict in arch/arm/mm/proc-xscale.S

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-25 22:08:55 +00:00
Andrew Victor
1310eda4be [ARM] 3397/1: AT91RM9200 Header update
Patch from Andrew Victor

This patch updates the hardware header to include definitions for the
Memory Controller registers.

Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-25 21:57:59 +00:00
Todd Poynor
686f8c5d77 include/linux/clk.h is betraying its ARM origins
include/linux/clk.h is betraying its ARM origins.

Signed-off-by: Todd Poynor <tpoynor@mvista.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2006-03-25 18:15:24 +00:00
Linus Torvalds
3cbb90a9cb powerpc: fix strncasecmp prototype
It takes a size_t, not an int, as its third argument.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:41:40 -08:00
Linus Torvalds
1b9a391736 Merge branch 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits)
  [PATCH] fix audit_init failure path
  [PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format
  [PATCH] sem2mutex: audit_netlink_sem
  [PATCH] simplify audit_free() locking
  [PATCH] Fix audit operators
  [PATCH] promiscuous mode
  [PATCH] Add tty to syscall audit records
  [PATCH] add/remove rule update
  [PATCH] audit string fields interface + consumer
  [PATCH] SE Linux audit events
  [PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c
  [PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL
  [PATCH] Fix IA64 success/failure indication in syscall auditing.
  [PATCH] Miscellaneous bug and warning fixes
  [PATCH] Capture selinux subject/object context information.
  [PATCH] Exclude messages by message type
  [PATCH] Collect more inode information during syscall processing.
  [PATCH] Pass dentry, not just name, in fsnotify creation hooks.
  [PATCH] Define new range of userspace messages.
  [PATCH] Filter rule comparators
  ...

Fixed trivial conflict in security/selinux/hooks.c
2006-03-25 09:24:53 -08:00
Linus Torvalds
53846a21c1 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (103 commits)
  SUNRPC,RPCSEC_GSS: spkm3--fix config dependencies
  SUNRPC,RPCSEC_GSS: spkm3: import contexts using NID_cast5_cbc
  LOCKD: Make nlmsvc_traverse_shares return void
  LOCKD: nlmsvc_traverse_blocks return is unused
  SUNRPC,RPCSEC_GSS: fix krb5 sequence numbers.
  NFSv4: Dont list system.nfs4_acl for filesystems that don't support it.
  SUNRPC,RPCSEC_GSS: remove unnecessary kmalloc of a checksum
  SUNRPC: Ensure rpc_call_async() always calls tk_ops->rpc_release()
  SUNRPC: Fix memory barriers for req->rq_received
  NFS: Fix a race in nfs_sync_inode()
  NFS: Clean up nfs_flush_list()
  NFS: Fix a race with PG_private and nfs_release_page()
  NFSv4: Ensure the callback daemon flushes signals
  SUNRPC: Fix a 'Busy inodes' error in rpc_pipefs
  NFS, NLM: Allow blocking locks to respect signals
  NFS: Make nfs_fhget() return appropriate error values
  NFSv4: Fix an oops in nfs4_fill_super
  lockd: blocks should hold a reference to the nlm_file
  NFSv4: SETCLIENTID_CONFIRM should handle NFS4ERR_DELAY/NFS4ERR_RESOURCE
  NFSv4: Send the delegation stateid for SETATTR calls
  ...
2006-03-25 09:18:27 -08:00
Andi Kleen
f271a6f557 [PATCH] x86_64: Removed duplicated declaration of force_iommu
Noticed by Andrew Morton.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:14:39 -08:00
Eric Dumazet
dcf36bfa5d [PATCH] x86_64: group memnodemap and memnodeshift in a memnode structure
pfn_to_page() and others need to access both memnode_shift and the very
first bytes of memnodemap[]. If we force memnode_shift to be just before the
memnodemap array, we can reduce the memory footprint to one cache line
instead of two for most setups. This patch introduce a 'memnode' structure
where shift and map[] are carefully placed.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:14:38 -08:00
Andi Kleen
9494943619 [PATCH] x86_64: Make local_t 64bit instead of 32bit
For consistency with other architectures

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:14:38 -08:00
Andi Kleen
ba22f13563 [PATCH] x86_64: Remove CONFIG_UNORDERED_IO
It was a failed experiment - all benchmarks done with it on both AMD
and Intel showed it was a loss. That was probably because the store
buffers of the CPUs for write combining traffic weren't large enough.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:14:38 -08:00
Vivek Goyal
da7ed9f98f [PATCH] x86_64: timer interrupt lockup due to pending interrupt
o check_timer() routine fails while second kernel is booting after a crash
  on an opetron box. Problem happens because timer vector (0x31) seems to be
  locked.

o After a system crash, it is not safe to service interrupts any more, hence
  interrupts are disabled. This leads to pending interrupts at LAPIC. LAPIC
  sends these interrupts to the CPU during early boot of second kernel. Other
  pending interrupts are discarded saying unexpected trap but timer interrupt
  is serviced and CPU does not issue an LAPIC EOI because it think this
  interrupt came from i8259 and sends ack to 8259. This leads to vector 0x31
  locking as LAPIC does not clear respective ISR and keeps on waiting for
  EOI.

o This patch issues extra EOI for the pending interrupts who have ISR set.

o Though today only timer seems to be the special case because in early
  boot it thinks interrupts are coming from i8259 and uses
  mask_and_ack_8259A() as ack handler and does not issue LAPIC EOI. But
  probably doing it in generic manner for all vectors makes sense.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:57 -08:00
Andi Kleen
267b48014a [PATCH] x86_64: Try to allocate node memmap near the end of node
This fixes problems with very large nodes (over 128GB) filling up all of
the first 4GB with their mem_map and not leaving enough space for the
swiotlb.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:56 -08:00
Arjan van de Ven
df92004ce6 [PATCH] x86_64: Reorder one field of the PDA to reduce padding
This reorders the mmu_state int in the pda, such that there is no more
padding (there currently is 4 bytes of padding).  Boot tested.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:56 -08:00