Commit graph

10811 commits

Author SHA1 Message Date
Linus Torvalds
5701412351 Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
  i2c: Convert some more new-style drivers to use module aliasing
  i2c: Match dummy devices by type
  i2c-sibyte: Mark i2c_sibyte_add_bus() as static
  i2c-sibyte: Correct a comment about frequency
  i2c: Improve the functionality documentation
  i2c: Improve smbus-protocol documentation
  i2c-piix4: Blacklist two mainboards
  i2c-piix4: Increase the intitial delay for the ServerWorks CSB5
  i2c-mpc: Compare to NO_IRQ instead of zero
2008-05-11 17:09:24 -07:00
Linus Torvalds
c3921ab715 Add new 'cond_resched_bkl()' helper function
It acts exactly like a regular 'cond_resched()', but will not get
optimized away when CONFIG_PREEMPT is set.

Normal kernel code is already preemptable in the presense of
CONFIG_PREEMPT, so cond_resched() is optimized away (see commit
02b67cc3ba "sched: do not do
cond_resched() when CONFIG_PREEMPT").

But when wanting to conditionally reschedule while holding a lock, you
need to use "cond_sched_lock(lock)", and the new function is the BKL
equivalent of that.

Also make fs/locks.c use it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-11 16:04:48 -07:00
Jean Delvare
60b129d7bf i2c: Match dummy devices by type
As the old driver_name/type matching scheme is going away soon, change
the dummy device mechanism to use the new matching scheme.

This has the downside that dummy i2c clients can no longer choose
their name, they'll all appear as "dummy" in sysfs and in log
messages. I don't think it is a problem in practice though, as there
is little reason to use these i2c clients to log messages.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-05-11 20:37:06 +02:00
Linus Torvalds
633331f389 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  [libata] revert new check-ready Status register logic
2008-05-11 09:52:45 -07:00
Linus Torvalds
8e3e076c5a BKL: revert back to the old spinlock implementation
The generic semaphore rewrite had a huge performance regression on AIM7
(and potentially other BKL-heavy benchmarks) because the generic
semaphores had been rewritten to be simple to understand and fair.  The
latter, in particular, turns a semaphore-based BKL implementation into a
mess of scheduling.

The attempt to fix the performance regression failed miserably (see the
previous commit 00b41ec261 'Revert
"semaphore: fix"'), and so for now the simple and sane approach is to
instead just go back to the old spinlock-based BKL implementation that
never had any issues like this.

This patch also has the advantage of being reported to fix the
regression completely according to Yanmin Zhang, unlike the semaphore
hack which still left a couple percentage point regression.

As a spinlock, the BKL obviously has the potential to be a latency
issue, but it's not really any different from any other spinlock in that
respect.  We do want to get rid of the BKL asap, but that has been the
plan for several years.

These days, the biggest users are in the tty layer (open/release in
particular) and Alan holds out some hope:

  "tty release is probably a few months away from getting cured - I'm
   afraid it will almost certainly be the very last user of the BKL in
   tty to get fixed as it depends on everything else being sanely locked."

so while we're not there yet, we do have a plan of action.

Tested-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Alexander Viro <viro@ftp.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-10 20:58:02 -07:00
Linus Torvalds
9c3cdc1f83 Move ACCESS_ONCE() to <linux/compiler.h>
It actually makes much more sense there, and we do tend to need it for
non-RCU usage too.  Moving it to <linux/compiler.h> will allow some
other cases that have open-coded the same logic to use the same helper
function that RCU has used.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-10 19:51:16 -07:00
Jeff Garzik
005b1f7495 [libata] revert new check-ready Status register logic
This behavior differs across multiple controllers, so we cannot use
common logic for all controllers.

Revert back to the basic common behavior, and specific drivers will
be updated from here to take into account the unusual Status return
values.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-09 15:00:55 -04:00
Linus Torvalds
d9a9a23ff2 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (23 commits)
  [POWERPC] Remove leftover printk in isa-bridge.c
  [POWERPC] Remove duplicate #include
  [POWERPC] Initialize lockdep earlier
  [POWERPC] Document when printk is useable
  [POWERPC] Fix bogus paca->_current initialization
  [POWERPC] Fix of_i2c include for module compilation
  [POWERPC] Make default cputable entries reflect selected CPU family
  [POWERPC] spufs: lockdep annotations for spufs_dir_close
  [POWERPC] spufs: don't requeue victim contex in find_victim if it's not in spu_run
  [POWERPC] 4xx: Fix PCI mem in sequoia DTS
  [POWERPC] 4xx: Add endpoint support to 4xx PCIe driver
  [POWERPC] 4xx: Fix problem with new TLB storage attibute fields on 440x6 core
  [POWERPC] spufs: spu_create should send inotify IM_CREATE event
  [POWERPC] spufs: handle faults while the context switch pending flag is set
  [POWERPC] spufs: fix concurrent delivery of class 0 & 1 exceptions
  [POWERPC] spufs: try to route SPU interrupts to local node
  [POWERPC] spufs: set SPU_CONTEXT_SWITCH_PENDING before synchronising SPU irqs
  [POWERPC] spufs: don't acquire state_mutex interruptible while performing callback
  [POWERPC] spufs: update master runcntl with context lock held
  [POWERPC] spufs: fix post-stopped update of MFC_CNTL register
  ...
2008-05-09 08:06:31 -07:00
Rusty Russell
6c2545eeff module: put modversions in vermagic
Don't allow a module built without versions altogether to be inserted
into a kernel which expects modversions.

modprobe --force will strip vermagic as well as modversions, so it
won't be effected, but this will make sure that a
non-CONFIG_MODVERSIONS module won't be accidentally inserted into a
CONFIG_MODVERSIONS kernel.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-09 07:45:18 -07:00
Jochen Friedrich
8af302e2dc [POWERPC] Fix of_i2c include for module compilation
Remove #ifdef CONFIG_OF_I2C as this breaks module compilation.
Drivers using this header should depend on OF_I2C anyways, so
there's no need to make this conditional.

Signed-off-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-09 20:22:58 +10:00
Linus Torvalds
28a4acb485 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits)
  net: Added ASSERT_RTNL() to dev_open() and dev_close().
  can: Fix can_send() handling on dev_queue_xmit() failures
  netns: Fix arbitrary net_device-s corruptions on net_ns stop.
  netfilter: Kconfig: default DCCP/SCTP conntrack support to the protocol config values
  netfilter: nf_conntrack_sip: restrict RTP expect flushing on error to last request
  macvlan: Fix memleak on device removal/crash on module removal
  net/ipv4: correct RFC 1122 section reference in comment
  tcp FRTO: SACK variant is errorneously used with NewReno
  e1000e: don't return half-read eeprom on error
  ucc_geth: Don't use RX clock as TX clock.
  cxgb3: Use CAP_SYS_RAWIO for firmware
  pcnet32: delete non NAPI code from driver.
  fs_enet: Fix a memory leak in fs_enet_mdio_probe
  [netdrvr] eexpress: IPv6 fails - multicast problems
  3c59x: use netstats in net_device structure
  3c980-TX needs EXTRA_PREAMBLE
  fix warning in drivers/net/appletalk/cops.c
  e1000e: Add support for BM PHYs on ICH9
  uli526x: fix endianness issues in the setup frame
  uli526x: initialize the hardware prior to requesting interrupts
  ...
2008-05-08 19:03:26 -07:00
Linus Torvalds
7a34912d90 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Revert "relay: fix splice problem"
  docbook: fix bio missing parameter
  block: use unitialized_var() in bio_alloc_bioset()
  block: avoid duplicate calls to get_part() in disk stat code
  cfq-iosched: make io priorities inherit CPU scheduling class as well as nice
  block: optimize generic_unplug_device()
  block: get rid of likely/unlikely predictions in merge logic
  vfs: splice remove_suid() cleanup
  cfq-iosched: fix RCU race in the cfq io_context destructor handling
  block: adjust tagging function queue bit locking
  block: sysfs store function needs to grab queue_lock and use queue_flag_*()
2008-05-08 10:48:36 -07:00
Linus Torvalds
0f1bce41fe Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
  udf: Fix memory corruption when fs mounted with noadinicb option
  udf: Make udf exportable
  udf: fs/udf/partition.c:udf_get_pblock() mustn't be inline
2008-05-08 10:48:03 -07:00
David S. Miller
33f9936b2b Merge branch 'upstream-davem' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2008-05-08 02:35:54 -07:00
Patrick McHardy
ef75d49f11 netfilter: nf_conntrack_sip: restrict RTP expect flushing on error to last request
Some Inovaphone PBXs exhibit very stange behaviour: when dialing for
example "123", the device sends INVITE requests for "1", "12" and
"123" back to back.  The first requests will elicit error responses
from the receiver, causing the SIP helper to flush the RTP
expectations even though we might still see a positive response.

Note the sequence number of the last INVITE request that contained a
media description and only flush the expectations when receiving a
negative response for that sequence number.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-08 01:15:21 -07:00
Jens Axboe
28f13702f0 block: avoid duplicate calls to get_part() in disk stat code
get_part() is fairly expensive, as it O(N) loops over partitions
to find the right one. In lots of normal IO paths we end up looking
up the partition twice, to make matters even worse. Change the
stat add code to accept a passed in partition instead.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-05-07 10:15:46 +02:00
Jens Axboe
6d63c27557 cfq-iosched: make io priorities inherit CPU scheduling class as well as nice
We currently set all processes to the best-effort scheduling class,
regardless of what CPU scheduling class they belong to. Improve that
so that we correctly track idle and rt scheduling classes as well.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-05-07 09:51:23 +02:00
Rasmus Rohde
221e583a73 udf: Make udf exportable
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Rasmus Rohde <rohde@duff.dk>
Signed-off-by: Jan Kara <jack@suse.cz>
2008-05-07 09:48:23 +02:00
Miklos Szeredi
7f3d4ee108 vfs: splice remove_suid() cleanup
generic_file_splice_write() duplicates remove_suid() just because it
doesn't hold i_mutex.  But it grabs i_mutex inside splice_from_pipe()
anyway, so this is rather pointless.

Move locking to generic_file_splice_write() and call remove_suid() and
__splice_from_pipe() instead.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-05-07 09:29:00 +02:00
Linus Torvalds
bb78be8397 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  [PATCH] fix SMP ordering hole in fcntl_setlk()
  [PATCH] kill ->put_inode
  [PATCH] fix reservation discarding in affs
2008-05-06 11:39:57 -07:00
Christoph Hellwig
33dcdac2df [PATCH] kill ->put_inode
And with that last patch to affs killing the last put_inode instance we
can finally, after many years of transition kill this racy and awkward
interface.

(It's kinda funny that even the description in
Documentation/filesystems/vfs.txt was entirely wrong..)

Also remove a very misleading comment above the defintion of
struct super_operations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-06 13:45:34 -04:00
Jeff Garzik
54c852a2d6 Merge branch 'for-2.6.26' of git://git.farnsworth.org/dale/linux-2.6-mv643xx_eth into upstream 2008-05-06 12:22:03 -04:00
Andy Fleming
9b9a8bfc8d phylib: Fix some sparse warnings
Declared some things static, declared some things in the header.

Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-06 12:01:41 -04:00
Mark Lord
10acf3b0d3 libata: export ata_eh_analyze_ncq_error
Export ata_eh_analyze_ncq_error() for subsequent use by sata_mv,
as suggested by Tejun.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-06 11:37:58 -04:00
Tejun Heo
78ab88f04f libata: improve post-reset device ready test
Some controllers (jmb and inic162x) use 0x77 and 0x7f to indicate that
the device isn't ready yet.  It looks like they use 0xff if device
presence is detected but connection isn't established.  0x77 or 0x7f
after connection is established and use the value from signature FIS
after receiving it.

This patch implements ata_check_ready(), which takes TF status value
and determines whether the port is ready or not considering the above
and other conditions, and use it in @check_ready() functions.  This is
safe as both 0x77 and 0x7f aren't valid ready status value even though
they have BSY bit cleared.

This fixes hot plug detection failures which can be triggered with
certain drives if they aren't already spun up when the data connector
is hot plugged.

Tested on sil, sil24, ahci (jmb/ich), piix and inic162x combined with
eight drives from all major vendors.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-06 11:32:02 -04:00
Linus Torvalds
bb896afe20 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes:
  sched: default to n for GROUP_SCHED and FAIR_GROUP_SCHED
  sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
  sched, x86: add HAVE_UNSTABLE_SCHED_CLOCK
  sched: fix cpu clock
  sched: fair-group: fix a Div0 error of the fair group scheduler
  sched: fix missing locking in sched_domains code
  sched: make clock sync tunable by architecture code
  sched: fix debugging
  sched: fix sched_info_switch not being called according to documentation
  sched: fix hrtick_start_fair and CPU-Hotplug
  sched: fix SCHED_FAIR wake-idle logic error
  sched: fix RT task-wakeup logic
  sched: add statics, don't return void expressions
  sched: add debug checks to idle functions
  sched: remove old sched doc
  sched: make rt_sched_class, idle_sched_class static
  sched: optimize calc_delta_mine()
  sched: fix normalized sleeper
2008-05-05 17:31:14 -07:00
Linus Torvalds
2e83fc4df5 Merge branch 'powerpc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'powerpc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] Assign PDE->data before gluing PDE into /proc tree
  [POWERPC] devres: Add devm_ioremap_prot()
  [POWERPC] macintosh: ADB driver: adb_handler_sem semaphore to mutex
  [POWERPC] macintosh: windfarm_smu_sat: semaphore to mutex
  [POWERPC] macintosh: therm_pm72: driver_lock semaphore to mutex
2008-05-05 15:48:53 -07:00
Peter Zijlstra
3e51f33fcc sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
this replaces the rq->clock stuff (and possibly cpu_clock()).

 - architectures that have an 'imperfect' hardware clock can set
   CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

 - the 'jiffie' window might be superfulous when we update tick_gtod
   before the __update_sched_clock() call in sched_clock_tick()

 - cpu_clock() might be implemented as:

     sched_clock_cpu(smp_processor_id())

   if the accuracy proves good enough - how far can TSC drift in a
   single jiffie when considering the filtering and idle hooks?

[ mingo@elte.hu: various fixes and cleanups ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05 23:56:18 +02:00
Ingo Molnar
690229a091 sched: make clock sync tunable by architecture code
make time_sync_thresh tunable to architecture code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05 23:56:18 +02:00
Gregory Haskins
8ae121ac86 sched: fix RT task-wakeup logic
Dmitry Adamushko pointed out a logic error in task_wake_up_rt() where we
will always evaluate to "true".  You can find the thread here:

http://lkml.org/lkml/2008/4/22/296

In reality, we only want to try to push tasks away when a wake up request is
not going to preempt the current task.  So lets fix it.

Note: We introduce test_tsk_need_resched() instead of open-coding the flag
check so that the merge-conflict with -rt should help remind us that we
may need to support NEEDS_RESCHED_DELAYED in the future, too.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Dmitry Adamushko <dmitry.adamushko@gmail.com>
CC: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05 23:56:18 +02:00
Linus Torvalds
108c196184 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86 PCI: call dmi_check_pciprobe()
  x86/pci: add pci=skip_isa_align command lines.
  x86/pci: remove flag in pci_cfg_space_size_ext
  x86: fix section mismatch in pci_scan_bus
2008-05-05 12:39:10 -07:00
Harvey Harrison
688b744d8b kgdb: fix signedness mixmatches, add statics, add declaration to header
Noticed by sparse:
arch/x86/kernel/kgdb.c:556:15: warning: symbol 'kgdb_arch_pc' was not declared. Should it be static?
kernel/kgdb.c:149:8: warning: symbol 'kgdb_do_roundup' was not declared. Should it be static?
kernel/kgdb.c:193:22: warning: symbol 'kgdb_arch_pc' was not declared. Should it be static?
kernel/kgdb.c:712:5: warning: symbol 'remove_all_break' was not declared. Should it be static?

Related to kgdb_hex2long:
arch/x86/kernel/kgdb.c:371:28: warning: incorrect type in argument 2 (different signedness)
arch/x86/kernel/kgdb.c:371:28:    expected long *long_val
arch/x86/kernel/kgdb.c:371:28:    got unsigned long *<noident>
kernel/kgdb.c:469:27: warning: incorrect type in argument 2 (different signedness)
kernel/kgdb.c:469:27:    expected long *long_val
kernel/kgdb.c:469:27:    got unsigned long *<noident>
kernel/kgdb.c:470:27: warning: incorrect type in argument 2 (different signedness)
kernel/kgdb.c:470:27:    expected long *long_val
kernel/kgdb.c:470:27:    got unsigned long *<noident>
kernel/kgdb.c:894:27: warning: incorrect type in argument 2 (different signedness)
kernel/kgdb.c:894:27:    expected long *long_val
kernel/kgdb.c:894:27:    got unsigned long *<noident>
kernel/kgdb.c:895:27: warning: incorrect type in argument 2 (different signedness)
kernel/kgdb.c:895:27:    expected long *long_val
kernel/kgdb.c:895:27:    got unsigned long *<noident>
kernel/kgdb.c:1127:28: warning: incorrect type in argument 2 (different signedness)
kernel/kgdb.c:1127:28:    expected long *long_val
kernel/kgdb.c:1127:28:    got unsigned long *<noident>
kernel/kgdb.c:1132:25: warning: incorrect type in argument 2 (different signedness)
kernel/kgdb.c:1132:25:    expected long *long_val
kernel/kgdb.c:1132:25:    got unsigned long *<noident>

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2008-05-05 07:13:21 -05:00
Emil Medve
b41e5fffe8 [POWERPC] devres: Add devm_ioremap_prot()
We provide an ioremap_flags, so this provides a corresponding
devm_ioremap_prot.  The slight name difference is at Ben
Herrenschmidt's request as he plans on changing ioremap_flags to
ioremap_prot in the future.

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Tejun Heo <htejun@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-05 16:47:14 +10:00
Ingo Molnar
e73b65f1db sysfs: build fix
x86.git testing found the following build failure on v2.6.26-rc1:

  In file included from include/linux/kobject.h:22,
                   from include/linux/module.h:17,
                   from include/linux/crypto.h:22,
                   from arch/x86/kernel/asm-offsets_32.c:8,
                   from arch/x86/kernel/asm-offsets.c:3:
  include/linux/sysfs.h:201: error: redefinition of 'sysfs_update_group'
  include/linux/sysfs.h:195: error: previous definition of 'sysfs_update_group' was here
  make[1]: *** [arch/x86/kernel/asm-offsets.s] Error 1
  make: *** [prepare0] Error 2

with the following config:

    http://redhat.com/~mingo/misc/config-Sun_May__4_07_09_30_CEST_2008.bad

the reason for the build failure is the duplicate definition of the
sysfs_update_group() inline function in include/linux/sysfs.h.

The duplication was a merge error: it was added via -mm by commit
v2.6.25-7262-g2850699, "sysfs: sysfs_update_group stub for
CONFIG_SYSFS=n" a day before v2.6.26-rc1, but a day before that the same
commit was already merged upstream via the sysfs tree, with commit
v2.6.25-7211-g1cbfb7a.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-04 17:07:03 -07:00
Linus Torvalds
afa26be86b Merge git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt
* git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
  clocksource: allow read access to available/current_clocksource
  clocksource: Fix permissions for available_clocksource
  hrtimer: remove duplicate helper function
2008-05-03 13:51:10 -07:00
Linus Torvalds
38e80121bd Merge git://git.infradead.org/battery-2.6
* git://git.infradead.org/battery-2.6:
  PMU battery: filenames in sysfs with spaces
  pda_power: add init and exit function callbacks
2008-05-03 10:57:57 -07:00
Linus Torvalds
4f9faaace2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
  rose: Wrong list_lock argument in rose_node seqops
  netns: Fix reassembly timer to use the right namespace
  netns: Fix device renaming for sysfs
  bnx2: Update version to 1.7.5.
  bnx2: Update RV2P firmware for 5709.
  bnx2: Zero out context memory for 5709.
  bnx2: Fix register test on 5709.
  bnx2: Fix remote PHY initial link state.
  bnx2: Refine remote PHY locking.
  bridge: forwarding table information for >256 devices
  tg3: Update version to 3.92
  tg3: Add link state reporting to UMP firmware
  tg3: Fix ethtool loopback test for 5761 BX devices
  tg3: Fix 5761 NVRAM sizes
  tg3: Use constant 500KHz MI clock on adapters with a CPMU
  hci_usb.h: fix hard-to-trigger race
  dccp: ccid2.c, ccid3.c use clamp(), clamp_t()
  net: remove NR_CPUS arrays in net/core/dev.c
  net: use get/put_unaligned_* helpers
  bluetooth: use get/put_unaligned_* helpers
  ...
2008-05-03 10:18:21 -07:00
Linus Torvalds
c36c804559 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] Bolt in SLB entry for kernel stack on secondary cpus
  [POWERPC] PS3: Update ps3_defconfig
  [POWERPC] PS3: Remove unsupported wakeup sources
  [POWERPC] PS3: Make ps3_virq_setup and ps3_virq_destroy static
  [POWERPC] PS3: Add time include to lpm
  [POWERPC] Fix slb.c compile warnings
  [POWERPC] Xilinx: Fix compile warnings
  [POWERPC] Squash build warning for print of resource_size_t in fsl_soc.c
  [RAPIDIO] fix current kernel-doc notation
  [POWERPC] 86xx: mpc8610_hpcd: add support for PCI Express x8 slot
  Fix a potential issue in mpc52xx uart driver
  [POWERPC] mpc5200: Allow for fixed speed MII configurations
  [POWERPC] 86xx: Fix the wrong serial1 interrupt for 8610 board
2008-05-03 10:01:33 -07:00
Oliver Hartkopp
4346f65426 hrtimer: remove duplicate helper function
The helper function hrtimer_callback_running() is used in
kernel/hrtimer.c as well as in the updated net/can/bcm.c which now
supports hrtimers. Moving the helper function to hrtimer.h removes the
duplicate definition in the C-files.

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-03 18:11:48 +02:00
Stephen Hemminger
ae4f8fca40 bridge: forwarding table information for >256 devices
The forwarding table binary interface (my bad choice), only exposes
the port number of the first 8 bits. The bridge code was limited to
256 ports at the time, but now the kernel supports up 1024 ports, so
the upper bits are lost when doing:

   brctl showmacs

The fix is to squeeze the extra bits into small hole left in data
structure, to maintain binary compatiablity.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-02 16:53:33 -07:00
Philipp Zabel
f6b6b180b4 pda_power: add init and exit function callbacks
This adds init/exit function callbacks to pda_power, to
provide a place where the platform code can request/free
GPIOs that it wants to use in the is_ac_online, is_usb_online
and set_charge functions.

Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
2008-05-03 03:39:55 +04:00
Linus Torvalds
b66e1f11eb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  [PATCH] fix sysctl_nr_open bugs
  [PATCH] sanitize anon_inode_getfd()
  [PATCH] split linux/file.h
  [PATCH] make osf_select() use core_sys_select()
  [PATCH] remove horrors with irix tty ioctls handling
  [PATCH] fix file and descriptor handling in perfmon
2008-05-02 11:23:14 -07:00
Linus Torvalds
1be1d6b7f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (32 commits)
  USB GADGET/PERIPHERAL: g_file_storage Bulk-Only Transport compliance, clear-feature ignore
  USB GADGET/PERIPHERAL: g_file_storage Bulk-Only Transport compliance
  usb_serial: some coding style fixes
  USB: Remove redundant dependencies on USB_ATM.
  USB: UHCI: disable remote wakeup when it's not needed
  USB: OHCI: work around bogus compiler warning
  USB: add Cypress c67x00 OTG controller HCD driver
  USB: add Cypress c67x00 OTG controller core driver
  USB: add Cypress c67x00 low level interface code
  USB: airprime: unlock mutex instead of trying to lock it again
  USB: storage: Update mailling list address
  USB: storage: UNUSUAL_DEVS() for PanDigital Picture frame.
  USB: Add the USB 2.0 extension descriptor.
  USB: add more FTDI device ids
  USB: fix cannot work usb storage when using ohci-sm501
  usb: gadget zero timer init fix
  usb: gadget zero style fixups (mostly whitespace)
  usb serial gadget: CDC ACM fixes
  usb: pxa27x_udc driver
  USB: INTOVA Pixtreme camera mass storage device
  ...
2008-05-02 11:03:08 -07:00
Linus Torvalds
37b6a04fd9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
  driver-core: add dev_name() to help transition away from using bus_id
2008-05-02 11:02:53 -07:00
David Lopo
a5e54b0dbb USB GADGET/PERIPHERAL: g_file_storage Bulk-Only Transport compliance
Gadget can tell controller driver to ignore Clear-Feature(HALT_ENDPOINT)
This API change enables future support for Bulk-Only Transport compliance

Signed-off-by: David Lopo <lopo.david@gmail.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-02 10:25:58 -07:00
Peter Korsgaard
b02b371e6d USB: add Cypress c67x00 OTG controller core driver
This patch add the core driver for the c67x00 USB OTG controller.  The core
driver is responsible for the platform bus binding and creating either
USB HCD or USB Gadget instances for each of the serial interface engines
on the chip.

This driver does not directly implement the HCD or gadget behaviours; it
just controls access to the chip.

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-02 10:25:56 -07:00
Sarah Sharp
35e5437e8c USB: Add the USB 2.0 extension descriptor.
This device descriptor was added by the recent USB Link Power Management (LPM)
ECN.  It indicates whether the USB device supports LPM.

This descriptor is grouped under a Binary Device Object Store (BOS) descriptor.
Update the BOS comments to indicate any USB device (not just wireless USB
devices) can implement BOS descriptors.

Signed-off-by: Sarah Sharp <sarah.a.sharp@intel.com>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-02 10:25:54 -07:00
Kay Sievers
06916639e2 driver-core: add dev_name() to help transition away from using bus_id
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-02 10:12:28 -07:00
Linus Torvalds
3482a6f1d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-genirq
* git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-genirq:
  genirq: reenable a nobody cared disabled irq when a new driver arrives
2008-05-02 08:22:36 -07:00
Ryan Harper
48e4043d45 virtio: add virtio disk geometry feature
Rather than faking up some geometry, allow the backend to push the disk
geometry via virtio pci config option.  Keep the old geo code around for
compatibility.

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (modified to single struct)
2008-05-02 21:50:51 +10:00
Rusty Russell
c45a6816c1 virtio: explicit advertisement of driver features
A recent proposed feature addition to the virtio block driver revealed
some flaws in the API: in particular, we assume that feature
negotiation is complete once a driver's probe function returns.

There is nothing in the API to require this, however, and even I
didn't notice when it was violated.

So instead, we require the driver to specify what features it supports
in a table, we can then move the feature negotiation into the virtio
core.  The intersection of device and driver features are presented in
a new 'features' bitmap in the struct virtio_device.

Note that this highlights the difference between Linux unsigned-long
bitmaps where each unsigned long is in native endian, and a
straight-forward little-endian array of bytes.

Drivers can still remove feature bits in their probe routine if they
really have to.

API changes:
- dev->config->feature() no longer gets and acks a feature.
- drivers should advertise their features in the 'feature_table' field
- use virtio_has_feature() for extra sanity when checking feature bits

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:50 +10:00
Rusty Russell
72e61eb40b virtio: change config to guest endian.
A recent proposed feature addition to the virtio block driver revealed
some flaws in the API, in particular how easy it is to break big
endian machines.

The virtio config space was originally chosen to be little-endian,
because we thought the config might be part of the PCI config space
for virtio_pci.  It's actually a separate mmio region, so that
argument holds little water; as only x86 is currently using the virtio
mechanism, we can change this (but must do so now, before the
impending s390 merge).

API changes:
- __virtio_config_val() just becomes a striaght vdev->config_get() call.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:50 +10:00
Rusty Russell
5539ae9613 virtio: finer-grained features for virtio_net
So, we previously had a 'VIRTIO_NET_F_GSO' bit which meant that 'the
host can handle csum offload, and any TSO (v4&v6 incl ECN) or UFO
packets you might want to send.  I thought this was good enough for
Linux, but it actually isn't, since we don't do UFO in software.

So, add separate feature bits for what the host can handle.  Add
equivalent ones for the guest to say what it can handle, because LRO
is coming too (thanks Herbert!).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:47 +10:00
Rusty Russell
cb38fa23c1 virtio: de-structify virtio_block status byte
Ron Minnich points out that a struct containing a char is not always
sizeof(char); simplest to remove the structure to avoid confusion.

Cc: "ron minnich" <rminnich@gmail.com>

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:45 +10:00
Christian Borntraeger
8147313287 virtio: export more headers to userspace
Rusty,

is there a reason why we dont export the virtio headers for
9p, balloon, console, pci, and virtio_ring? kvm uses make sync,
but I think it is still useful to heave these headers exported
as they might be useful for other userspace tools.

I dont export virtio.h, because it does not seem to have useful
information for userspace and it requires scatterlist.h which is
also not exported. See also my other mail about your "virtio:
change config to guest endian." patch.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:44 +10:00
Thomas Gleixner
1adb0850a1 genirq: reenable a nobody cared disabled irq when a new driver arrives
Uwe Kleine-Koenig has some strange hardware where one of the shared
interrupts can be asserted during boot before the appropriate driver
loads. Requesting the shared irq line from another driver result in a
spurious interrupt storm which finally disables the interrupt line.

I have seen similar behaviour on resume before (the hardware does not
work anymore so I can not verify).

Change the spurious disable logic to increment the disable depth and
mark the interrupt with an extra flag which allows us to reenable the
interrupt when a new driver arrives which requests the same irq
line. In the worst case this will disable the irq again via the
spurious trap, but there is a decent chance that the new driver is the
one which can handle the already asserted interrupt and makes the box
usable again.

Eric Biederman said further: This case also happens on a regular basis
in kdump kernels where we deliberately don't shutdown the hardware
before starting the new kernel.  This patch should reduce the need for
using irqpoll in that situation by a small amount.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-and-Acked-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
2008-05-02 13:40:34 +02:00
Randy Dunlap
9941d945f4 [RAPIDIO] fix current kernel-doc notation
Fix current (-git16) missing docbook/kernel-doc notation in RapidIO files.

Warning(linux-2.6.25-git16//include/linux/rio.h:187): No description found for parameter 'sys_size'
Warning(linux-2.6.25-git16//include/linux/rio.h:187): No description found for parameter 'phy_type'

Warning(linux-2.6.25-git16//arch/powerpc/sysdev/fsl_rio.c:188): No description found for parameter 'mport'
Warning(linux-2.6.25-git16//arch/powerpc/sysdev/fsl_rio.c:224): No description found for parameter 'mport'
Warning(linux-2.6.25-git16//arch/powerpc/sysdev/fsl_rio.c:245): No description found for parameter 'mport'
Warning(linux-2.6.25-git16//arch/powerpc/sysdev/fsl_rio.c:270): No description found for parameter 'mport'
Warning(linux-2.6.25-git16//arch/powerpc/sysdev/fsl_rio.c:311): No description found for parameter 'mport'
Warning(linux-2.6.25-git16//arch/powerpc/sysdev/fsl_rio.c:996): No description found for parameter 'dev'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-05-01 23:01:54 -05:00
Kirill A. Shutemov
2218228392 Make linux/wireless.h be able to compile
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-05-01 17:38:35 -04:00
Linus Torvalds
2c4aabcca8 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
  [MTD][NOR] Add physical address to point() method
  [JFFS2] Track parent inode for directories (for NFS export)
  [JFFS2] Invert last argument of jffs2_gc_fetch_inode(), make it boolean.
  [JFFS2] Quiet lockdep false positive.
  [JFFS2] Clean up jffs2_alloc_inode() and jffs2_i_init_once()
  [MTD] Delete long-unused jedec.h header file.
  [MTD] [NAND] at91_nand: use at91_nand_{en,dis}able consistently.
2008-05-01 11:15:28 -07:00
Jared Hulbert
a98889f3d8 [MTD][NOR] Add physical address to point() method
Adding the ability to get a physical address from point() in addition
to virtual address.  This physical address is required for XIP of
userspace code from flash.

Signed-off-by: Jared Hulbert <jaredeh@gmail.com>
Reviewed-by: Jörn Engel <joern@logfs.org>
Acked-by: Nicolas Pitre <nico@cam.org>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2008-05-01 18:59:11 +01:00
Al Viro
2030a42cec [PATCH] sanitize anon_inode_getfd()
a) none of the callers even looks at inode or file returned by anon_inode_getfd()
b) any caller that would try to look at those would be racy, since by the time
it returns we might have raced with close() from another thread and that
file would be pining for fjords.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-01 13:08:50 -04:00
Al Viro
9f3acc3140 [PATCH] split linux/file.h
Initial splitoff of the low-level stuff; taken to fdtable.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-01 13:08:16 -04:00
Al Viro
a2dcb44c3c [PATCH] make osf_select() use core_sys_select()
... instead of open-coding it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-01 13:07:28 -04:00
Linus Torvalds
03fc922f40 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  module: add MODULE_STATE_GOING notifier call
  module: Enhance verify_export_symbols
  module: set unused_gpl_crcs instead of overwriting unused_crcs
  module: neaten __find_symbol, rename to find_symbol
  module: reduce module image and resident size
  module: make module_sect_attrs private to kernel/module.c
2008-05-01 08:26:56 -07:00
Jan Kara
c32e026efc quota: add a convenience macro for filesystems
Note that it cannot be an inline function because we don't have struct
super_block prototype...

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:04:01 -07:00
Scott Kilau
99da9047e6 jsm: add new supported board to jsm serial driver
Add new PCI Express Neo/JSM board to the supported list of drivers in
the JSM driver.

Signed-off-by: Scott Kilau <scottk@digi.com>
Acked-by: Ananda V <avenkat@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:04:01 -07:00
Randy Dunlap
2850699c59 sysfs: sysfs_update_group stub for CONFIG_SYSFS=n
scsi_transport_spi uses sysfs_update_group() when CONFIG_SYSFS=n, so provide a
stub for it.

next-20080423/drivers/scsi/scsi_transport_spi.c:1467: error: implicit declaration of function 'sysfs_update_group'
make[3]: *** [drivers/scsi/scsi_transport_spi.o] Error 1

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Greg KH <greg@kroah.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:59 -07:00
David Brownell
34990cf702 Add a new sysfs_streq() string comparison function
Add a new sysfs_streq() string comparison function, which ignores
the trailing newlines found in sysfs inputs.  By example:

	sysfs_streq("a", "b")	==> false
	sysfs_streq("a", "a")	==> true
	sysfs_streq("a", "a\n")	==> true
	sysfs_streq("a\n", "a")	==> true

This is intended to simplify parsing of sysfs inputs, letting them
avoid the need to manually strip off newlines from inputs.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:59 -07:00
Roman Zippel
7dffa3c673 ntp: handle leap second via timer
Remove the leap second handling from second_overflow(), which doesn't have to
check for it every second anymore.  With CONFIG_NO_HZ this also makes sure the
leap second is handled close to the full second.  Additionally this makes it
possible to abort a leap second properly by resetting the STA_INS/STA_DEL
status bits.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:59 -07:00
Roman Zippel
8383c42399 ntp: remove current_tick_length()
current_tick_length used to do a little more, but now it just returns
tick_length, which we can also access directly at the few places, where it's
needed.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:59 -07:00
Roman Zippel
7fc5c78409 ntp: rename TICK_LENGTH_SHIFT to NTP_SCALE_SHIFT
As TICK_LENGTH_SHIFT is used for more than just the tick length, the name
isn't quite approriate anymore, so this renames it to NTP_SCALE_SHIFT.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:59 -07:00
Roman Zippel
153b5d054a ntp: support for TAI
This adds support for setting the TAI value (International Atomic Time).  The
value is reported back to userspace via timex (as we don't have a
ntp_gettime() syscall).

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:59 -07:00
Roman Zippel
9f14f669d1 ntp: increase time_offset resolution
time_offset is already a 64bit value but its resolution barely used, so this
makes better use of it by replacing SHIFT_UPDATE with TICK_LENGTH_SHIFT.

Side note: the SHIFT_HZ in SHIFT_UPDATE was incorrect for CONFIG_NO_HZ and the
primary reason for changing time_offset to 64bit to avoid the overflow.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Roman Zippel
074b3b8794 ntp: increase time_freq resolution
This changes time_freq to a 64bit value and makes it static (the only outside
user had no real need to modify it).  Intermediate values were already 64bit,
so the change isn't that big, but it saves a little in shifts by replacing
SHIFT_NSEC with TICK_LENGTH_SHIFT.  PPM_SCALE is then used to convert between
user space and kernel space representation.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Roman Zippel
eea83d896e ntp: NTP4 user space bits update
This adds a few more things from the ntp nanokernel related to user space.
It's now possible to select the resolution used of some values via STA_NANO
and the kernel reports in which mode it works (pll/fll).

If some values for adjtimex() are outside the acceptable range, they are now
simply normalized instead of letting the syscall fail.  I removed
MOD_CLKA/MOD_CLKB as the mapping didn't really makes any sense, the kernel
doesn't support setting the clock.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Roman Zippel
f8bd2258e2 remove div_long_long_rem
x86 is the only arch right now, which provides an optimized for
div_long_long_rem and it has the downside that one has to be very careful that
the divide doesn't overflow.

The API is a little akward, as the arguments for the unsigned divide are
signed.  The signed version also doesn't handle a negative divisor and
produces worse code on 64bit archs.

There is little incentive to keep this API alive, so this converts the few
users to the new API.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Roman Zippel
6f6d6a1a6a rename div64_64 to div64_u64
Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so it clearly includes the type of the divide.  Move its definition
to math64.h as currently no architecture overrides the generic implementation.
 They can still override it of course, but the duplicated declarations are
avoided.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Roman Zippel
2418f4f28f introduce explicit signed/unsigned 64bit divide
The current do_div doesn't explicitly say that it's unsigned and the signed
counterpart is missing, which is e.g.  needed when dealing with time values.

This introduces 64bit signed/unsigned divide functions which also attempts to
cleanup the somewhat awkward calling API, which often requires the use of
temporary variables for the dividend.  To avoid the need for temporary
variables everywhere for the remainder, each divide variant also provides a
version which doesn't return the remainder.

Each architecture can now provide optimized versions of these function,
otherwise generic fallback implementations will be used.

As an example I provided an alternative for the current x86 divide, which
avoids the asm casts and using an union allows gcc to generate better code.
It also avoids the upper divde in a few more cases, where the result is known
(i.e.  upper quotient is zero).

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Rusty Russell
ea01e798e2 module: reduce module image and resident size
Resulting reduction (x86-64, gcc 4.1.2) with my (special purpose, i.e.
much reduced) configurations:
- 16k kernel resident size
- 180k module resident size
- 10k module image size

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-01 21:14:59 +10:00
Rusty Russell
a58730c421 module: make module_sect_attrs private to kernel/module.c
No-one else is using these afaics.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-01 21:14:59 +10:00
David S. Miller
c2a3b23345 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6 2008-05-01 02:06:32 -07:00
Luis Carlos Cobo
51ceddade0 mac80211: use 4-byte mesh sequence number
This follows the new 802.11s/D2.0 draft.

Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2008-04-30 20:34:26 -04:00
Greg Kroah-Hartman
c3bb7fadaf klist: fix coding style errors in klist.h and klist.c
Finally clean up the odd spacing in these files.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-30 16:52:58 -07:00
Kay Sievers
c3b19ff06e driver core: remove no longer used "struct class_device"
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-30 16:52:49 -07:00
Kumar Gala
4f452e8aa4 devres: support addresses greater than an unsigned long via dev_ioremap
Use a resource_size_t instead of unsigned long since some arch's are
capable of having ioremap deal with addresses greater than the size of a
unsigned long.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-30 16:52:48 -07:00
Randy Dunlap
1cbfb7a5ac sysfs: sysfs_update_group stub for CONFIG_SYSFS=n
scsi_transport_spi uses sysfs_update_group() when CONFIG_SYSFS=n,
so provide a stub for it.

next-20080423/drivers/scsi/scsi_transport_spi.c:1467: error: implicit declaration of function 'sysfs_update_group'
make[3]: *** [drivers/scsi/scsi_transport_spi.o] Error 1

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-30 16:52:47 -07:00
Tejun Heo
93dd40013f klist: implement klist_add_{after|before}()
Add klist_add_after() and klist_add_before() which puts a new node
after and before an existing node, respectively.  This is useful for
callers which need to keep klist ordered.  Note that synchronizing
between simultaneous additions for ordering is the caller's
responsibility.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-30 16:52:47 -07:00
Tejun Heo
1da43e4a9e klist: implement KLIST_INIT() and DEFINE_KLIST()
klist is missing static initializers and definition helper.  Add them.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-30 16:52:47 -07:00
Linus Torvalds
08acd4f8af Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (179 commits)
  ACPI: Fix acpi_processor_idle and idle= boot parameters interaction
  acpi: fix section mismatch warning in pnpacpi
  intel_menlo: fix build warning
  ACPI: Cleanup: Remove unneeded, multiple local dummy variables
  ACPI: video - fix permissions on some proc entries
  ACPI: video - properly handle errors when registering proc elements
  ACPI: video - do not store invalid entries in attached_array list
  ACPI: re-name acpi_pm_ops to acpi_suspend_ops
  ACER_WMI/ASUS_LAPTOP: fix build bug
  thinkpad_acpi: fix possible NULL pointer dereference if kstrdup failed
  ACPI: check a return value correctly in acpi_power_get_context()
  #if 0 acpi/bay.c:eject_removable_drive()
  eeepc-laptop: add hwmon fan control
  eeepc-laptop: add backlight
  eeepc-laptop: add base driver
  ACPI: thinkpad-acpi: bump up version to 0.20
  ACPI: thinkpad-acpi: fix selects in Kconfig
  ACPI: thinkpad-acpi: use a private workqueue
  ACPI: thinkpad-acpi: fluff really minor fix
  ACPI: thinkpad-acpi: use uppercase for "LED" on user documentation
  ...

Fixed conflicts in drivers/acpi/video.c and drivers/misc/intel_menlow.c
manually.
2008-04-30 11:52:52 -07:00
Len Brown
008238b54a Merge branch 'pnp' into release 2008-04-30 13:59:05 -04:00
Ingo Molnar
ae3a0064e6 inlining: do not allow gcc below version 4 to optimize inlining
fix the condition to match intention: always use the old inlining
behavior on all gcc versions below 4.

this should solve the UML build problem.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:42:49 -07:00
Robert P. J. Day
969a19f1c4 Drop the exporting of empty <linux/byteorder/generic.h>
Fix up the contents of <linux/byteorder/> so that it doesn't export a
content-free generic.h to user space.  This involves:

* Removing the __KERNEL__ tests from generic.h and dropping it from
  Kbuild.
* Wrapping the inclusions of generic.h in both big_endian.h and
  little_endian.h in __KERNEL__ tests.
* Shifting big_endian.h and little_endian.h from header-y to
  unifdef-y in Kbuild.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:54 -07:00
Robert P. J. Day
735643ee6c Remove "#ifdef __KERNEL__" checks from unexported headers
Remove the "#ifdef __KERNEL__" tests from unexported header files in
linux/include whose entire contents are wrapped in that preprocessor
test.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:54 -07:00
Thomas Gleixner
237fc6e7a3 add hrtimer specific debugobjects code
hrtimers have now dynamic users in the network code.  Put them under
debugobjects surveillance as well.

Add calls to the generic object debugging infrastructure and provide fixup
functions which allow to keep the system alive when recoverable problems have
been detected by the object debugging core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:53 -07:00
Thomas Gleixner
c6f3a97f86 debugobjects: add timer specific object debugging code
Add calls to the generic object debugging infrastructure and provide fixup
functions which allow to keep the system alive when recoverable problems have
been detected by the object debugging core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:53 -07:00
Thomas Gleixner
3ac7fe5a4a infrastructure to debug (dynamic) objects
We can see an ever repeating problem pattern with objects of any kind in the
kernel:

1) freeing of active objects
2) reinitialization of active objects

Both problems can be hard to debug because the crash happens at a point where
we have no chance to decode the root cause anymore.  One problem spot are
kernel timers, where the detection of the problem often happens in interrupt
context and usually causes the machine to panic.

While working on a timer related bug report I had to hack specialized code
into the timer subsystem to get a reasonable hint for the root cause.  This
debug hack was fine for temporary use, but far from a mergeable solution due
to the intrusiveness into the timer code.

The code further lacked the ability to detect and report the root cause
instantly and keep the system operational.

Keeping the system operational is important to get hold of the debug
information without special debugging aids like serial consoles and special
knowledge of the bug reporter.

The problems described above are not restricted to timers, but timers tend to
expose it usually in a full system crash.  Other objects are less explosive,
but the symptoms caused by such mistakes can be even harder to debug.

Instead of creating specialized debugging code for the timer subsystem a
generic infrastructure is created which allows developers to verify their code
and provides an easy to enable debug facility for users in case of trouble.

The debugobjects core code keeps track of operations on static and dynamic
objects by inserting them into a hashed list and sanity checking them on
object operations and provides additional checks whenever kernel memory is
freed.

The tracked object operations are:
- initializing an object
- adding an object to a subsystem list
- deleting an object from a subsystem list

Each operation is sanity checked before the operation is executed and the
subsystem specific code can provide a fixup function which allows to prevent
the damage of the operation.  When the sanity check triggers a warning message
and a stack trace is printed.

The list of operations can be extended if the need arises.  For now it's
limited to the requirements of the first user (timers).

The core code enqueues the objects into hash buckets.  The hash index is
generated from the address of the object to simplify the lookup for the check
on kfree/vfree.  Each bucket has it's own spinlock to avoid contention on a
global lock.

The debug code can be compiled in without being active.  The runtime overhead
is minimal and could be optimized by asm alternatives.  A kernel command line
option enables the debugging code.

Thanks to Ingo Molnar for review, suggestions and cleanup patches.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:53 -07:00
Thomas Gleixner
30327acf78 slab: add a flag to prevent debug_free checks on a kmem_cache
This is a preperatory patch for the debugobjects infrastructure.  The flag
prevents debug_free checks on kmem_caches.  This is necessary to avoid
resursive calls into a debug mechanism which uses a kmem_cache itself.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:53 -07:00
Harvey Harrison
bdf4bbaaee Add macros similar to min/max/min_t/max_t
Also, change the variable names used in the min/max macros to avoid shadowed
variable warnings when min/max min_t/max_t are nested.

Small formatting changes to make all the macros have a similar form.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix v4l build]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:53 -07:00
Samuel Thibault
f7511d5f66 Basic braille screen reader support
This adds a minimalistic braille screen reader support.  This is meant to
be used by blind people e.g.  on boot failures or when / cannot be mounted
etc and thus the userland screen readers can not work.

[akpm@linux-foundation.org: fix exports]
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Jiri Kosina <jikos@jikos.cz>
Cc: Dmitry Torokhov <dtor@mail.ru>
Acked-by: Alan Cox <alan@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:52 -07:00
Christoph Hellwig
86098fa011 reiserfs: use open_bdev_excl
Use the proper helper to open a blockdevice by name for filesystem use,
this makes sure it's properly claimed (also added for open-by-number) and
gets rid of the struct file abuse.

Tested by mounting a reiserfs filesystem with external journal.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:51 -07:00
Miklos Szeredi
fc3ba692a4 mm: Add NR_WRITEBACK_TEMP counter
Fuse will use temporary buffers to write back dirty data from memory mappings
(normal writes are done synchronously).  This is needed, because there cannot
be any guarantee about the time in which a write will complete.

By using temporary buffers, from the MM's point if view the page is written
back immediately.  If the writeout was due to memory pressure, this
effectively migrates data from a full zone to a less full zone.

This patch adds a new counter (NR_WRITEBACK_TEMP) for the number of pages used
as temporary buffers.

[Lee.Schermerhorn@hp.com: add vmstat_text for NR_WRITEBACK_TEMP]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Miklos Szeredi
dd5656e59c mm: bdi: export bdi_writeout_inc()
Fuse needs this for writable mmap support.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Miklos Szeredi
e4ad08fe64 mm: bdi: add separate writeback accounting capability
Add a new BDI capability flag: BDI_CAP_NO_ACCT_WB.  If this flag is
set, then don't update the per-bdi writeback stats from
test_set_page_writeback() and test_clear_page_writeback().

Misc cleanups:

 - convert bdi_cap_writeback_dirty() and friends to static inline functions
 - create a flag that includes all three dirty/writeback related flags,
   since almst all users will want to have them toghether

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Miklos Szeredi
76f1418b48 mm: bdi: move statistics to debugfs
Move BDI statistics to debugfs:

   /sys/kernel/debug/bdi/<bdi>/stats

Use postcore_initcall() to initialize the sysfs class and debugfs,
because debugfs is initialized in core_initcall().

Update descriptions in ABI documentation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Peter Zijlstra
a42dde0415 mm: bdi: allow setting a maximum for the bdi dirty limit
Add "max_ratio" to /sys/class/bdi.  This indicates the maximum percentage of
the global dirty threshold allocated to this bdi.

[mszeredi@suse.cz]

 - fix parsing in max_ratio_store().
 - export bdi_set_max_ratio() to modules
 - limit bdi_dirty with bdi->max_ratio
 - document new sysfs attribute

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Peter Zijlstra
189d3c4a94 mm: bdi: allow setting a minimum for the bdi dirty limit
Under normal circumstances each device is given a part of the total write-back
cache that relates to its current avg writeout speed in relation to the other
devices.

min_ratio - allows one to assign a minimum portion of the write-back cache to
a particular device.  This is useful in situations where you might want to
provide a minimum QoS.  (One request for this feature came from flash based
storage people who wanted to avoid writing out at all costs - they of course
needed some pdflush hacks as well)

max_ratio - allows one to assign a maximum portion of the dirty limit to a
particular device.  This is useful in situations where you want to avoid one
device taking all or most of the write-back cache.  Eg.  an NFS mount that is
prone to get stuck, or a FUSE mount which you don't trust to play fair.

Add "min_ratio" to /sys/class/bdi.  This indicates the minimum percentage of
the global dirty threshold allocated to this bdi.

[mszeredi@suse.cz]

 - fix parsing in min_ratio_store()
 - document new sysfs attribute

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Peter Zijlstra
cf0ca9fe5d mm: bdi: export BDI attributes in sysfs
Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info object.
This allows us to see and set the various BDI specific variables.

In particular this properly exposes the read-ahead window for all relevant
users and /sys/block/<block>/queue/read_ahead_kb should be deprecated.

With patient help from Kay Sievers and Greg KH

[mszeredi@suse.cz]

 - split off NFS and FUSE changes into separate patches
 - document new sysfs attributes under Documentation/ABI
 - do bdi_class_init as a core_initcall, otherwise the "default" BDI
   won't be initialized
 - remove bdi_init_fmt macro, it's not used very much

[akpm@linux-foundation.org: fix ia64 warning]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Greg KH <greg@kroah.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:49 -07:00
Pavel Emelyanov
caafa43243 pidns: make pid->level and pid_ns->level unsigned
These values represent the nesting level of a namespace and pids living in it,
and it's always non-negative.

Turning this from int to unsigned int saves some space in pid.c (11 bytes on
x86 and 64 on ia64) by letting the compiler optimize the pid_nr_ns a bit.
E.g.  on ia64 this removes the sign extension calls, which compiler adds to
optimize access to pid->nubers[ns->level].

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:49 -07:00
Oleg Nesterov
24336eaeec pids: introduce change_pid() helper
Based on Eric W. Biederman's idea.

Without tasklist_lock held task_session()/task_pgrp() can return NULL if the
caller races with setprgp()/setsid() which does detach_pid() + attach_pid().
This can happen even if task == current.

Intoduce the new helper, change_pid(), which should be used instead.  This way
the caller always sees the special pid != NULL, either old or new.

Also change the prototype of attach_pid(), it always returns 0 and nobody
check the returned value.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc:  "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:48 -07:00
Pavel Emelyanov
5cd204550b Deprecate find_task_by_pid()
There are some places that are known to operate on tasks'
global pids only:

* the rest_init() call (called on boot)
* the kgdb's getthread
* the create_kthread() (since the kthread is run in init ns)

So use the find_task_by_pid_ns(..., &init_pid_ns) there
and schedule the find_task_by_pid for removal.

[sukadev@us.ibm.com: Fix warning in kernel/pid.c]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:48 -07:00
Sukadev Bhattiprolu
718a916338 devpts: factor out PTY index allocation
Factor out the code used to allocate/free a pts index into new interfaces,
devpts_new_index() and devpts_kill_index().  This localizes the external data
structures used in managing the pts indices.

[akpm@linux-foundation.org: undo accidental mutex2sem conversion]
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:48 -07:00
Alan Cox
39c2e60f8c tty: add throttle/unthrottle helpers
Something Arjan suggested which allows us to clean up the code nicely

Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:47 -07:00
Alan Cox
f34d7a5b70 tty: The big operations rework
- Operations are now a shared const function block as with most other Linux
  objects

- Introduce wrappers for some optional functions to get consistent behaviour

- Wrap put_char which used to be patched by the tty layer

- Document which functions are needed/optional

- Make put_char report success/fail

- Cache the driver->ops pointer in the tty as tty->ops

- Remove various surplus lock calls we no longer need

- Remove proc_write method as noted by Alexey Dobriyan

- Introduce some missing sanity checks where certain driver/ldisc
  combinations would oops as they didn't check needed methods were present

[akpm@linux-foundation.org: fix fs/compat_ioctl.c build]
[akpm@linux-foundation.org: fix isicom]
[akpm@linux-foundation.org: fix arch/ia64/hp/sim/simserial.c build]
[akpm@linux-foundation.org: fix kgdb]
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:47 -07:00
Alan Cox
76b25a5509 char: switch gs, cyclades and esp to return int for put_char
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:45 -07:00
Alan Cox
5d0fdf1e01 tty_io: fix remaining pid struct locking
This fixes the last couple of pid struct locking failures I know about.

[oleg@tv-sign.ru: clean up do_task_stat()]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:40 -07:00
Alan Cox
47f86834bb redo locking of tty->pgrp
Historically tty->pgrp and friends were pid_t and the code "knew" they were
safe.  The change to pid structs opened up a few races and the removal of the
BKL in places made them quite hittable.  We put tty->pgrp under the ctrl_lock
for the tty.

Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:40 -07:00
Alan Cox
04f378b198 tty: BKL pushdown
- Push the BKL down into the line disciplines
- Switch the tty layer to unlocked_ioctl
- Introduce a new ctrl_lock spin lock for the control bits
- Eliminate much of the lock_kernel use in n_tty
- Prepare to (but don't yet) call the drivers with the lock dropped
  on the paths that historically held the lock

BKL now primarily protects open/close/ldisc change in the tty layer

[jirislaby@gmail.com: a couple of fixes]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:40 -07:00
Oleg Nesterov
53b6f9fbd3 ptrace: introduce ptrace_reparented() helper
Add another trivial helper for the sake of grep.  It also auto-documents the
fact that ->parent != real_parent implies ->ptrace.

No functional changes.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:38 -07:00
Roland McGrath
f3de272b82 signals: use HAVE_SET_RESTORE_SIGMASK
Change all the #ifdef TIF_RESTORE_SIGMASK conditionals in non-arch code to
#ifdef HAVE_SET_RESTORE_SIGMASK.  If arch code defines it first, the generic
set_restore_sigmask() using TIF_RESTORE_SIGMASK is not defined.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Roland McGrath
7648d961fc signals: set_restore_sigmask TIF_SIGPENDING
Set TIF_SIGPENDING in set_restore_sigmask.  This lets arch code take
TIF_RESTORE_SIGMASK out of the set of bits that will be noticed on return to
user mode.  On some machines those bits are scarce, and we can free this
unneeded one up for other uses.

It is probably the case that TIF_SIGPENDING is always set anyway everywhere
set_restore_sigmask() is used.  But this is some cheap paranoia in case there
is an arcane case where it might not be.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Roland McGrath
4e4c22c711 signals: add set_restore_sigmask
This adds the set_restore_sigmask() inline in <linux/thread_info.h> and
replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it.  No
change, but abstracts the details of the flag protocol from all the calls.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Oleg Nesterov
fae5fa44f1 signals: fix /sbin/init protection from unwanted signals
The global init has a lot of long standing problems with the unhandled fatal
signals.

	- The "is_global_init(current)" check in get_signal_to_deliver()
	  protects only the main thread. Sub-thread can dequee the fatal
	  signal and shutdown the whole thread group except the main thread.
	  If it dequeues SIGSTOP /sbin/init will be stopped, this is not
	  right too. Note that we can't use is_global_init(->group_leader),
	  this breaks exec and this can't solve other problems we have.

	- Even if afterwards ignored, the fatal signals sets SIGNAL_GROUP_EXIT
	  on delivery. This breaks exec, has other bad implications, and this
	  is just wrong.

Introduce the new SIGNAL_UNKILLABLE flag to fix these problems.  It also helps
to solve some other problems addressed by the subsequent patches.

Currently we use this flag for the global init only, but it could also be used
by kthreads and (perhaps) by the sub-namespace inits.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:37 -07:00
Oleg Nesterov
ac5c215383 signals: join send_sigqueue() with send_group_sigqueue()
We export send_sigqueue() and send_group_sigqueue() for the only user,
posix_timer_event().  This is a bit silly, because both are just trivial
helpers on top of do_send_sigqueue() and because the we pass the unused
.si_signo parameter.

Kill them both, rename do_send_sigqueue() to send_sigqueue(), and export it.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:36 -07:00
Oleg Nesterov
6ca25b5513 kill_pid_info: don't take now unneeded tasklist_lock
Previously handle_stop_signal(SIGCONT) could drop ->siglock.  That is why
kill_pid_info(SIGCONT) takes tasklist_lock to make sure the target task can't
go away after unlock.  Not needed now.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:34 -07:00
Oleg Nesterov
e442055193 signals: re-assign CLD_CONTINUED notification from the sender to reciever
Based on discussion with Jiri and Roland.

In short: currently handle_stop_signal(SIGCONT, p) sends the notification to
p->parent, with this patch p itself notifies its parent when it becomes
running.

handle_stop_signal(SIGCONT) has to drop ->siglock temporary in order to notify
the parent with do_notify_parent_cldstop().  This leads to multiple problems:

	- as Jiri Kosina pointed out, the stopped task can resume without
	  actually seeing SIGCONT which may have a handler.

	- we race with another sig_kernel_stop() signal which may come in
	  that window.

	- we race with sig_fatal() signals which may set SIGNAL_GROUP_EXIT
	  in that window.

	- we can't avoid taking tasklist_lock() while sending SIGCONT.

With this patch handle_stop_signal() just sets the new SIGNAL_CLD_CONTINUED
flag in p->signal->flags and returns.  The notification is sent by the first
task which returns from finish_stop() (there should be at least one) or any
other signalled thread from get_signal_to_deliver().

This is a user-visible change.  Say, currently kill(SIGCONT, stopped_child)
can't return without seeing SIGCHLD, with this patch SIGCHLD can be delayed
unpredictably.  Another difference is that if the child is ptraced by another
process, CLD_CONTINUED may be delivered to ->real_parent after ptrace_detach()
while currently it always goes to the tracer which doesn't actually need this
notification.  Hopefully not a problem.

The patch asks for the futher obvious cleanups, I'll send them separately.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:34 -07:00
Dan Williams
6bfe0b4990 md: support blocking writes to an array on device failure
Allows a userspace metadata handler to take action upon detecting a device
failure.

Based on an original patch by Neil Brown.

Changes:
-added blocked_wait waitqueue to rdev
-don't qualify Blocked with Faulty always let userspace block writes
-added md_wait_for_blocked_rdev to wait for the block device to be clear, if
 userspace misses the notification another one is sent every 5 seconds
-set MD_RECOVERY_NEEDED after clearing "blocked"
-kill DoBlock flag, just test mddev->external

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:33 -07:00
Bryan Wu
2f3517418d Blackfin serial driver: this driver enable SPORTs on Blackfin emulate UART
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:30 -07:00
Yinghai Lu
70b9f7dc14 x86/pci: remove flag in pci_cfg_space_size_ext
so let pci_cfg_space_size call it directly without flag.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-04-29 15:34:05 -07:00
Christoph Hellwig
3dcf54515a ext4: move headers out of include/linux
Move ext4 headers out of include/linux.  This is just the trivial move,
there's some more thing that could be done later. 

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 18:13:32 -04:00
Sam Ravnborg
98db6f193c x86: fix section mismatch in pci_scan_bus
Fix following section mismatch warning:
WARNING: vmlinux.o(.text+0x275616): Section mismatch in reference from the function pci_scan_bus() to the function .devinit.text:pci_scan_bus_parented()

The warning was seen with a CONFIG_DEBUG_SECTION_MISMATCH=y build.
The inline function pci_scan_bus refer to functions annotated
__devinit - so annotate it __devinit too.
This revealed a few x86 specific functions that were only
used from __init or __devinit context.
So annotate these __devinit and the warning was killed.

The added include in pci.h was not strictly required but
added to avoid being dependent on indirect includes.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Jesse Barnes <jbarnes@hobbes.lan>
2008-04-29 13:41:59 -07:00
Jens Axboe
7663c1e279 Improve queue_is_locked()
spin_is_locked() doesn't work on UP without spinlock debugging. Make it
safer and just return 1 on UP, so we don't get false positives. The plan
is to kill this debug function during the -rc cycle.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 12:36:54 -07:00
Linus Torvalds
9781db7b34 Merge branch 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  [PATCH] new predicate - AUDIT_FILETYPE
  [patch 2/2] Use find_task_by_vpid in audit code
  [patch 1/2] audit: let userspace fully control TTY input auditing
  [PATCH 2/2] audit: fix sparse shadowed variable warnings
  [PATCH 1/2] audit: move extern declarations to audit.h
  Audit: MAINTAINERS update
  Audit: increase the maximum length of the key field
  Audit: standardize string audit interfaces
  Audit: stop deadlock from signals under load
  Audit: save audit_backlog_limit audit messages in case auditd comes back
  Audit: collect sessionid in netlink messages
  Audit: end printk with newline
2008-04-29 11:41:22 -07:00
Linus Torvalds
a217656cb2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (21 commits)
  pciehp: fix error message about getting hotplug control
  pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2
  pci/irq: restore mask_bits in msi shutdown -v3
  doc: replace yet another dev with pdev for consistency in DMA-mapping.txt
  PCI: don't expose struct pci_vpd to userspace
  doc: fix an incorrect suggestion to pass NULL for PCI like buses
  Consistently use pdev as the variable of type struct pci_dev *.
  pciehp: Fix command write
  shpchp: fix slot name
  make pciehp_acpi_get_hp_hw_control_from_firmware()
  pciehp: Clean up pcie_init()
  pciehp: Mask hotplug interrupt at controller release
  pciehp: Remove useless hotplug interrupt enabling
  pciehp: Fix wrong slot capability check
  pciehp: Fix wrong slot control register access
  pciehp: Add missing memory barrier
  pciehp: Fix interrupt event handlig
  pciehp: fix slot name
  Update MAINTAINERS with location of PCI tree
  PCI: Add Intel SCH PCI IDs
  ...
2008-04-29 10:17:59 -07:00
Linus Torvalds
8f45c1a58a block: fix queue locking verification
The new queue_flag_set/clear() functions verify that the queue is
locked, but in doing so they will actually instead oops if the queue
lock hasn't been initialized at all.

So fix the lock debug test to consider the "no lock" case to be
unlocked.  This way you get a nice WARN_ON_ONCE() instead of a fatal
oops.

Bug introduced by commit 75ad23bc0f
("block: make queue flags non-atomic").

Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 10:16:38 -07:00
Yinghai Lu
d52877c7b1 pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2
[PATCH 2/2] pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2

this change

| commit 23a274c8a5
| Author: Prakash, Sathya <sathya.prakash@lsi.com>
| Date:   Fri Mar 7 15:53:21 2008 +0530
|
|     [SCSI] mpt fusion: Enable MSI by default for SAS controllers
|
|     This patch modifies the driver to enable MSI by default for all SAS chips.
|
|     Signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
|     Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
Causes the kexec of a RHEL 5.1 kernel to fail.

root casue: the rhel 5.1 kernel still uses INTx emulation.  and
mptscsih_shutdown doesn't call pci_disable_msi to reenable INTx on kexec path

So call pci_msi_shutdown in the shutdown path to do the same thing to msix

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@hobbes.lan>
2008-04-29 09:12:51 -07:00
Yinghai Lu
8e149e09f9 pci/irq: restore mask_bits in msi shutdown -v3
[PATCH 1/2] pci/irq: restore mask_bits in msi shutdown -v3

Yinghai found that kexec'ing a RHEL 5.1 kernel with 2.6.25-rc3+ kernels
prevents his NIC from working.  He bisected to

| commit 89d694b9db
| Author: Thomas Gleixner <tglx@linutronix.de>
| Date:   Mon Feb 18 18:25:17 2008 +0100
|
|   genirq: do not leave interupts enabled on free_irq
|
|   The default_disable() function was changed in commit:
|
|    76d2160147
|    genirq: do not mask interrupts by default
|

For MSI, default_shutdown will call mask_bit for msi device.  All mask bits
will left disabled after free_irq.  Then in the kexec case, the next kernel
can only use msi_enable bit, so all device's MSI can not be used.

So lets to restore the mask bit to its pci reset defined value (enabled) when
we disable the kernels use of msi to be a little friendlier to kexec'd kernels.

Extend msi_set_mask_bit to msi_set_mask_bits to take mask, so we can fully
restore that to 0x00 instead of 0xfe.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@hobbes.lan>
2008-04-29 09:11:12 -07:00
Linus Torvalds
5f78e4d339 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-pci:
  x86: add pci=check_enable_amd_mmconf and dmi check
  x86: work around io allocation overlap of HT links
  acpi: get boot_cpu_id as early for k8_scan_nodes
  x86_64: don't need set default res if only have one root bus
  x86: double check the multi root bus with fam10h mmconf
  x86: multi pci root bus with different io resource range, on 64-bit
  x86: use bus conf in NB conf fun1 to get bus range on, on 64-bit
  x86: get mp_bus_to_node early
  x86 pci: remove checking type for mmconfig probe
  x86: remove unneeded check in mmconf reject
  driver core: try parent numa_node at first before using default
  x86: seperate mmconf for fam10h out from setup_64.c
  x86: if acpi=off, force setting the mmconf for fam10h
  x86_64: check MSR to get MMCONFIG for AMD Family 10h
  x86_64: check and enable MMCONFIG for AMD Family 10h
  x86_64: set cfg_size for AMD Family 10h in case MMCONFIG
  x86: mmconf enable mcfg early
  x86: clear pci_mmcfg_virt when mmcfg get rejected
  x86: validate against acpi motherboard resources

Fixed up fairly trivial conflicts in arch/x86/pci/{init.c,pci.h} due to
OLPC support manually.
2008-04-29 08:26:51 -07:00
Linus Torvalds
867a89e0b7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [RAPIDIO] Change RapidIO doorbell source and target ID field to 16-bit
  [RAPIDIO] Add RapidIO connection info print out and re-training for broken connections
  [RAPIDIO] Add serial RapidIO controller support, which includes MPC8548, MPC8641
  [RAPIDIO] Add RapidIO node probing into MPC86xx_HPCN board id table
  [RAPIDIO] Add RapidIO node into MPC8641HPCN dts file
  [RAPIDIO] Auto-probe the RapidIO system size
  [RAPIDIO] Add OF-tree support to RapidIO controller driver
  [RAPIDIO] Add RapidIO multi mport support
  [RAPIDIO] Move include/asm-ppc/rio.h to asm-powerpc
  [RAPIDIO] Add RapidIO option to kernel configuration
  [RAPIDIO] Change RIO function mpc85xx_ to fsl_
  [POWERPC] Provide walk_memory_resource() for powerpc
  [POWERPC] Update lmb data structures for hotplug memory add/remove
  [POWERPC] Hotplug memory remove notifications for powerpc
  [POWERPC] windfarm: Add PowerMac 12,1 support
  [POWERPC] Fix building of pmac32 when CONFIG_NVRAM=m
  [POWERPC] Add IRQSTACKS support on ppc32
  [POWERPC] Use __always_inline for xchg* and cmpxchg*
  [POWERPC] Add fast little-endian switch system call
2008-04-29 08:19:14 -07:00
Linus Torvalds
44473d9913 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] state info wrong after resume
  [CPUFREQ] allow use of the powersave governor as the default one
  [CPUFREQ] document the currently undocumented parts of the sysfs interface
  [CPUFREQ] expose cpufreq coordination requirements regardless of coordination mechanism
2008-04-29 08:18:49 -07:00
Linus Torvalds
bd5d435a96 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: Skip I/O merges when disabled
  block: add large command support
  block: replace sizeof(rq->cmd) with BLK_MAX_CDB
  ide: use blk_rq_init() to initialize the request
  block: use blk_rq_init() to initialize the request
  block: rename and export rq_init()
  block: no need to initialize rq->cmd with blk_get_request
  block: no need to initialize rq->cmd in prepare_flush_fn hook
  block/blk-barrier.c:blk_ordered_cur_seq() mustn't be inline
  block/elevator.c:elv_rq_merge_ok() mustn't be inline
  block: make queue flags non-atomic
  block: add dma alignment and padding support to blk_rq_map_kern
  unexport blk_max_pfn
  ps3disk: Remove superfluous cast
  block: make rq_init() do a full memset()
  relay: fix splice problem
2008-04-29 08:18:03 -07:00
Thomas Gleixner
fee4b19fb3 bitops: remove "optimizations"
The mapsize optimizations which were moved from x86 to the generic
code in commit 64970b68d2 increased the
binary size on non x86 architectures.

Looking into the real effects of the "optimizations" it turned out
that they are not used in find_next_bit() and find_next_zero_bit().

The ones in find_first_bit() and find_first_zero_bit() are used in a
couple of places but none of them is a real hot path.

Remove the "optimizations" all together and call the library functions
unconditionally.

Boot-tested on x86 and compile tested on every cross compiler I have.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:11:16 -07:00
Christoph Lameter
37487a5652 Add kbuild.h that contains common definitions for kbuild users
The same definitions are used for the bounds logic and the asm-offsets.h
generation by kbuild.  Put them into include/linux/kbuild.h file.

Also add a new feature

	COMMENT("text")

which can be used to insert lines of ocmments into asm-offsets.h and
bounds.h.

Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jay Estabrook <jay.estabrook@hp.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Miles Bader <miles@gnu.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:29 -07:00
Harvey Harrison
064106a91b kernel: add common infrastructure for unaligned access
Create a linux/unaligned directory similar in spirit to the linux/byteorder
folder to hold generic implementations collected from various arches.

Currently there are five implementations:
1) packed_struct.h: C-struct based, from asm-generic/unaligned.h
2) le_byteshift.h: Open coded byte-swapping, heavily based on asm-arm
3) be_byteshift.h: Open coded byte-swapping, heavily based on asm-arm
4) memmove.h: taken from multiple implementations in tree
5) access_ok.h: taken from x86 and others, unaligned access is ok.

All of the new implementations checks for sizes not equal to 1,2,4,8
and will fail to link.

API additions:

get_unaligned_{le16|le32|le64|be16|be32|be64}(p) which is meant to replace
code of the form:
le16_to_cpu(get_unaligned((__le16 *)p));

put_unaligned_{le16|le32|le64|be16|be32|be64}(val, pointer) which is meant to
replace code of the form:
put_unaligned(cpu_to_le16(val), (__le16 *)p);

The headers that arches should include from their asm/unaligned.h:

access_ok.h : Wrappers of the byteswapping functions in asm/byteorder

Choose a particular implementation for little-endian access:
le_byteshift.h
le_memmove.h (arch must be LE)
le_struct.h (arch must be LE)

Choose a particular implementation for big-endian access:
be_byteshift.h
be_memmove.h (arch must be BE)
be_struct.h (arch must be BE)

After including as needed from the above, include unaligned/generic.h and
define your arch's get/put_unaligned as (for LE):

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:27 -07:00
Robert P. J. Day
dddfbaf8f8 sysv fs: remove superfluous check for __GNUC__ compiler
Since <linux/sysv_fs.h> isn't exported to userspace, there is little
point checking that this is a GNU-compatible compiler.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:27 -07:00
Hitoshi Mitake
c3c52bce69 edac: fix module initialization on several modules 2nd time
I implemented opstate_init() as a inline function in linux/edac.h.

added calling opstate_init() to:
	i82443bxgx_edac.c
	i82860_edac.c
	i82875p_edac.c
	i82975x_edac.c

I wrote a fixed patch of
edac-fix-module-initialization-on-several-modules.patch,
and tested building 2.6.25-rc7 with applying this. It was succeed.
I think the patch is now correct.

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:26 -07:00
Akinobu Mita
199f0ca514 idr: create idr_layer_cache at boot time
Avoid a possible kmem_cache_create() failure by creating idr_layer_cache
unconditionary at boot time rather than creating it on-demand when idr_init()
is called the first time.

This change also enables us to eliminate the check every time idr_init() is
called.

[akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()]
[akpm@linux-foundation.org: fix alpha build]
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:25 -07:00
Robert P. J. Day
098ef1c0ea nbd: delete superfluous test for __GNUC__
Since <linux/compiler.h> already tests for __GNUC__, there's no point in nbd.h
repeating that test.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:24 -07:00
Laurent Vivier
48cf6061b3 NBD: allow nbd to be used locally
This patch allows Network Block Device to be mounted locally (nbd-client to
nbd-server over 127.0.0.1).

It creates a kthread to avoid the deadlock described in NBD tools
documentation.  So, if nbd-client hangs waiting for pages, the kblockd thread
can continue its work and free pages.

I have tested the patch to verify that it avoids the hang that always occurs
when writing to a localhost nbd connection.  I have also tested to verify that
no performance degradation results from the additional thread and queue.

Patch originally from Laurent Vivier.

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:23 -07:00
Pavel Emelyanov
d7321cd624 sysctl: add the ->permissions callback on the ctl_table_root
When reading from/writing to some table, a root, which this table came from,
may affect this table's permissions, depending on who is working with the
table.

The core hunk is at the bottom of this patch.  All the rest is just pushing
the ctl_table_root argument up to the sysctl_perm() function.

This will be mostly (only?) used in the net sysctls.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Denis V. Lunev <den@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:23 -07:00
Pavel Emelyanov
2c4c7155f2 sysctl: clean from unneeded extern and forward declarations
The do_sysctl_strategy isn't used outside kernel/sysctl.c, so this can be
static and without a prototype in header.

Besides, move this one and parse_table() above their callers and drop the
forward declarations of the latter call.

One more "besides" - fix two checkpatch warnings: space before a ( and an
extra space at the end of a line.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Denis V. Lunev <den@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:23 -07:00
Adrian Bunk
1a46674b99 include/linux/sysctl.h: remove empty #else
Remove an empty #else.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:23 -07:00
Denis V. Lunev
59b7435149 proc: introduce proc_create_data to setup de->data
This set of patches fixes an proc ->open'less usage due to ->proc_fops flip in
the most part of the kernel code.  The original OOPS is described in the
commit 2d3a4e3666:

    Typical PDE creation code looks like:

    	pde = create_proc_entry("foo", 0, NULL);
    	if (pde)
    		pde->proc_fops = &foo_proc_fops;

    Notice that PDE is first created, only then ->proc_fops is set up to
    final value. This is a problem because right after creation
    a) PDE is fully visible in /proc , and
    b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
       possible to ->read without ->open (see one class of oopses below).

    The fix is new API called proc_create() which makes sure ->proc_fops are
    set up before gluing PDE to main tree. Typical new code looks like:

    	pde = proc_create("foo", 0, NULL, &foo_proc_fops);
    	if (!pde)
    		return -ENOMEM;

    Fix most networking users for a start.

    In the long run, create_proc_entry() for regular files will go.

In addition to this, proc_create_data is introduced to fix reading from
proc without PDE->data. The race is basically the same as above.

create_proc_entries is replaced in the entire kernel code as new method
is also simply better.

This patch:

The problem is the same as for de->proc_fops.  Right now PDE becomes visible
without data set.  So, the entry could be looked up without data.  This, in
most cases, will simply OOPS.

proc_create_data call is created to address this issue.  proc_create now
becomes a wrapper around it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Chris Mason <chris.mason@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Alexey Dobriyan
8731f14d37 proc: remove ->get_info infrastructure
Now that last dozen or so users of ->get_info were removed, ditch it too.
Everyone sane shouldd have switched to seq_file interface long ago.

P.S.: Co-existing 3 interfaces (->get_info/->read_proc/->proc_fops) for proc
      is long-standing crap, BTW, thus
      a) put ->read_proc/->write_proc/read_proc_entry() users on death row,
      b) new such users should be rejected,
      c) everyone is encouraged to convert his favourite ->read_proc user or
         I'll do it, lazy bastards.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:19 -07:00
Alexey Dobriyan
c74c120a21 proc: remove proc_root from drivers
Remove proc_root export.  Creation and removal works well if parent PDE is
supplied as NULL -- it worked always that way.

So, one useless export removed and consistency added, some drivers created
PDEs with &proc_root as parent but removed them as NULL and so on.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan
928b4d8c89 proc: remove proc_root_driver
Use creation by full path: "driver/foo".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan
36a5aeb878 proc: remove proc_root_fs
Use creation by full path instead: "fs/foo".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Alexey Dobriyan
9c37066d88 proc: remove proc_bus
Remove proc_bus export and variable itself. Using pathnames works fine
and is slightly more understandable and greppable.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Matt Helsley
925d1c401f procfs task exe symlink
The kernel implements readlink of /proc/pid/exe by getting the file from
the first executable VMA.  Then the path to the file is reconstructed and
reported as the result.

Because of the VMA walk the code is slightly different on nommu systems.
This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
walking the VMAs to find the first executable file-backed VMA we store a
reference to the exec'd file in the mm_struct.

That reference would prevent the filesystem holding the executable file
from being unmounted even after unmapping the VMAs.  So we track the number
of VM_EXECUTABLE VMAs and drop the new reference when the last one is
unmapped.  This avoids pinning the mounted filesystem.

[akpm@linux-foundation.org: improve comments]
[yamamoto@valinux.co.jp: fix dup_mmap]
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: David Howells <dhowells@redhat.com>
Cc:"Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
David Howells
7249db2c28 keys: make key_serial() a function if CONFIG_KEYS=y
Make key_serial() an inline function rather than a macro if CONFIG_KEYS=y.
This prevents double evaluation of the key pointer and also provides better
type checking.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
David Howells
0b77f5bfb4 keys: make the keyring quotas controllable through /proc/sys
Make the keyring quotas controllable through /proc/sys files:

 (*) /proc/sys/kernel/keys/root_maxkeys
     /proc/sys/kernel/keys/root_maxbytes

     Maximum number of keys that root may have and the maximum total number of
     bytes of data that root may have stored in those keys.

 (*) /proc/sys/kernel/keys/maxkeys
     /proc/sys/kernel/keys/maxbytes

     Maximum number of keys that each non-root user may have and the maximum
     total number of bytes of data that each of those users may have stored in
     their keys.

Also increase the quotas as a number of people have been complaining that it's
not big enough.  I'm not sure that it's big enough now either, but on the
other hand, it can now be set in /etc/sysctl.conf.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <kwc@citi.umich.edu>
Cc: <arunsr@cse.iitk.ac.in>
Cc: <dwalsh@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
David Howells
69664cf16a keys: don't generate user and user session keyrings unless they're accessed
Don't generate the per-UID user and user session keyrings unless they're
explicitly accessed.  This solves a problem during a login process whereby
set*uid() is called before the SELinux PAM module, resulting in the per-UID
keyrings having the wrong security labels.

This also cures the problem of multiple per-UID keyrings sometimes appearing
due to PAM modules (including pam_keyinit) setuiding and causing user_structs
to come into and go out of existence whilst the session keyring pins the user
keyring.  This is achieved by first searching for extant per-UID keyrings
before inventing new ones.

The serial bound argument is also dropped from find_keyring_by_name() as it's
not currently made use of (setting it to 0 disables the feature).

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <kwc@citi.umich.edu>
Cc: <arunsr@cse.iitk.ac.in>
Cc: <dwalsh@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:17 -07:00
Arun Raghavan
6b79ccb514 keys: allow clients to set key perms in key_create_or_update()
The key_create_or_update() function provided by the keyring code has a default
set of permissions that are always applied to the key when created.  This
might not be desirable to all clients.

Here's a patch that adds a "perm" parameter to the function to address this,
which can be set to KEY_PERM_UNDEF to revert to the current behaviour.

Signed-off-by: Arun Raghavan <arunsr@cse.iitk.ac.in>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:16 -07:00
David Howells
70a5bb72b5 keys: add keyctl function to get a security label
Add a keyctl() function to get the security label of a key.

The following is added to Documentation/keys.txt:

 (*) Get the LSM security context attached to a key.

	long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer,
		    size_t buflen)

     This function returns a string that represents the LSM security context
     attached to a key in the buffer provided.

     Unless there's an error, it always returns the amount of data it could
     produce, even if that's too big for the buffer, but it won't copy more
     than requested to userspace. If the buffer pointer is NULL then no copy
     will take place.

     A NUL character is included at the end of the string if the buffer is
     sufficiently big.  This is included in the returned count.  If no LSM is
     in force then an empty string will be returned.

     A process must have view permission on the key for this function to be
     successful.

[akpm@linux-foundation.org: declare keyctl_get_security()]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Paul Moore <paul.moore@hp.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Cc: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:16 -07:00
David Howells
4a38e122e2 keys: allow the callout data to be passed as a blob rather than a string
Allow the callout data to be passed as a blob rather than a string for
internal kernel services that call any request_key_*() interface other than
request_key().  request_key() itself still takes a NUL-terminated string.

The functions that change are:

	request_key_with_auxdata()
	request_key_async()
	request_key_async_with_auxdata()

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Paul Moore <paul.moore@hp.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:16 -07:00
Cyrill Gorcunov
eb6900fbfa ELF: Use EI_NIDENT instead of numeric value
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:16 -07:00
Robert P. J. Day
66ec2d7786 ipmi: make comment match actual preprocessor check
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:15 -07:00
Alexey Dobriyan
fa68be0def ipmi: remove ->write_proc code
IPMI code theoretically allows ->write_proc users, but nobody uses this thus
far.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:15 -07:00
Corey Minyard
c70d749986 ipmi: style fixes in the base code
Lots of style fixes for the base IPMI driver.  No functional changes.
Basically fixes everything reported by checkpatch and fixes the comment
style.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:15 -07:00
Corey Minyard
bda4c30aa6 ipmi: run to completion fixes
The "run_to_completion" mode was somewhat broken.  Locks need to be avoided in
run_to_completion mode, and it shouldn't be used by normal users, just
internally for panic situations.

This patch removes locks in run_to_completion mode and removes the user call
for setting the mode.  The only user was the poweroff code, but it was easily
converted to use the polling interface.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:14 -07:00
Zhang, Yanmin
44f564a4bf ipc: add definitions of USHORT_MAX and others
Add definitions of USHORT_MAX and others into kernel.  ipc uses it and slub
implementation might also use it.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: "Pierre Peiffer" <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:14 -07:00
Nadia Derbey
6546bc4279 ipc: re-enable msgmni automatic recomputing msgmni if set to negative
The enhancement as asked for by Yasunori: if msgmni is set to a negative
value, register it back into the ipcns notifier chain.

A new interface has been added to the notification mechanism:
notifier_chain_cond_register() registers a notifier block only if not already
registered.  With that new interface we avoid taking care of the states
changes in procfs.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Pierre Peiffer <pierre.peiffer@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:13 -07:00
Nadia Derbey
e2c284d8a8 ipc: recompute msgmni on ipc namespace creation/removal
Introduce a notification mechanism that aims at recomputing msgmni each time
an ipc namespace is created or removed.

The ipc namespace notifier chain already defined for memory hotplug management
is used for that purpose too.

Each time a new ipc namespace is allocated or an existing ipc namespace is
removed, the ipcns notifier chain is notified.  The callback routine for each
registered ipc namespace is then activated in order to recompute msgmni for
that namespace.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Pierre Peiffer <pierre.peiffer@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:12 -07:00
Nadia Derbey
b6b337ad1c ipc: recompute msgmni on memory add / remove
Introduce the registration of a callback routine that recomputes msg_ctlmni
upon memory add / remove.

A single notifier block is registered in the hotplug memory chain for all the
ipc namespaces.

Since the ipc namespaces are not linked together, they have their own
notification chain: one notifier_block is defined per ipc namespace.

Each time an ipc namespace is created (removed) it registers (unregisters) its
notifier block in (from) the ipcns chain.  The callback routine registered in
the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event.
 Each callback routine registered in the ipcns namespace, in turn, recomputes
msgmni for the owning namespace.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Pierre Peiffer <pierre.peiffer@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:12 -07:00
Nadia Derbey
0c40ba4fd6 ipc: define the slab_memory_callback priority as a constant
This is a trivial patch that defines the priority of slab_memory_callback in
the callback chain as a constant.  This is to prepare for next patch in the
series.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Pierre Peiffer <pierre.peiffer@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:12 -07:00
Nadia Derbey
4d89dc6ab2 ipc: scale msgmni to the number of ipc namespaces
Since all the namespaces see the same amount of memory (the total one) this
patch introduces a new variable that counts the ipc namespaces and divides
msg_ctlmni by this counter.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Pierre Peiffer <pierre.peiffer@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:12 -07:00
Nadia Derbey
f7bf3df8be ipc: scale msgmni to the amount of lowmem
On large systems we'd like to allow a larger number of message queues.  In
some cases up to 32K.  However simply setting MSGMNI to a larger value may
cause problems for smaller systems.

The first patch of this series introduces a default maximum number of message
queue ids that scales with the amount of lowmem.

Since msgmni is per namespace and there is no amount of memory dedicated to
each namespace so far, the second patch of this series scales msgmni to the
number of ipc namespaces too.

Since msgmni depends on the amount of memory, it becomes necessary to
recompute it upon memory add/remove.  In the 4th patch, memory hotplug
management is added: a notifier block is registered into the memory hotplug
notifier chain for the ipc subsystem.  Since the ipc namespaces are not linked
together, they have their own notification chain: one notifier_block is
defined per ipc namespace.  Each time an ipc namespace is created (removed) it
registers (unregisters) its notifier block in (from) the ipcns chain.  The
callback routine registered in the memory chain invokes the ipcns notifier
chain with the IPCNS_MEMCHANGE event.  Each callback routine registered in the
ipcns namespace, in turn, recomputes msgmni for the owning namespace.

The 5th patch makes it possible to keep the memory hotplug notifier chain's
lock for a lesser amount of time: instead of directly notifying the ipcns
notifier chain upon memory add/remove, a work item is added to the global
workqueue.  When activated, this work item is the one who notifies the ipcns
notifier chain.

Since msgmni depends on the number of ipc namespaces, it becomes necessary to
recompute it upon ipc namespace creation / removal.  The 6th patch uses the
ipc namespace notifier chain for that purpose: that chain is notified each
time an ipc namespace is created or removed.  This makes it possible to
recompute msgmni for all the namespaces each time one of them is created or
removed.

When msgmni is explicitely set from userspace, we should avoid recomputing it
upon memory add/remove or ipcns creation/removal.  This is what the 7th patch
does: it simply unregisters the ipcns callback routine as soon as msgmni has
been changed from procfs or sysctl().

Even if msgmni is set by hand, it should be possible to make it back
automatically recomputed upon memory add/remove or ipcns creation/removal.
This what is achieved in patch 8: if set to a negative value, msgmni is added
back to the ipcns notifier chain, making it automatically recomputed again.

This patch:

Compute msg_ctlmni to make it scale with the amount of lowmem.  msg_ctlmni is
now set to make the message queues occupy 1/32 of the available lowmem.

Some cleaning has also been done for the MSGPOOL constant: the msgctl man page
says it's not used, but it also defines it as a size in bytes (the code
expresses it in Kbytes).

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Pierre Peiffer <pierre.peiffer@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:12 -07:00
Arthur Kepner
74bc7ceebf dma: add dma_*map*_attrs() interfaces
Introduce new interfaces, dma_*map*_attrs(), for passing architecture-specific
attributes when memory is mapped and unmapped for DMA.  Give the interfaces
default implementations which ignore attributes.  Also introduce the
dma_{set|get}_attr() interfaces for setting and retrieving individual
attributes.  Define one attribute, DMA_ATTR_WRITE_BARRIER, in anticipation of
its use by ia64/sn.  Select whether architectures implement arch-specific
versions of the dma_*map*_attrs() interfaces via HAVE_DMA_ATTRS in Kconfig.

[markn@au1.ibm.com: dma_{set,get}_attr() have to be static inline]
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:11 -07:00
Pavel Emelyanov
29f2a4dac8 memcgroup: implement failcounter reset
This is a very common requirement from people using the resource accounting
facilities (not only memcgroup but also OpenVZ beancounters).  They want to
put the cgroup in an initial state without re-creating it.

For example after re-configuring a group people want to observe how this new
configuration fits the group needs without saving the previous failcnt value.

Merge two resets into one mem_cgroup_reset() function to demonstrate how
multiplexing work.

Besides, I have plans to move the files, that correspond to res_counter to the
res_counter.c file and somehow "import" them into controller.  I don't know
how to make it gracefully yet, but merging resets of max_usage and failcnt in
one function will be there for sure.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:10 -07:00
Pavel Emelyanov
faebe9fdf3 memcgroups: add a document describing the resource counter abstraction
The resource counter is supposed to facilitate the resource accounting of
arbitrary resource (and it already does this for memory controller).

However, it is about to be used in other resources controllers (swap, kernel
memory, networking, etc), so provide a doc describing how to work with it.
This will eliminate all the possible future duplications in the appropriate
controllers' docs.

Fixed errors pointed out by Randy.

[akpm@linux-foundation.org: fix documentation tpyo]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:10 -07:00
Pavel Emelyanov
c84872e168 memcgroup: add the max_usage member on the res_counter
This field is the maximal value of the usage one since the counter creation
(or since the latest reset).

To reset this to the usage value simply write anything to the appropriate
cgroup file.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:10 -07:00
Balbir Singh
cf475ad28a cgroups: add an owner to the mm_struct
Remove the mem_cgroup member from mm_struct and instead adds an owner.

This approach was suggested by Paul Menage.  The advantage of this approach
is that, once the mm->owner is known, using the subsystem id, the cgroup
can be determined.  It also allows several control groups that are
virtually grouped by mm_struct, to exist independent of the memory
controller i.e., without adding mem_cgroup's for each controller, to
mm_struct.

A new config option CONFIG_MM_OWNER is added and the memory resource
controller selects this config option.

This patch also adds cgroup callbacks to notify subsystems when mm->owner
changes.  The mm_cgroup_changed callback is called with the task_lock() of
the new task held and is called just prior to changing the mm->owner.

I am indebted to Paul Menage for the several reviews of this patchset and
helping me make it lighter and simpler.

This patch was tested on a powerpc box, it was compiled with both the
MM_OWNER config turned on and off.

After the thread group leader exits, it's moved to init_css_state by
cgroup_exit(), thus all future charges from runnings threads would be
redirected to the init_css_set's subsystem.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: David Rientjes <rientjes@google.com>,
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:10 -07:00
Serge E. Hallyn
29486df325 cgroups: introduce cft->read_seq()
Introduce a read_seq() helper in cftype, which uses seq_file to print out
lists.  Use it in the devices cgroup.  Also split devices.allow into two
files, so now devices.deny and devices.allow are the ones to use to manipulate
the whitelist, while devices.list outputs the cgroup's current whitelist.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:10 -07:00
Li Zefan
28fd5dfc12 cgroups: remove the css_set linked-list
Now we can run through the hash table instead of running through the
linked-list.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:10 -07:00
Li Zefan
472b1053f3 cgroups: use a hash table for css_set finding
When we attach a process to a different cgroup, the css_set linked-list will
be run through to find a suitable existing css_set to use.  This patch
implements a hash table for better performance.

The following benchmarks have been tested:

For N in 1, 5, 10, 50, 100, 500, 1000, create N cgroups with one sleeping
task in each, and then move an additional task through each cgroup in
turn.

Here is a test result:

N	Loop	orig - Time(s)	hash - Time(s)
----------------------------------------------
1	10000	1.201231728	1.196311177
5	2000	1.065743872	1.040566424
10	1000	0.991054735	0.986876440
50	200	0.976554203	0.969608733
100	100	0.998504680	0.969218270
500	20	1.157347764	0.962602963
1000	10	1.619521852	1.085140172

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:09 -07:00
Serge E. Hallyn
08ce5f16ee cgroups: implement device whitelist
Implement a cgroup to track and enforce open and mknod restrictions on device
files.  A device cgroup associates a device access whitelist with each cgroup.
 A whitelist entry has 4 fields.  'type' is a (all), c (char), or b (block).
'all' means it applies to all types and all major and minor numbers.  Major
and minor are either an integer or * for all.  Access is a composition of r
(read), w (write), and m (mknod).

The root device cgroup starts with rwm to 'all'.  A child devcg gets a copy of
the parent.  Admins can then remove devices from the whitelist or add new
entries.  A child cgroup can never receive a device access which is denied its
parent.  However when a device access is removed from a parent it will not
also be removed from the child(ren).

An entry is added using devices.allow, and removed using
devices.deny.  For instance

	echo 'c 1:3 mr' > /cgroups/1/devices.allow

allows cgroup 1 to read and mknod the device usually known as
/dev/null.  Doing

	echo a > /cgroups/1/devices.deny

will remove the default 'a *:* mrw' entry.

CAP_SYS_ADMIN is needed to change permissions or move another task to a new
cgroup.  A cgroup may not be granted more permissions than the cgroup's parent
has.  Any task can move itself between cgroups.  This won't be sufficient, but
we can decide the best way to adequately restrict movement later.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix may-be-used-uninitialized warning]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Looks-good-to: Pavel Emelyanov <xemul@openvz.org>
Cc: Daniel Hokka Zakrisson <daniel@hozac.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:09 -07:00
Pavel Emelyanov
d447ea2f30 cgroups: add the trigger callback to struct cftype
Trigger callback can be used to receive a kick-up from the user space.  The
string written is ignored.

The cftype->private is used for multiplexing events.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Paul Menage <menage@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:09 -07:00
Paul Menage
e73d2c61d1 CGroups _s64 files: add cgroups read_s64/write_s64 file methods
These patches add cgroups read_s64 and write_s64 control file methods (the
signed equivalent of read_u64/write_u64) and use them to implement the
cpu.rt_runtime_us control file in the CFS cgroup subsystem.

This patch:

These are the signed equivalents of the read_u64/write_u64 methods

Signed-off-by: Paul Menage <menage@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:09 -07:00
Paul Menage
3116f0e3df CGroup API files: move "releasable" to cgroup_debug subsystem
The "releasable" control file provided by the cgroup framework exports the
state of a per-cgroup flag that's related to the notify-on-release feature.
This isn't really generally useful, unless you're trying to debug this
particular feature of cgroups.

This patch moves the "releasable" file to the cgroup_debug subsystem.

Signed-off-by: Paul Menage <menage@google.com>
Cc: "Li Zefan" <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:09 -07:00
Paul Menage
9179656961 CGroup API files: add cgroup map data type
Adds a new type of supported control file representation, a map from strings
to u64 values.

Each map entry is printed as a line in a similar format to /proc/vmstat, i.e.
"$key $value\n"

Signed-off-by: Paul Menage <menage@google.com>
Cc: "Li Zefan" <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:08 -07:00
Paul Menage
2c7eabf376 CGroup API files: add res_counter_read_u64()
Adds a function for returning the value of a resource counter member, in a
form suitable for use in a cgroup read_u64 control file method.

Signed-off-by: Paul Menage <menage@google.com>
Cc: "Li Zefan" <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:08 -07:00
Paul Menage
f4c753b7ea CGroup API files: rename read/write_uint methods to read_write_u64
Several people have justifiably complained that the "_uint" suffix is
inappropriate for functions that handle u64 values, so this patch just renames
all these functions and their users to have the suffic _u64.

[peterz@infradead.org: build fix]
Signed-off-by: Paul Menage <menage@google.com>
Cc: "Li Zefan" <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:07 -07:00
Jan Engelhardt
c9e587abfd vt: fix background color on line feed
A command that causes a line feed while a background color is active,
such as

	perl -e 'print "x" x 60, "\e[44m", "x" x 40, "\e[0m\n"'
and
	perl -e 'print "x" x 40, "\e[44m\n", "x" x 40, "\e[0m\n"'

causes the line that was started as a result of the line feed to be completely
filled with the currently active background color instead of the default
color.

When scrolling, part of the current screen is memcpy'd/memmove'd to the new
region, and the new line(s) that will appear as a result are cleared using
memset.  However, the lines are cleared with vc->vc_video_erase_char, causing
them to be colored with the currently active background color.  This is
different from X11 terminal emulators which always paint the new lines with
the default background color (e.g.  `xterm -bg black`).

The clear operation (\e[1J and \e[2J) also use vc_video_erase_char, so a new
vc->vc_scrl_erase_char is introduced with contains the erase character used
for scrolling, which is built from vc->vc_def_color instead of vc->vc_color.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
Dave Young
5f97a5a879 isolate ratelimit from printk.c for other use
Due to the rcupreempt.h WARN_ON trigged, I got 2G syslog file.  For some
serious complaining of kernel, we need repeat the warnings, so here I isolate
the ratelimit part of printk.c to a standalone file.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
David Howells
8f0cfa52a1 xattr: add missing consts to function arguments
Add missing consts to xattr function arguments.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:06 -07:00
Ilpo Järvinen
76308da189 smb.h: uses struct timespec but didn't include linux/time.h
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:05 -07:00
Robert P. J. Day
95d8c365b2 lists: add "const" qualifier to first arg of list_splice() operations
Since neither the list_splice() nor __list_splice() routines modify their
first argument, might as well declare them "const".

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:04 -07:00
Robert P. J. Day
8673511845 kbuild: move files that don't check __KERNEL__
Move files that don't check __KERNEL__ from unifdef-y to header-y.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:04 -07:00
Robert P. J. Day
1a6924f93d kbuild: remove duplicate, conflicting entry for oom.h
oom.h is already tagged for unifdef'ing, so its entry as a simple exportable
header should be deleted.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:04 -07:00
Robert P. J. Day
aab3c3b01d Remove superfluous include of string.h from percpu.h
There's nothing in percpu.h that requires an explicit inclusion of
string.h.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:04 -07:00
Pavel Emelyanov
3a2e7f47d7 binfmt_misc.c: avoid potential kernel stack overflow
This can be triggered with root help only, but...

Register the ":text:E::txt::/root/cat.txt:' rule in binfmt_misc (by root) and
try launching the cat.txt file (by anyone) :) The result is - the endless
recursion in the load_misc_binary -> open_exec -> load_misc_binary chain and
stack overflow.

There's a similar problem with binfmt_script, and there's a sh_bang memner on
linux_binprm structure to handle this, but simply raising this in binfmt_misc
may break some setups when the interpreter of some misc binaries is a script.

So the proposal is to turn sh_bang into a bit, add a new one (the misc_bang)
and raise it in load_misc_binary.  After this, even if we set up the misc ->
script -> misc loop for binfmts one of them will step on its own bang and
exit.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:04 -07:00
Adrian Bunk
7d195a5409 proper extern for late_time_init
Add a proper extern for late_time_init in include/linux/init.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:03 -07:00
Tetsuo Handa
175a06ae30 exec: remove argv_len from struct linux_binprm
I noticed that 2.6.24.2 calculates bprm->argv_len at do_execve().  But it
doesn't update bprm->argv_len after "remove_arg_zero() +
copy_strings_kernel()" at load_script() etc.

audit_bprm() is called from search_binary_handler() and
search_binary_handler() is called from load_script() etc.  Thus, I think the
condition check

  if (bprm->argv_len > (audit_argv_kb << 10))
          return -E2BIG;

in audit_bprm() might return wrong result when strlen(removed_arg) !=
strlen(spliced_args).  Why not update bprm->argv_len at load_script() etc.  ?

By the way, 2.6.25-rc3 seems to not doing the condition check.  Is the field
bprm->argv_len no longer needed?

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ollie Wild <aaw@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:03 -07:00
WANG Cong
ecd0fa9825 Remove the macro get_personality
Remove the macro get_personality, use ->personality instead.

Cc: Christoph Hellwig <hch@infradead.org
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:02 -07:00
jan sonnek
6e5e8c5085 Misc: phantom, consistent whitespace
Make it consistent with the rest of the header.

Signed-off-by: jan sonnek <xsonnek@gmail.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:02 -07:00
Jiri Slaby
7e4e8e689f Misc: phantom, add compat ioctl
Openhaptics uses pointers in _IOC() macros, implement compat for them. Also
add _IOC alternatives which are not 32/64 bit dependent (structures
passed through aren't yet) -- libphantom will use them.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:02 -07:00
Adrian Bunk
eb0f1c442d proper __do_softirq() prototype
Add a proper prototype for __do_softirq() in include/linux/interrupt.h

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:02 -07:00
Adrian Bunk
58b250daff remove mca_is_adapter_used()
Remove the no longer used mca_is_adapter_used().

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:01 -07:00
Adrian Bunk
946a57b526 remove generic_commit_write()
Remove the obsolete and no longer used generic_commit_write().

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:01 -07:00
Adrian Bunk
d5470b596a fs/aio.c: make 3 functions static
Make the following needlessly global functions static:

- __put_ioctx()
- lookup_ioctx()
- io_submit_one()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Adrian Bunk
07d45da616 fs/drop_caches.c: make 2 functions static
Make the following needlessly global functions static:

- drop_pagecache()
- drop_slab()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Adrian Bunk
f11b00f3bd fs/fs-writeback.c: make 2 functions static
Make the following needlessly global functions static:

- writeback_acquire()
- writeback_release()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Adrian Bunk
67cde59537 make vfs_ioctl() static
Make the needlessly global vfs_ioctl() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Adrian Bunk
6b09ae6692 make __put_super() static
Make the needlessly global __put_super() static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:00 -07:00
Sam Ravnborg
f718e31819 cpu: fix section mismatch warnings in hotcpu_register
Fix following warnings:
WARNING: vmlinux.o(.data+0x5020): Section mismatch in reference from the variable cpu_vsyscall_notifier_nb.12876 to the function .cpuinit.text:cpu_vsyscall_notifier()
WARNING: vmlinux.o(.data+0x9ce0): Section mismatch in reference from the variable profile_cpu_callback_nb.17654 to the function .devinit.text:profile_cpu_callback()
WARNING: vmlinux.o(.data+0xd380): Section mismatch in reference from the variable workqueue_cpu_callback_nb.15004 to the function .devinit.text:workqueue_cpu_callback()
WARNING: vmlinux.o(.data+0x11d00): Section mismatch in reference from the variable relay_hotcpu_callback_nb.19626 to the function .cpuinit.text:relay_hotcpu_callback()
WARNING: vmlinux.o(.data+0x12970): Section mismatch in reference from the variable cpu_callback_nb.24694 to the function .devinit.text:cpu_callback()
WARNING: vmlinux.o(.data+0x3fee0): Section mismatch in reference from the variable percpu_counter_hotcpu_callback_nb.10903 to the function .cpuinit.text:percpu_counter_hotcpu_callback()
WARNING: vmlinux.o(.data+0x74ce0): Section mismatch in reference from the variable topology_cpu_callback_nb.12506 to the function .cpuinit.text:topology_cpu_callback()

Functions used as argument are by definition only used in HOTPLUG_CPU
situations so thay are annotated __cpuinit.  Annotate the static variable used
by hotcpu_register with __cpuinitdata to match this definition.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:05:59 -07:00
Sripathi Kodi
679c9cd4ac add RUSAGE_THREAD
Add the RUSAGE_THREAD option for the getrusage system call.  This is
essentially Roland's patch from http://lkml.org/lkml/2008/1/18/589, but the
line about RUSAGE_LWP line has been removed, as suggested by Ulrich and
Christoph.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:05:59 -07:00
Nur Hussein
95b570c9ce Taint kernel after WARN_ON(condition)
The kernel is sent to tainted within the warn_on_slowpath() function, and
whenever a warning occurs the new taint flag 'W' is set.  This is useful to
know if a warning occurred before a BUG by preserving the warning as a flag
in the taint state.

This does not work on architectures where WARN_ON has its own definition.
These archs are:
	1. s390
	2. superh
	3. avr32
	4. parisc

The maintainers of these architectures have been added in the Cc: list
in this email to alert them to the situation.

The documentation in oops-tracing.txt has been updated to include the
new flag.

Signed-off-by: Nur Hussein <nurhussein@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:05:59 -07:00
Ilpo Järvinen
bd3feb13e1 fs/coda: remove static inline forward declarations
They're defined later on in the same file with bodies and nothing in
between needs them.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:05:59 -07:00
Eric Dumazet
ede9c697bc Avoid divides in BITS_TO_LONGS
BITS_PER_LONG is a signed value (32 or 64)

DIV_ROUND_UP(nr, BITS_PER_LONG) performs signed arithmetic if "nr" is signed too.

Converting BITS_TO_LONGS(nr) to DIV_ROUND_UP(nr, BITS_PER_BYTE *
sizeof(long)) makes sure compiler can perform a right shift, even if "nr"
is a signed value, instead of an expensive integer divide.

Applying this patch saves 141 bytes on x86 when CONFIG_CC_OPTIMIZE_FOR_SIZE=y
and speedup bitmap operations.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:05:59 -07:00
Nishanth Aravamudan
ab857d0938 mm: fix misleading __GFP_REPEAT related comments
The definition and use of __GFP_REPEAT, __GFP_NOFAIL and __GFP_NORETRY in the
core VM have somewhat differing comments as to their actual semantics.
Annoyingly, the flags definition has inline and header comments, which might
be interpreted as not being equivalent.  Just add references to the header
comments in the inline ones so they don't go out of sync in the future.  In
their use in __alloc_pages() clarify that the current implementation treats
low-order allocations and __GFP_REPEAT allocations as distinct cases.

To clarify, the flags' semantics are:

__GFP_NORETRY means try no harder than one run through __alloc_pages

__GFP_REPEAT means __GFP_NOFAIL

__GFP_NOFAIL means repeat forever

order <= PAGE_ALLOC_COSTLY_ORDER means __GFP_NOFAIL

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:05:58 -07:00
Alan D. Brunelle
ac9fafa124 block: Skip I/O merges when disabled
The block I/O + elevator + I/O scheduler code spend a lot of time trying
to merge I/Os -- rightfully so under "normal" circumstances. However,
if one were to know that the incoming I/O stream was /very/ random in
nature, the cycles are wasted.

This patch adds a per-request_queue tunable that (when set) disables
merge attempts (beyond the simple one-hit cache check), thus freeing up
a non-trivial amount of CPU cycles.

Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-29 14:48:55 +02:00
FUJITA Tomonori
d7e3c3249e block: add large command support
This patch changes rq->cmd from the static array to a pointer to
support large commands.

We rarely handle large commands. So for optimization, a struct request
still has a static array for a command. rq_init sets rq->cmd pointer
to the static array.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-29 14:48:55 +02:00
FUJITA Tomonori
2a4aa30c5f block: rename and export rq_init()
This rename rq_init() blk_rq_init() and export it. Any path that hands
the request to the block layer needs to call it to initialize the
request.

This is a preparation for large command support, which needs to
initialize the request in a proper way (that is, just doing a memset()
will not work).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-29 14:48:55 +02:00
Nick Piggin
75ad23bc0f block: make queue flags non-atomic
We can save some atomic ops in the IO path, if we clearly define
the rules of how to modify the queue flags.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-29 14:48:33 +02:00
Zhang Wei
61b269179d [RAPIDIO] Add serial RapidIO controller support, which includes MPC8548, MPC8641
Signed-off-by: Zhang Wei <wei.zhang@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 19:40:29 +10:00
Zhang Wei
e042323607 [RAPIDIO] Auto-probe the RapidIO system size
The RapidIO system size will auto probe in RIO setup.  The route table
and rionet_active in rionet.c are changed to be allocated dynamically
according to the size of the system.

Signed-off-by: Zhang Wei <wei.zhang@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 19:40:28 +10:00
Zhang Wei
ad1e9380b1 [RAPIDIO] Add RapidIO multi mport support
The original RapidIO driver suppose there is only one mpc85xx RIO controller
in system.  So, some data structures are defined as mpc85xx_rio global, such
as 'regs_win', 'dbell_ring', 'msg_tx_ring'.  Now, I changed them to mport's
private members.  And you can define multi RIO OF-nodes in dts file for multi
RapidIO controller in one processor, such as PCI/PCI-Ex host controllers in
Freescale's silicon.  And the mport operation function declaration should be
changed to know which RapidIO controller is target.

Signed-off-by: Zhang Wei <wei.zhang@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 19:40:28 +10:00
FUJITA Tomonori
68154e90c9 block: add dma alignment and padding support to blk_rq_map_kern
This patch adds bio_copy_kern similar to
bio_copy_user. blk_rq_map_kern uses bio_copy_kern instead of
bio_map_kern if necessary.

bio_copy_kern uses temporary pages and the bi_end_io callback frees
these pages. bio_copy_kern saves the original kernel buffer at
bio->bi_private it doesn't use something like struct bio_map_data to
store the information about the caller.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-29 09:50:34 +02:00
Bjorn Helgaas
dfd2e1b4e6 PNPBIOS: remove include/linux/pnpbios.h
The contents of include/linux/pnpbios.h are used only inside the PNPBIOS
backend, so this file doesn't need to be visible outside PNP.

This patch moves the contents into an existing PNPBIOS-specific file,
drivers/pnp/pnpbios/pnpbios.h.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:30 -04:00
Bjorn Helgaas
261b20da4b ISAPNP: remove unused pnp_dev->regs field
The "regs" field in struct pnp_dev is set but never read, so remove it.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:30 -04:00
Bjorn Helgaas
62cfb298b9 PNP: make interfaces private to the PNP core
The interfaces for registering protocols, devices, cards,
and resource options should only be used inside the PNP core.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:30 -04:00
Bjorn Helgaas
02d83b5da3 PNP: make pnp_resource_table private to PNP core
There are no remaining references to the PNP_MAX_* constants or
the pnp_resource_table structure outside of the PNP core.  Make
them private to the PNP core.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:27 -04:00
Bjorn Helgaas
13575e81bb PNP: convert resource accessors to use pnp_get_resource(), not pnp_resource_table
This removes more direct references to pnp_resource_table.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:23 -04:00
Bjorn Helgaas
b90eca0a61 PNP: add pnp_get_resource() interface
This adds a pnp_get_resource() that works the same way as
platform_get_resource().  This will enable us to consolidate
many pnp_resource_table references in one place, which will
make it easier to make the table dynamic.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:23 -04:00
Bjorn Helgaas
2cd1393098 PNP: remove unused interfaces using pnp_resource_table
Rene Herman <rene.herman@gmail.com> recently removed the only in-tree
driver uses of:

    pnp_init_resource_table()
    pnp_manual_config_dev()
    pnp_resource_change()

in this change:

    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=109c53f840e551d6e99ecfd8b0131a968332c89f

These are no longer used in the PNP core either, so we can just remove
them completely.

It's possible that there are out-of-tree drivers that use these
interfaces.  They should be changed to either (1) use PNP quirks
to work around broken hardware or firmware, or (2) use the sysfs
interfaces to control resource usage from userspace.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:22 -04:00
Bjorn Helgaas
f449000209 PNP: add pnp_init_resources(struct pnp_dev *) interface
Add pnp_init_resources(struct pnp_dev *) to replace
pnp_init_resource_table(), which takes a pointer to the
pnp_resource_table itself.  Passing only the pnp_dev * reduces
the possibility for error in the caller and removes the
pnp_resource_table implementation detail from the interface.

Even though pnp_init_resource_table() is exported, I did not
export pnp_init_resources() because it is used only by the PNP
core.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:22 -04:00
Bjorn Helgaas
59284cb409 PNP: remove pnp_resource_table from internal get/set interfaces
When we call protocol->get() and protocol->set() methods, we currently
supply pointers to both the pnp_dev and the pnp_resource_table even
though the pnp_resource_table should always be the one associated with
the pnp_dev.

This removes the pnp_resource_table arguments to make it clear that
these methods only operate on the specified pnp_dev.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:21 -04:00
Bjorn Helgaas
c1caf06ccf PNP: add debug output to option registration
Add debug output to resource option registration functions (enabled
by CONFIG_PNP_DEBUG).  This uses dev_printk, so I had to add pnp_dev
arguments at the same time.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:20 -04:00
Bjorn Helgaas
048825deea PNP: make pnp_add_card_id() internal to PNP core
pnp_add_card_id() doesn't need to be exposed outside the PNP core, so
move the declaration to an internal header file.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:17 -04:00
Bjorn Helgaas
1692b27bf3 PNP: make pnp_add_id() internal to PNP core
pnp_add_id() doesn't need to be exposed outside the PNP core, so
move the declaration to an internal header file.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:15 -04:00
Bjorn Helgaas
ca0e8b6fd2 ISAPNP: move config register addresses out of isapnp.h
These are used only in drivers/pnp/isapnp/core.c, so no need to
expose them to the world.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-By: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 03:22:15 -04:00
Zhang Rui
e68b16abd9 thermal: add hwmon sysfs I/F
Add hwmon sys I/F for generic thermal driver.

Note: we have one hwmon class device for EACH TYPE of the thermal zone device.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 02:48:01 -04:00
Zhang, Rui
9ec732ff80 thermal: add new get_crit_temp callback
Add a new callback so that the generic thermal can get
the critical trip point info of a thermal zone,
which is needed for building the tempX_crit hwmon sysfs attribute.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 02:45:49 -04:00
Zhang Rui
63c4ec905d thermal: add the support for building the generic thermal as a module
Build the generic thermal driver as module "thermal_sys".

Make ACPI thermal, video, processor and fan SELECT the generic
thermal driver, as these drivers rely on it to build the sysfs I/F.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-04-29 02:44:00 -04:00
Badari Pulavarty
9d88a2eb6e [POWERPC] Provide walk_memory_resource() for powerpc
Provide walk_memory_resource() for 64-bit powerpc.  PowerPC maintains
logical memory region mapping in the lmb.memory structure.  Walk
through these structures and do the callbacks for the contiguous
chunks.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 15:57:53 +10:00
Badari Pulavarty
98d5c21c81 [POWERPC] Update lmb data structures for hotplug memory add/remove
The powerpc kernel maintains information about logical memory blocks
in the lmb.memory structure, which is initialized and updated at boot
time, but not when memory is added or removed while the kernel is
running.

This adds a hotplug memory notifier which updates lmb.memory when
memory is added or removed.  This information is useful for eHEA
driver to find out the memory layout and holes.

NOTE: No special locking is needed for lmb_add() and lmb_remove().
Calls to these are serialized by caller. (pSeries_reconfig_chain).

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-04-29 15:57:53 +10:00
Lennert Buytenhek
ce4e2e4558 mv643xx_eth: inter-mv643xx SMI port sharing
There exist chips with up to four mv643xx_eth silicon blocks but
only one external SMI (MII management) interface -- the SMI logic
of the first block is shared by all the blocks.

Handle this by allowing a per-port override of which
mv643xx_eth_shared's SMI registers (and spinlock) to use.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
2008-04-28 21:17:07 -07:00
Lennert Buytenhek
240e4419e0 mv643xx_eth: shorten shared platform driver name
Change the MV643XX_ETH_SHARED_NAME platform driver name to something
shorter than 19 characters, so that we can register multiple (otherwise
we end up with sysfs conflicts since all instances will map to
"mv643xx_eth_shared." as there is a 20-char sysfs file name limit.)

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
2008-04-28 21:17:07 -07:00
Lennert Buytenhek
c416a41f99 mv643xx_eth: configurable t_clk
Make t_clk configurable via platform device data (with the current
hardcoded value, 133 MHz, being the default), as it varies across
different chip families.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
2008-04-28 21:17:07 -07:00
Lennert Buytenhek
f2ce825d2a mv643xx_eth: mbus decode window support
Make it possible to pass mbus_dram_target_info to the mv643xx_eth
driver via the platform data, and make the mv643xx_eth driver
program the window registers based on this data if it is passed in.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Reviewed-by: Tzachi Perelstein <tzachi@marvell.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
2008-04-28 21:17:07 -07:00
Lennert Buytenhek
fa3959f457 mv643xx_eth: get rid of static variables, allow multiple instances
Move mv643xx_eth's static state (ethernet register block base address
and MII management interface spinlock) into a struct hanging off the
shared platform device.  This is necessary to support chips that
contain multiple mv643xx_eth silicon blocks.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
2008-04-28 21:17:07 -07:00