Commit graph

21401 commits

Author SHA1 Message Date
Sage Weil
24be0c4810 ceph: fix erroneous cap flush to non-auth mds
The int flushing is global and not cleared on each iteration of the loop,
which can cause a second flush of caps to any MDSs with ids greater than
the auth.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-19 09:23:24 -08:00
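
The bug class fixed in 24be0c4810 can be illustrated with a small Python model. This is not the ceph code: the function names, the list of MDS ids, and the rule that only the auth MDS should receive a flush are assumptions made for the example. It only shows how a flag that is not cleared per iteration leaks into later iterations.

```python
def mds_flushed_buggy(mds_ids, auth_id):
    """Return the MDS ids that receive a cap flush when the flag is never cleared."""
    flushed = []
    flushing = 0  # initialised once, outside the loop (the bug pattern)
    for mds in mds_ids:
        if mds == auth_id:
            flushing = 1  # dirty caps are flushed to the auth MDS
        if flushing:
            flushed.append(mds)  # stale value also triggers for later ids
    return flushed


def mds_flushed_fixed(mds_ids, auth_id):
    """Same loop, but the flag is cleared at the start of every iteration."""
    flushed = []
    for mds in mds_ids:
        flushing = 0
        if mds == auth_id:
            flushing = 1
        if flushing:
            flushed.append(mds)
    return flushed
```

With ids iterated in ascending order, the buggy variant also flushes to every id after the auth one, which matches the symptom described in the commit message.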
Sage Weil
50aac4fec5 ceph: fix cap_wanted_delay_{min,max} mount option initialization
These were initialized to 0 instead of the default, fallout from the RBD
refactor in 3d14c5d2b6.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-19 09:23:22 -08:00
Steven Whitehouse
24d9765fc1 GFS2: Fix error path in gfs2_lookup_by_inum()
In the (impossible, except if there is fs corruption) error path
in gfs2_lookup_by_inum() if the call to gfs2_inode_refresh()
fails, it was leaving the function by calling iput() rather
than iget_failed(). This would cause future lookups of the same
inode to block forever.

This patch fixes the problem by moving the call to gfs2_inode_refresh()
into gfs2_inode_lookup() where iget_failed() is part of the error path
already. Also this cleans up some unreachable code and makes
gfs2_set_iop() static.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-01-18 14:49:08 +00:00
Benjamin Marzinski
23c3010808 GFS2: remove iopen glocks from cache on failed deletes
When a file gets deleted on GFS2, if a node can't get an exclusive lock on the
file's iopen glock, it punts on actually freeing up the space, because another
node is using the file.  When it does this, it needs to drop the iopen glock
from its cache so that the other node can get an exclusive lock on it. Now,
gfs2_delete_inode() sets GL_NOCACHE before dropping the shared lock on the
iopen glock in preparation for grabbing it in the exclusive state.  Since the
node needs the glock in the exclusive state, dropping the shared lock from the
cache doesn't slow down the case where no other nodes are using the file.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-01-18 14:28:29 +00:00
Al Viro
b89b12b462 autofs4: clean ->d_release() and autofs4_free_ino() up
The latter is called only when both ino and dentry are about to
be freed, so cleaning ->d_fsdata and ->dentry is pointless.

Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18 01:21:29 -05:00
Al Viro
26e6c91067 autofs4: split autofs4_init_ino()
split init_ino into new_ino and clean_ino; the former is
what used to be init_ino(NULL, sbi), the latter is for cases
where we passed non-NULL ino.  Lose unused arguments.

Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18 01:21:28 -05:00
Al Viro
5a37db302e autofs4: mkdir and symlink always get a dentry that had passed lookup
... so ->d_fsdata will have been set up before we get there

Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18 01:21:28 -05:00
Al Viro
726a5e0688 autofs4: autofs4_get_inode() doesn't need autofs_info * argument anymore
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18 01:21:28 -05:00
Al Viro
0bf71d4d00 autofs4: kill ->size in autofs_info
It's used only to pass the length of symlink body to
autofs4_get_inode() in autofs4_dir_symlink().  We can
bloody well set inode->i_size in autofs4_dir_symlink()
directly and be done with that.

Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18 01:21:28 -05:00
Al Viro
09f12c03fa autofs4: pass mode to autofs4_get_inode() explicitly
In all cases we'd set inf->mode to known value just before
passing it to autofs4_get_inode().  That kills the need
to store it in autofs_info and pass it to autofs_init_ino()

Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18 01:21:27 -05:00
Al Viro
14a2f00bde autofs4: autofs4_mkroot() is not different from autofs4_init_ino()
Kill it.  Mind you, it's been an obfuscated call of autofs4_init_ino()
ever since 2.3.99pre6-4...

Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18 01:21:27 -05:00
Al Viro
292c5ee802 autofs4: keep symlink body in inode->i_private
gets rid of all ->free()/->u.symlink machinery in autofs; we simply
keep symlink bodies in inode->i_private and free them in ->evict_inode().

Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18 01:21:27 -05:00
Ian Kent
c0bcc9d552 autofs4 - fix debug print in autofs4_lookup()
oz_mode isn't defined any more, use autofs4_oz_mode(sbi) instead.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18 01:21:27 -05:00
Ian Kent
8931221411 vfs - fix dentry ref count in do_lookup()
There is a ref count problem in fs/namei.c:do_lookup().

When walking in ref-walk mode, if follow_managed() returns a fail we
need to drop dentry and possibly vfsmount.  Clean up properly,
as we do in the other caller of follow_managed().

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18 01:21:26 -05:00
Ian Kent
c14cc63a63 autofs4 - fix get_next_positive_dentry()
The initialization condition in fs/autofs4/expire.c:get_next_positive_dentry()
appears to be incorrect. If prev == NULL I believe that root should be
returned.

Further down, at the current dentry check for it being simple_positive()
it looks like the d_lock for dentry p should be dropped instead of dentry
ret, otherwise when p is assigned to ret we end up with no lock on p and
a lost lock on ret, which leads to a deadlock.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-18 01:21:26 -05:00
Linus Torvalds
eee2a817df Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits)
  Btrfs: forced readonly mounts on errors
  btrfs: Require CAP_SYS_ADMIN for filesystem rebalance
  Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check
  btrfs: Fix memory leak in btrfs_read_fs_root_no_radix()
  btrfs: check NULL or not
  btrfs: Don't pass NULL ptr to func that may deref it.
  btrfs: mount failure return value fix
  btrfs: Mem leak in btrfs_get_acl()
  btrfs: fix wrong free space information of btrfs
  btrfs: make the chunk allocator utilize the devices better
  btrfs: restructure find_free_dev_extent()
  btrfs: fix wrong calculation of stripe size
  btrfs: try to reclaim some space when chunk allocation fails
  btrfs: fix wrong data space statistics
  fs/btrfs: Fix build of ctree
  Btrfs: fix off by one while setting block groups readonly
  Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctls
  Btrfs: Add readonly snapshots support
  Btrfs: Refactor btrfs_ioctl_snap_create()
  btrfs: Extract duplicate decompress code
  ...
2011-01-17 14:43:43 -08:00
Linus Torvalds
9e8a462a01 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  ecryptfs: remove unnecessary decrypt when extending a file
  ecryptfs: Fix ecryptfs_printk() size_t warnings
  fs/ecryptfs: Add printf format/argument verification and fix fallout
  ecryptfs: fixed testing of file descriptor flags
  ecryptfs: test lower_file pointer when lower_file_mutex is locked
  ecryptfs: missing initialization of the superblock 'magic' field
  ecryptfs: moved ECRYPTFS_SUPER_MAGIC definition to linux/magic.h
  ecryptfs: fix truncation error in ecryptfs_read_update_atime
2011-01-17 12:39:57 -08:00
Geert Uytterhoeven
cf78859f52 xfs: Do not name variables "panic"
On platforms that call panic() inside their BUG() macro (m68k/sun3, and
all platforms that don't set HAVE_ARCH_BUG), compilation fails with:

| fs/xfs/support/debug.c: In function ‘xfs_cmn_err’:
| fs/xfs/support/debug.c:92: error: called object ‘panic’ is not a function

as the local variable "panic" conflicts with the "panic()" function.
Rename the local variable to resolve this.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-17 12:39:07 -08:00
liubo
acce952b02 Btrfs: forced readonly mounts on errors
This patch comes from "Forced readonly mounts on errors" ideas.

As we know, this is the first step in being more fault tolerant of disk
corruptions instead of just using BUG() statements.

The major content:
- add a framework for generating errors that should result in filesystems
  going readonly.
- keep FS state in disk super block.
- make sure that all resources are freed and released at umount time.
- make sure that after the FS is forced readonly on error, there are no more
  disk changes before the FS is corrected. For this, we stop write operations.

After this patch is applied, the conversion from BUG() to such a framework can
happen incrementally.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-17 15:13:08 -05:00
Linus Torvalds
1a47f7a84e Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: add cruid= mount option
  cifs: cFYI the entire error code in map_smb_to_linux_error
2011-01-17 11:17:51 -08:00
Linus Torvalds
ab2020f2f1 Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6: (59 commits)
  mtd: mtdpart: disallow reading OOB past the end of the partition
  mtd: pxa3xx_nand: NULL dereference in pxa3xx_nand_probe
  UBI: use mtd->writebufsize to set minimal I/O unit size
  mtd: initialize writebufsize in the MTD object of a partition
  mtd: onenand: add mtd->writebufsize initialization
  mtd: nand: add mtd->writebufsize initialization
  mtd: cfi: add writebufsize initialization
  mtd: add writebufsize field to mtd_info struct
  mtd: OneNAND: OMAP2/3: prevent regulator sleeping while OneNAND is in use
  mtd: OneNAND: add enable / disable methods to onenand_chip
  mtd: m25p80: Fix JEDEC ID for AT26DF321
  mtd: txx9ndfmc: limit transfer bytes to 512 (ECC provides 6 bytes max)
  mtd: cfi_cmdset_0002: add support for Samsung K8D3x16UxC NOR chips
  mtd: cfi_cmdset_0002: add support for Samsung K8D6x16UxM NOR chips
  mtd: nand: ams-delta: drop omap_read/write, use ioremap
  mtd: m25p80: add debugging trace in sst_write
  mtd: nand: ams-delta: select for built-in by default
  mtd: OneNAND: lighten scary initial bad block messages
  mtd: OneNAND: OMAP2/3: add support for command line partitioning
  mtd: nand: rearrange ONFI revision checking, add ONFI 2.3
  ...

Fix up trivial conflict in drivers/mtd/Kconfig as per DavidW.
2011-01-17 11:15:30 -08:00
Frank Swiderski
24562486be ecryptfs: remove unnecessary decrypt when extending a file
Removes an unnecessary page decrypt from ecryptfs_begin_write when the
page is beyond the current file size. Previously, the call to
ecryptfs_decrypt_page would result in a read of 0 bytes, but still
attempt to decrypt an entire page. This patch detects that case and
merely zeros the page before marking it up-to-date.

Signed-off-by: Frank Swiderski <fes@chromium.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-01-17 13:01:25 -06:00
Tyler Hicks
f24b38874e ecryptfs: Fix ecryptfs_printk() size_t warnings
Commit cb55d21f6fa19d8c6c2680d90317ce88c1f57269 revealed a number of
missing 'z' length modifiers in calls to ecryptfs_printk() when
printing variables of type size_t. This patch fixes those compiler
warnings.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-01-17 13:01:24 -06:00
Joe Perches
888d57bbc9 fs/ecryptfs: Add printf format/argument verification and fix fallout
Add __attribute__((format... to __ecryptfs_printk
Make formats and arguments match.
Add casts to (unsigned long long) for %llu.

Signed-off-by: Joe Perches <joe@perches.com>
[tyhicks: 80 columns cleanup and fixed typo]
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-01-17 13:01:23 -06:00
Roberto Sassu
0abe116947 ecryptfs: fixed testing of file descriptor flags
This patch replaces the check (lower_file->f_flags & O_RDONLY) with
((lower_file->f_flags & O_ACCMODE) == O_RDONLY).

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-01-17 11:24:43 -06:00
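
Why the old test in 0abe116947 could never succeed is easy to see in a short Python sketch. The constants below are the Linux values of the open(2) access-mode flags, hard-coded as an assumption so the example does not depend on the host platform; the function names are hypothetical.

```python
# Linux access-mode constants (assumed values, see open(2)).
O_RDONLY = 0
O_WRONLY = 1
O_RDWR = 2
O_ACCMODE = 3  # mask covering the access-mode bits


def is_read_only_buggy(f_flags):
    """AND-ing with O_RDONLY (which is 0) always yields 0."""
    return bool(f_flags & O_RDONLY)


def is_read_only(f_flags):
    """Mask out the access mode first, then compare against O_RDONLY."""
    return (f_flags & O_ACCMODE) == O_RDONLY
```

Because O_RDONLY is zero rather than a bit flag, the access mode has to be extracted with O_ACCMODE and compared, not tested with a bitwise AND.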
Roberto Sassu
27992890b0 ecryptfs: test lower_file pointer when lower_file_mutex is locked
This patch prevents the lower_file pointer in the 'ecryptfs_inode_info'
structure from being checked when the mutex 'lower_file_mutex' is not locked.

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-01-17 11:24:42 -06:00
Roberto Sassu
070baa5128 ecryptfs: missing initialization of the superblock 'magic' field
This patch initializes the 'magic' field of ecryptfs filesystems to
ECRYPTFS_SUPER_MAGIC.

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
[tyhicks: merge with 66cb76666d]
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-01-17 11:23:01 -06:00
Roberto Sassu
2a8652f4e0 ecryptfs: moved ECRYPTFS_SUPER_MAGIC definition to linux/magic.h
The definition of ECRYPTFS_SUPER_MAGIC has been moved to the include
file 'linux/magic.h' to become available to other kernel subsystems.

Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-01-17 10:44:31 -06:00
Edward Shishkin
38a708d775 ecryptfs: fix truncation error in ecryptfs_read_update_atime
This is similar to the bug found in direct-io not so long ago.

Fix up truncation (ssize_t->int).  This only matters with >2G
reads/writes, which the kernel doesn't permit.

Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Eric Sandeen <esandeen@redhat.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-01-17 10:44:30 -06:00
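
The ssize_t->int truncation mentioned in 38a708d775 can be sketched in Python by reinterpreting a value as a 32-bit signed integer. The helper name is hypothetical, and the sketch assumes a 32-bit `int` and a wider `ssize_t`, as on common 64-bit Linux targets.

```python
def as_c_int(value):
    """Reinterpret an integer as a 32-bit two's-complement signed int."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    if value & 0x80000000:  # sign bit set: value is negative as an int
        return value - (1 << 32)
    return value
```

Byte counts below 2 GiB survive the conversion unchanged; anything larger wraps to a different, possibly negative, number, which is why the problem only matters for reads or writes above that size.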
Namhyung Kim
ecf5632dd1 fs: fix address space warnings in ioctl_fiemap()
The fi_extents_start field of struct fiemap_extent_info is a
user pointer but was not marked as __user. This makes sparse
emit following warnings:

  CHECK   fs/ioctl.c
fs/ioctl.c:114:26: warning: incorrect type in argument 1 (different address spaces)
fs/ioctl.c:114:26:    expected void [noderef] <asn:1>*dst
fs/ioctl.c:114:26:    got struct fiemap_extent *[assigned] dest
fs/ioctl.c:202:14: warning: incorrect type in argument 1 (different address spaces)
fs/ioctl.c:202:14:    expected void const volatile [noderef] <asn:1>*<noident>
fs/ioctl.c:202:14:    got struct fiemap_extent *[assigned] fi_extents_start
fs/ioctl.c:212:27: warning: incorrect type in argument 1 (different address spaces)
fs/ioctl.c:212:27:    expected void [noderef] <asn:1>*dst
fs/ioctl.c:212:27:    got char *<noident>

Also add 'ufiemap' variable to eliminate unnecessary casts.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 08:21:42 -05:00
Namhyung Kim
27eaa1c90c aio: check return value of create_workqueue()
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 05:12:44 -05:00
Dr. David Alan Gilbert
274052ef0b hpfs_setattr error case avoids unlock_kernel
This fixes a case that 'sparse' spotted where hpfs_setattr has an error return
that didn't go through its path that unlocks.

This is against git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
version 6313e3c217.

Build tested only, I don't have an hpfs file system to test.

Dave

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 05:11:37 -05:00
Namhyung Kim
e0bb6bda43 compat: copy missing fields in compat_statfs64 to user
f_flags and f_spare fields were not copied to userspace when
compat_sys_[f]statfs64 was called.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 04:54:38 -05:00
Namhyung Kim
974d879e80 compat: update comment of compat statfs syscalls
The commit 7ed1ee6118 ("Take statfs variants to fs/statfs.c")
separates out statfs syscalls from fs/open.c. Thus the comment
should be changed also.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Jiri Kosina <trivial@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 04:54:38 -05:00
Namhyung Kim
6a5640f102 compat: remove unnecessary assignment in compat_rw_copy_check_uvector()
*@ret_pointer is initialized to @fast_pointer thus the assignment is
redundant.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 04:54:38 -05:00
Randy Dunlap
16ebe911eb fs: FS_POSIX_ACL does not depend on BLOCK
- Fix a kconfig unmet dependency warning.
- Remove the comment that identifies which filesystems use POSIX ACL
  utility routines.
- Move the FS_POSIX_ACL symbol outside of the BLOCK symbol if/endif block
  because its functions do not depend on BLOCK and some of the filesystems
  that use it do not depend on BLOCK.

warning: (GENERIC_ACL && JFFS2_FS_POSIX_ACL && NFSD_V4 && NFS_ACL_SUPPORT && 9P_FS_POSIX_ACL) selects FS_POSIX_ACL which has unmet direct dependencies (BLOCK)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 03:30:37 -05:00
Steven Rostedt
3bc0ba4305 fs: Remove unlikely() from fget_light()
There's an unlikely() in fget_light() that assumes the file ref count
will be 1. Running the annotate branch profiler on a desktop that is
performing daily tasks (running firefox, evolution, xchat and is also part
of a distcc farm), it shows that the ref count is not 1 that often.

    correct  incorrect   %  Function    File          Line
    -------  ---------   -  --------    ----          ----
 1035099358 6209599193  85  fget_light  file_table.c   315

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 03:26:27 -05:00
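
The percentage in the profiler output of 3bc0ba4305 can be reproduced from the two counters. The sketch assumes the `%` column is the share of incorrect predictions, truncated to an integer; the source does not state how the profiler rounds.

```python
def percent_incorrect(correct, incorrect):
    """Share of mispredicted branches, as a truncated integer percentage."""
    return incorrect * 100 // (correct + incorrect)


# Counters taken from the fget_light row of the table in the commit message.
FGET_LIGHT_PERCENT = percent_incorrect(1035099358, 6209599193)
```

Under that assumption the result agrees with the 85 shown in the table: the `unlikely()` hint was wrong far more often than it was right.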
Christoph Hellwig
2fe17c1075 fallocate should be a file operation
Currently all filesystems except XFS implement fallocate asynchronously,
while XFS forced a commit.  Both of these are suboptimal - in case of O_SYNC
I/O we really want our allocation on disk, especially for the !KEEP_SIZE
case where we actually grow the file with user-visible zeroes.  On the
other hand always committing the transaction is a bad idea for fast-path
uses of fallocate like for example in recent Samba versions.   Given
that block allocation is a data plane operation anyway change it from
an inode operation to a file operation so that we have the file structure
available that lets us check for O_SYNC.

This also includes moving the code around for a few of the filesystems,
and removes the already unneeded S_ISDIR checks given that we only wire
up fallocate for regular files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 02:25:31 -05:00
Christoph Hellwig
64c23e8687 make the feature checks in ->fallocate future proof
Instead of various home grown checks that might need updates for new
flags just check for any bit outside the mask of the features supported
by the filesystem.  This makes the check future proof for any newly
added flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 02:25:30 -05:00
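
The mask-based check described in 64c23e8687 can be sketched as follows. The two flag values are the Linux `FALLOC_FL_*` constants; the function name and the choice of which flags a filesystem supports are assumptions for illustration, not the per-filesystem code from the patch.

```python
import errno

FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def check_fallocate_mode(mode, supported):
    """Return 0 if every bit set in `mode` is supported, else -EOPNOTSUPP."""
    if mode & ~supported:  # any bit outside the supported mask
        return -errno.EOPNOTSUPP
    return 0
```

A filesystem that only handles `FALLOC_FL_KEEP_SIZE` would pass that constant as `supported`; any flag added later is then rejected without the check needing an update.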
Al Viro
b1e75df45a tidy up around finish_automount()
do_add_mount() and mnt_clear_expiry() are not needed outside of
namespace.c anymore, now that namei has finish_automount() to
use.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 01:47:59 -05:00
Al Viro
15f9a3f3e1 don't drop newmnt on error in do_add_mount()
That gets rid of the kludge in finish_automount() - we need
to keep refcount on the vfsmount as-is until we evict it from
expiry list.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 01:41:58 -05:00
Al Viro
19a167af7c Take the completion of automount into new helper
... and shift it from namei.c to namespace.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 01:35:23 -05:00
Linus Torvalds
8a335bc631 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6:
  ocfs2: Make OCFS2_FS depend on CONFIGFS_FS
  dlm: Make DLM depend on CONFIGFS_FS
  net: Make NETCONSOLE_DYNAMIC depend on CONFIGFS_FS
  configfs: change depends -> select SYSFS
  [SCSI] sd,sr: kill compat SDEV_MEDIA_CHANGE event
  [SCSI] sd: implement sd_check_events()
2011-01-16 15:06:43 -08:00
Al Viro
7e3d0eb0b0 VFS: Fix UP compile error in fs/namespace.c
mnt_longterm is there only on SMP

Reported-and-tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-16 14:59:45 -08:00
Nicholas Bellinger
7b1fff7e4f ocfs2: Make OCFS2_FS depend on CONFIGFS_FS
This patch fixes the following kconfig error after changing
CONFIGFS_FS -> select SYSFS:

fs/sysfs/Kconfig:1:error: recursive dependency detected!
fs/sysfs/Kconfig:1:	symbol SYSFS is selected by CONFIGFS_FS
fs/configfs/Kconfig:1:	symbol CONFIGFS_FS is selected by OCFS2_FS
fs/ocfs2/Kconfig:1:	symbol OCFS2_FS depends on SYSFS

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: James Bottomley <James.Bottomley@suse.de>
2011-01-16 21:22:40 +00:00
Nicholas Bellinger
86c747d2a4 dlm: Make DLM depend on CONFIGFS_FS
This patch fixes the following kconfig error after changing
CONFIGFS_FS -> select SYSFS:

fs/sysfs/Kconfig:1:error: recursive dependency detected!
fs/sysfs/Kconfig:1:	symbol SYSFS is selected by CONFIGFS_FS
fs/configfs/Kconfig:1:	symbol CONFIGFS_FS is selected by DLM
fs/dlm/Kconfig:1:	symbol DLM depends on SYSFS

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: James Bottomley <James.Bottomley@suse.de>
2011-01-16 21:22:37 +00:00
Nicholas Bellinger
e205117285 configfs: change depends -> select SYSFS
This patch changes configfs to select SYSFS to fix the following:

warning: (TARGET_CORE && GFS2_FS) selects CONFIGFS_FS which has unmet direct dependencies (SYSFS)

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Acked-by: Joel Becker <jlbec@evilplan.org>
2011-01-16 21:22:29 +00:00
Stefan Schmidt
f8b18087fd fs/btrfs: Fix build of ctree
Fix the build failure in some configurations:

     CC [M]  fs/btrfs/ctree.o
  In file included from fs/btrfs/ctree.c:21:0:
  fs/btrfs/ctree.h:1003:17: error: field 'super_kobj' has incomplete type
  fs/btrfs/ctree.h:1074:17: error: field 'root_kobj' has incomplete type
  make[2]: *** [fs/btrfs/ctree.o] Error 1
  make[1]: *** [fs/btrfs] Error 2
  make: *** [fs] Error 2

caused by commit 57cc7215b7 ("headers: kobject.h redux")

We need to include kobject.h here.

Reported-by: Jeff Garzik <jeff@garzik.org>
Fix-suggested-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-16 12:59:42 -08:00
Linus Torvalds
5520ebd308 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  Squashfs: simplify CONFIG_SQUASHFS_LZO handling
  Squashfs: move squashfs_i() definition from squashfs.h
  Squashfs: get rid of default n in Kconfig
  Squashfs: add missing check in zlib_wrapper
  Squashfs: remove unnecessary variable in zlib_wrapper
  Squashfs: Add XZ compression configuration option
  Squashfs: add XZ compression support
2011-01-16 12:11:13 -08:00
Linus Torvalds
f8206b925f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits)
  sanitize vfsmount refcounting changes
  fix old umount_tree() breakage
  autofs4: Merge the remaining dentry ops tables
  Unexport do_add_mount() and add in follow_automount(), not ->d_automount()
  Allow d_manage() to be used in RCU-walk mode
  Remove a further kludge from __do_follow_link()
  autofs4: Bump version
  autofs4: Add v4 pseudo direct mount support
  autofs4: Fix wait validation
  autofs4: Clean up autofs4_free_ino()
  autofs4: Clean up dentry operations
  autofs4: Clean up inode operations
  autofs4: Remove unused code
  autofs4: Add d_manage() dentry operation
  autofs4: Add d_automount() dentry operation
  Remove the automount through follow_link() kludge code from pathwalk
  CIFS: Use d_automount() rather than abusing follow_link()
  NFS: Use d_automount() rather than abusing follow_link()
  AFS: Use d_automount() rather than abusing follow_link()
  Add an AT_NO_AUTOMOUNT flag to suppress terminal automount
  ...
2011-01-16 11:31:50 -08:00
Al Viro
f03c65993b sanitize vfsmount refcounting changes
Instead of splitting refcount between (per-cpu) mnt_count
and (SMP-only) mnt_longrefs, make all references contribute
to mnt_count again and keep track of how many are longterm
ones.

Accounting rules for longterm count:
	* 1 for each fs_struct.root.mnt
	* 1 for each fs_struct.pwd.mnt
	* 1 for having non-NULL ->mnt_ns
	* decrement to 0 happens only under vfsmount lock exclusive

That allows nice common case for mntput() - since we can't drop the
final reference until after mnt_longterm has reached 0 due to the rules
above, mntput() can grab vfsmount lock shared and check mnt_longterm.
If it turns out to be non-zero (which is the common case), we know
that this is not the final mntput() and can just blindly decrement
percpu mnt_count.  Otherwise we grab vfsmount lock exclusive and
do usual decrement-and-check of percpu mnt_count.

For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm();
namespace.c uses the latter in places where we don't already hold
vfsmount lock exclusive and opencodes a few remaining spots where
we need to manipulate mnt_longterm.

Note that we mostly revert the code outside of fs/namespace.c back
to what we used to have; in particular, normal code doesn't need
to care about two kinds of references, etc.  And we get to keep
the optimization Nick's variant had bought us...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16 13:47:07 -05:00
Al Viro
7b8a53fd81 fix old umount_tree() breakage
Expiry-related code calls umount_tree() several times with
the same list to collect vfsmounts to.  Which is fine, except
that umount_tree() implicitly assumed that the list would
be empty on each call - it moves the victims over there and
then iterates through the list kicking them out.  It's *almost*
idempotent, so everything nearly worked.  However, mnt->ghosts
handling (and thus expirability checks) had been broken - that
part was not idempotent...

The fix is trivial - use local temporary list, splice it to
the collector list when we are through.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-16 13:47:01 -05:00
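
The idempotency problem described in 7b8a53fd81 can be modelled in a few lines of Python. This is a toy model, not the kernel code: `counts` stands in for the non-idempotent per-mount bookkeeping (the `mnt->ghosts` handling in the commit), and all names are hypothetical.

```python
def umount_tree_buggy(victims, kill, counts):
    """Move victims onto the caller's list, then walk the whole list."""
    kill.extend(victims)
    for mnt in kill:  # also revisits entries left by earlier calls
        counts[mnt] = counts.get(mnt, 0) + 1


def umount_tree_fixed(victims, kill, counts):
    """Collect victims on a local list, process it, then splice it over."""
    tmp = list(victims)
    for mnt in tmp:  # only this call's victims are processed
        counts[mnt] = counts.get(mnt, 0) + 1
    kill.extend(tmp)  # splice onto the collector list at the end
```

Calling the buggy variant twice with the same collector list processes the first call's victims a second time, while the fixed variant touches each victim exactly once.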
Ben Hutchings
6f88a4403d btrfs: Require CAP_SYS_ADMIN for filesystem rebalance
Filesystem rebalancing (BTRFS_IOC_BALANCE) affects the entire
filesystem and may run uninterruptibly for a long time.  This does not
seem to be something that an unprivileged user should be able to do.

Reported-by: Aron Xu <happyaron.xu@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:20 -05:00
Josef Bacik
f690efb1aa Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check
If we run low on space we could get a bunch of warnings out of
btrfs_block_rsv_check, but this is mostly just called via the transaction code
to see if we need to end the transaction, it expects to see failures, so let's
not WARN and freak everybody out for no reason.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:20 -05:00
Tsutomu Itoh
5e540f7715 btrfs: Fix memory leak in btrfs_read_fs_root_no_radix()
In btrfs_read_fs_root_no_radix(), 'root' is not freed if
btrfs_search_slot() returns error.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:20 -05:00
Tsutomu Itoh
91ca338d77 btrfs: check NULL or not
Check whether the called functions return NULL.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:20 -05:00
Jesper Juhl
ff175d57f0 btrfs: Don't pass NULL ptr to func that may deref it.
Hi,

In fs/btrfs/inode.c::fixup_tree_root_location() we have this code:

...
 		if (!path) {
 			err = -ENOMEM;
 			goto out;
 		}
...
 	out:
 		btrfs_free_path(path);
 		return err;

btrfs_free_path() passes its argument on to other functions and some of
them end up dereferencing the pointer.
In the code above that pointer is clearly NULL, so btrfs_free_path() will
eventually cause a NULL dereference.

There are many ways to cut this cake (fix the bug). The one I chose was to
make btrfs_free_path() deal gracefully with NULL pointers. If you
disagree, feel free to come up with an alternative patch.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:20 -05:00
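
The approach chosen in ff175d57f0, making the free routine tolerate a missing object, looks roughly like this in Python. The `Path` class and both function names are hypothetical stand-ins; `None` plays the role of the NULL pointer.

```python
class Path:
    """Stand-in for an allocated path object."""

    def __init__(self):
        self.released = False


def release_path(path):
    """Helper that dereferences its argument, so it must not receive None."""
    path.released = True


def free_path(path):
    """Free a path; a None argument is accepted and ignored."""
    if path is None:
        return
    release_path(path)
```

Error paths that jump to a common cleanup label can then call the free routine unconditionally, which is the same convention `kfree()` follows for NULL.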
Dave Young
20b450773d btrfs: mount failure return value fix
I happened to pass a swap partition as the root partition on the kernel command line;
the kernel then panicked and told me "Cannot open root device".
That is not correct: in fact it is a fs type mismatch, not 'no device'.

Eventually I found that btrfs mounting failed with -EIO; it should be -EINVAL.
The logic in init/do_mounts.c:
        for (p = fs_names; *p; p += strlen(p)+1) {
                int err = do_mount_root(name, p, flags, root_mount_data);
                switch (err) {
                        case 0:
                                goto out;
                        case -EACCES:
                                flags |= MS_RDONLY;
                                goto retry;
                        case -EINVAL:
                                continue;
                }
		print "Cannot open root device"
		panic
	}
So any fs type after btrfs in the list has no chance to mount.

Fix the return value to be -EINVAL.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:19 -05:00
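
The effect of the return value in 20b450773d can be seen in a simplified Python model of the probing loop quoted in the commit message. The -EACCES read-only retry is left out for brevity, `RuntimeError` stands in for the kernel panic, and the `do_mount_root` callback and filesystem list are hypothetical.

```python
import errno


def probe_root_fs(fs_names, do_mount_root):
    """Try each filesystem type in turn and return the one that mounts."""
    for name in fs_names:
        err = do_mount_root(name)
        if err == 0:
            return name
        if err == -errno.EINVAL:
            continue  # wrong fs type: try the next one
        # Any other error ends the probing immediately.
        raise RuntimeError("Cannot open root device")
    raise RuntimeError("no filesystem could mount root")
```

If a filesystem answers a type mismatch with -EIO instead of -EINVAL, the loop stops at that entry and the types listed after it are never tried.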
Jesper Juhl
42838bb265 btrfs: Mem leak in btrfs_get_acl()
It seems to me that we leak the memory allocated to 'value' in
btrfs_get_acl() if the call to posix_acl_from_xattr() fails.
Here's a patch that attempts to correct that problem.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:19 -05:00
Miao Xie
6d07bcec96 btrfs: fix wrong free space information of btrfs
When we store data with a raid profile in btrfs on two or more disks of
different sizes, the df command shows some free space in the filesystem even
though the user cannot actually write any more data. In other words, df shows
the wrong free space information for btrfs.

 # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
 # btrfs-show
 Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
 	Total devices 2 FS bytes used 28.00KB
 	devid    1 size 5.01GB used 2.03GB path /dev/sda9
 	devid    2 size 10.00GB used 2.01GB path /dev/sda10
 # btrfs device scan /dev/sda9 /dev/sda10
 # mount /dev/sda9 /mnt
 # dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999
   (fill the filesystem)
 # sync
 # df -TH
 Filesystem	Type	Size	Used	Avail	Use%	Mounted on
 /dev/sda9	btrfs	17G	8.6G	5.4G	62%	/mnt
 # btrfs-show
 Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
 	Total devices 2 FS bytes used 3.99GB
 	devid    1 size 5.01GB used 5.01GB path /dev/sda9
 	devid    2 size 10.00GB used 4.99GB path /dev/sda10

This happens because btrfs cannot allocate chunks once one of the paired disks
has no space left. The free space on the other disks can then never be used and
should be subtracted from the total, but btrfs doesn't subtract it. This is
confusing to the user.

This patch fixes it by calculating the free space that can actually be used to
allocate chunks.

Implementation:
1. get the free space of every device, and align it to the stripe length.
2. sort the devices by free space.
3. check the free space of each device,
   3.1. if it is not zero, count the devices that have more free space than
        this device,
        if that count is not below the min stripe number, the free space can
        be used, and it is added to the total free space.
        if that count is below the min stripe number, the free space cannot
        be used, and the check ends.
   3.2. if the free space is zero, check the next device and go to 3.1

This implementation is essentially a fake (simulated) chunk allocation.
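
The idea of a simulated chunk allocation can be sketched in user space as
follows. This is a simplified illustration, not the code from the patch:
usable_raw_space() and its parameters are hypothetical names, stripe-length
alignment is skipped, and the result is raw device space (for a two-copy
profile, the writable data is half of it).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sort helper: largest free space first. */
static int cmp_free_desc(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x < y) - (x > y);
}

/*
 * Simulate chunk allocation over per-device free space.
 * free_bytes is sorted and consumed in place. Starting from the device
 * with the least free space, each step pairs it with the min_stripes - 1
 * devices just above it in the sorted order and "allocates" its whole
 * free space from all of them. Space left on devices that can no longer
 * find enough partners is never counted.
 */
uint64_t usable_raw_space(uint64_t *free_bytes, size_t ndev, size_t min_stripes)
{
	uint64_t total = 0;
	size_t remaining = ndev;

	if (min_stripes == 0)
		return 0;
	qsort(free_bytes, ndev, sizeof(*free_bytes), cmp_free_desc);

	while (remaining >= min_stripes) {
		size_t i = remaining - 1;	/* smallest device still considered */
		uint64_t alloc = free_bytes[i];

		if (alloc > 0) {
			total += alloc * min_stripes;
			for (size_t j = i + 1 - min_stripes; j <= i; j++)
				free_bytes[j] -= alloc;
		}
		remaining--;
	}
	return total;
}
```

For the filled filesystem in the example above (0 GB free on one device, about
5 GB on the other, two stripes required), the sketch returns 0, which matches
the corrected df output.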

After applying this patch, df can show correct space information:
 # df -TH
 Filesystem	Type	Size	Used	Avail	Use%	Mounted on
 /dev/sda9	btrfs	17G	8.6G	0	100%	/mnt

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:19 -05:00
Miao Xie
b2117a39fa btrfs: make the chunk allocator utilize the devices better
With this patch, we change how the allocator behaves when it cannot find
enough free extents of the default size.

Implementation:
1. Look up a suitable free extent on each device and keep the search result.
   If no suitable free extent is found, keep the largest free extent instead.
2. If we get enough suitable free extents of the default size, chunk allocation
   succeeds.
3. If we cannot get enough free extents, but the number of extents of the
   default size is >= min_stripes, we just change the mapping information
   (reduce the number of stripes in the extent map), and chunk allocation
   succeeds.
4. If the number of extents of the default size is < min_stripes, sort the
   devices by the size of their largest free extent, in descending order.
5. Use the size of the largest free extent on the (num_stripes - 1)th device as
   the stripe size to allocate the device space.

This way, the chunk allocator can allocate chunks that are as large as possible
when device space is short, and makes full use of the devices.
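
Steps 4 and 5 amount to a sort followed by an index lookup. A minimal sketch,
assuming a zero-based device index and using the hypothetical helper name
fallback_stripe_size() (not a function from the patch):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sort helper: largest extent first. */
static int cmp_extent_desc(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x < y) - (x > y);
}

/*
 * max_extent[i] is the largest free extent found on device i; the array
 * is sorted in place. After sorting in descending order, the extent at
 * index num_stripes - 1 is the largest size that num_stripes devices can
 * all provide, so it is used as the stripe size.
 * Returns 0 when there are fewer devices than stripes.
 */
uint64_t fallback_stripe_size(uint64_t *max_extent, size_t ndev,
			      size_t num_stripes)
{
	if (num_stripes == 0 || ndev < num_stripes)
		return 0;
	qsort(max_extent, ndev, sizeof(*max_extent), cmp_extent_desc);
	return max_extent[num_stripes - 1];
}
```

For example, with largest free extents of 3, 8 and 5 units and two stripes,
the two best devices can each supply 5 units, so 5 is the stripe size.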

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:19 -05:00
Miao Xie
7bfc837df9 btrfs: restructure find_free_dev_extent()
- make it return the start position and length of the largest free space when
  it cannot find a suitable free space.
- make it more readable

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:19 -05:00
Miao Xie
1974a3b42d btrfs: fix wrong calculation of stripe size
There are two small problems:
- First, when we check whether the chunk size is greater than the max chunk
  size, we should take mirrors into account, but the original code didn't.
- Second, btrfs shouldn't use the size of the residual free space as the length
  of a dup chunk when doing chunk allocation. The device space that a dup chunk
  needs is twice the chunk size, so if we use the size of the residual free
  space as the length of a dup chunk, we cannot get enough free space. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:19 -05:00
Miao Xie
d52a5b5f1f btrfs: try to reclaim some space when chunk allocation fails
We cannot write data into files when there is very little space left in the
filesystem.

Reproduce steps:
 # mkfs.btrfs /dev/sda1
 # mount /dev/sda1 /mnt
 # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1
 # dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=99999999999999
   (fill the filesystem)
 # umount /mnt
 # mount /dev/sda1 /mnt
 # rm -f /mnt/tmpfile0
 # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1
   (fails with ENOSPC)

But if we repeat the last step, the write succeeds. The cause of the problem
is that btrfs didn't try to commit the current transaction and reclaim some
space when chunk allocation failed.

This patch fixes it by committing the current transaction to reclaim some
space when chunk allocation fails.
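
The commit-and-retry pattern can be illustrated with a toy model. Everything
here is hypothetical (struct toy_fs and the toy_* functions are not btrfs
code), and the sketch assumes that space freed by the rm only becomes
allocatable once the running transaction commits.

```c
#include <errno.h>

/* Toy model of a filesystem's space accounting. */
struct toy_fs {
	unsigned free_chunks;	/* space that can be allocated right now */
	unsigned pinned_chunks;	/* freed space held until the next commit */
};

/* Plain allocation: fails as soon as no free chunk is left. */
static int toy_alloc_chunk(struct toy_fs *fs)
{
	if (fs->free_chunks == 0)
		return -ENOSPC;
	fs->free_chunks--;
	return 0;
}

/* Committing releases everything freed during the transaction. */
static void toy_commit_transaction(struct toy_fs *fs)
{
	fs->free_chunks += fs->pinned_chunks;
	fs->pinned_chunks = 0;
}

/* Allocation with one reclaim attempt: commit, then retry once. */
int toy_alloc_chunk_reclaim(struct toy_fs *fs)
{
	int ret = toy_alloc_chunk(fs);

	if (ret == -ENOSPC) {
		toy_commit_transaction(fs);
		ret = toy_alloc_chunk(fs);
	}
	return ret;
}
```

In this model the reproduce steps correspond to a filesystem with no free
chunks and one pinned chunk: the plain allocation fails, while the reclaiming
variant succeeds on its retry.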

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:19 -05:00
Miao Xie
299a08b1c3 btrfs: fix wrong data space statistics
Josef has implemented mixed data/metadata chunks, so we must count the space
of those chunks just as we do for data chunks.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:19 -05:00
Stefan Schmidt
f580eb0931 fs/btrfs: Fix build of ctree
CC [M]  fs/btrfs/ctree.o
In file included from fs/btrfs/ctree.c:21:0:
fs/btrfs/ctree.h:1003:17: error: field ‘super_kobj’ has incomplete type
fs/btrfs/ctree.h:1074:17: error: field ‘root_kobj’ has incomplete type
make[2]: *** [fs/btrfs/ctree.o] Error 1
make[1]: *** [fs/btrfs] Error 2
make: *** [fs] Error 2

We need to include kobject.h here.

Reported-by: Jeff Garzik <jeff@garzik.org>
Fix-suggested-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-01-16 11:30:19 -05:00
Chris Mason
f892436eb2 Merge branch 'lzo-support' of git://repo.or.cz/linux-btrfs-devel into btrfs-38
2011-01-16 11:25:54 -05:00
Chris Mason
26c79f6ba0 Merge branch 'readonly-snapshots' of git://repo.or.cz/linux-btrfs-devel into btrfs-38
2011-01-16 11:24:45 -05:00
David Howells
b650c858c2 autofs4: Merge the remaining dentry ops tables
Merge the remaining autofs4 dentry ops tables.  It doesn't matter if
d_automount and d_manage are present on something that's not mountable or
holdable as these ops are only used if the appropriate flags are set in
dentry->d_flags.

[AV] switch to ->s_d_op, since now _everything_ on autofs4 is using the
same dentry_operations.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:49 -05:00
David Howells
ea5b778a8b Unexport do_add_mount() and add in follow_automount(), not ->d_automount()
Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
added rather than calling do_add_mount() itself.  follow_automount() will then
do the addition.

This slightly complicates things as ->d_automount() normally wants to add the
new vfsmount to an expiration list and start an expiration timer.  The problem
with that is that the vfsmount will be deleted if it has a refcount of 1 and
the timer will not repeat if the expiration list is empty.

To this end, we require the vfsmount to be returned from d_automount() with a
refcount of (at least) 2.  One of these refs will be dropped unconditionally.
In addition, follow_automount() must get a 3rd ref around the call to
do_add_mount() lest it eat a ref and return an error, leaving the mount we
have open to being expired as we would otherwise have only 1 ref on it.

d_automount() should also add the vfsmount to the expiration list (by
calling mnt_set_expiry()) and start the expiration timer before returning, if
this mechanism is to be used.  The vfsmount will be unlinked from the
expiration list by follow_automount() if do_add_mount() fails.

This patch also fixes the call to do_add_mount() for AFS to propagate the mount
flags from the parent vfsmount.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:48 -05:00
David Howells
ab90911ff9 Allow d_manage() to be used in RCU-walk mode
Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well
as when it is in Ref-walk mode.  This permits __follow_mount_rcu() to call
d_manage() directly.  d_manage() needs a parameter to indicate that it is in
RCU-walk mode as it isn't allowed to sleep if in that mode (but should return
-ECHILD instead).

autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon
accesses it and otherwise request dropping back to ref-walk mode.
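
The rule that d_manage() must not sleep in RCU-walk mode can be sketched as
below. toy_d_manage() is a hypothetical, heavily simplified stand-in: the real
operation takes a dentry or path, and must_wait here merely models "a mount or
expire is in progress for this directory".

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Toy d_manage()-style hook.
 *   must_wait - the caller would have to block on this directory
 *   rcu_walk  - the pathwalk is currently in RCU-walk mode
 */
int toy_d_manage(bool must_wait, bool rcu_walk)
{
	if (!must_wait)
		return 0;	/* nothing to wait for: keep the current mode */
	if (rcu_walk)
		return -ECHILD;	/* may not sleep: ask to drop to ref-walk */
	/* ref-walk: a real implementation would sleep here until done */
	return 0;
}
```

A caller that receives -ECHILD is expected to leave RCU-walk mode and call
the hook again in ref-walk mode, where sleeping is permitted.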

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:47 -05:00
David Howells
87556ef199 Remove a further kludge from __do_follow_link()
Remove a further kludge from __do_follow_link() as it's no longer required with
the automount code.

This reverts the non-helper-function parts of
051d381259, which breaks union mounts.

Reported-by: vaurora@redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:46 -05:00
Ian Kent
dd89f90d2d autofs4: Add v4 pseudo direct mount support
Version 4 of autofs provides a pseudo direct mount implementation
that relies on directories at the leaves of a directory tree under
an indirect mount to trigger mounts.

This patch adds support for that functionality.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:44 -05:00
Ian Kent
9e3fea16ba autofs4: Fix wait validation
It is possible for the check in wait.c:validate_request() to return
an incorrect result if the dentry that was mounted upon has changed
during the callback.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:43 -05:00
Ian Kent
6651149371 autofs4: Clean up autofs4_free_ino()
When this function is called the local reference count doesn't need to
be updated since the dentry is going away and dput definitely must
not be called here.

Also the autofs info struct field inode isn't used so remove it.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:42 -05:00
Ian Kent
71e469db24 autofs4: Clean up dentry operations
There are now two distinct uses of dentry operations: one for dentries
that trigger mounts and one for dentries that do not.

Rationalize the use of these dentry operations and rename them to
reflect their function.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:41 -05:00
Ian Kent
e61da20a50 autofs4: Clean up inode operations
Since the use of ->follow_link() has been eliminated there is no
need to separate the indirect and direct inode operations.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:40 -05:00
Ian Kent
8c13a676d5 autofs4: Remove unused code
Remove code that is not used due to the use of ->d_automount()
and ->d_manage().

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:39 -05:00
Ian Kent
b5b801779d autofs4: Add d_manage() dentry operation
This patch requires a previous patch to add the ->d_automount()
dentry operation.

Add a function to use the newly defined ->d_manage() dentry operation
for blocking during mount and expire.

Whether the VFS calls the dentry operations d_automount() and d_manage()
is controlled by the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags. autofs
uses the d_automount() operation to callback to user space to request
mount operations and the d_manage() operation to block walks into mounts
that are under construction or destruction.

In order to prevent these functions from being called unnecessarily the
DMANAGED_* flags are cleared for cases which would cause this. In the
common case the DMANAGED_AUTOMOUNT and DMANAGED_TRANSIT flags are both
set for dentries waiting to be mounted. The DMANAGED_TRANSIT flag is
cleared upon successful mount request completion and set during expire
runs, both during the dentry expire check, and if selected for expire,
is left set until a subsequent successful mount request completes.

The exception to this is the so-called rootless multi-mount which has
no actual mount at its base. In this case the DMANAGED_AUTOMOUNT flag
is cleared upon successful mount request completion as well and set
again after a successful expire.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:38 -05:00
Ian Kent
10584211e4 autofs4: Add d_automount() dentry operation
Add a function to use the newly defined ->d_automount() dentry operation
for triggering mounts instead of doing the user space callback in ->lookup()
and ->d_revalidate().

Note, to be useful the subsequent patch to add the ->d_manage() dentry
operation is also needed so the discussion of functionality is deferred to
that patch.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:37 -05:00
David Howells
db3729153e Remove the automount through follow_link() kludge code from pathwalk
Remove the automount through follow_link() kludge code from pathwalk in favour
of using d_automount().

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:36 -05:00
David Howells
01c64feac4 CIFS: Use d_automount() rather than abusing follow_link()
Make CIFS use the new d_automount() dentry operation rather than abusing
follow_link() on directories.

[NOTE: THIS IS UNTESTED!]

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Steve French <sfrench@samba.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:35 -05:00
David Howells
36d43a4376 NFS: Use d_automount() rather than abusing follow_link()
Make NFS use the new d_automount() dentry operation rather than abusing
follow_link() on directories.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:34 -05:00
David Howells
d18610b0ce AFS: Use d_automount() rather than abusing follow_link()
Make AFS use the new d_automount() dentry operation rather than abusing
follow_link() on directories.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:33 -05:00
David Howells
6f45b65672 Add an AT_NO_AUTOMOUNT flag to suppress terminal automount
Add an AT_NO_AUTOMOUNT flag to suppress terminal automounting of automount
point directories.  This can be used by fstatat() users to permit the
gathering of attributes on an automount point and also prevent
mass-automounting of a directory of automount points by ls.
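
From user space, the flag is passed in the flags argument of fstatat(). A
minimal Linux-only sketch follows; stat_no_automount() is a hypothetical
wrapper name, and the fallback definition of AT_NO_AUTOMOUNT is only there for
C libraries whose headers do not expose it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc exposes AT_NO_AUTOMOUNT under this */
#endif
#include <fcntl.h>
#include <sys/stat.h>

#ifndef AT_NO_AUTOMOUNT
#define AT_NO_AUTOMOUNT 0x800	/* value used by the Linux UAPI headers */
#endif

/*
 * Stat a path without triggering an automount on its final component.
 * Returns 0 and fills *st on success, or -1 with errno set on failure.
 */
int stat_no_automount(const char *path, struct stat *st)
{
	return fstatat(AT_FDCWD, path, st, AT_NO_AUTOMOUNT);
}
```

A directory lister can call this for each entry to collect attributes of the
automount points themselves instead of mounting every one of them.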

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:33 -05:00
David Howells
cc53ce53c8 Add a dentry op to allow processes to be held during pathwalk transit
Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
sleep when it tries to transit away from one of that filesystem's directories
during a pathwalk.  The operation is keyed off a new dentry flag
(DCACHE_MANAGE_TRANSIT).

The filesystem is allowed to be selective about which processes it holds and
which it permits to continue on or prohibits from transiting from each flagged
directory.  This will allow autofs to hold up client processes whilst letting
its userspace daemon through to maintain the directory or the stuff behind it
or mounted upon it.

The ->d_manage() dentry operation:

	int (*d_manage)(struct path *path, bool mounting_here);

takes a pointer to the directory about to be transited away from and a flag
indicating whether the transit is undertaken by do_add_mount() or
do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

It should return 0 on success, letting the process continue on its way;
-EISDIR to prohibit the caller from skipping to overmounted filesystems or
automounting, and to use this directory; or some other error code to return to
the user.
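
Seen from the caller's side, this return-value convention maps onto three
outcomes. The enum and function below are hypothetical names used only for
illustration; they do not appear in the patch.

```c
#include <errno.h>

/* What the pathwalk does after consulting d_manage(). */
enum transit_action {
	TRANSIT_CONTINUE,	/* 0: the process continues on its way */
	TRANSIT_STAY,		/* -EISDIR: use this directory, no transit */
	TRANSIT_FAIL		/* anything else: error goes to the user */
};

/* Classify a d_manage() return value according to the rules above. */
enum transit_action classify_d_manage(int ret)
{
	if (ret == 0)
		return TRANSIT_CONTINUE;
	if (ret == -EISDIR)
		return TRANSIT_STAY;
	return TRANSIT_FAIL;
}
```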

->d_manage() is called with namespace_sem writelocked if mounting_here is true
and no other locks held, so it may sleep.  However, if mounting_here is true,
it may not initiate or wait for a mount or unmount upon the parameter
directory, even if the act is actually performed by userspace.

Within fs/namei.c, follow_managed() is extended to check with d_manage() first
on each managed directory, before transiting away from it or attempting to
automount upon it.

follow_down() is renamed follow_down_one() and should only be used where the
filesystem deliberately intends to avoid management steps (e.g. autofs).

A new follow_down() is added that incorporates the loop done by all other
callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
and CIFS do use it, their use is removed by converting them to use
d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
also takes an extra parameter to indicate if it is being called from mount code
(with namespace_sem writelocked) which it passes to d_manage().  follow_down()
ignores automount points so that it can be used to mount on them.

__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
that determine whether to abort or not itself.  That would allow the autofs
daemon to continue on in rcu-walk mode.

Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
required as every transit from that directory will cause d_manage() to be
invoked.  It can always be set again when necessary.

==========================
WHAT THIS MEANS FOR AUTOFS
==========================

Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
trigger the automounting of indirect mounts, and both of these can be called
with i_mutex held.

autofs knows that the i_mutex will be held by the caller in lookup(), and so
can drop it before invoking the daemon - but this isn't so for d_revalidate(),
since the lock is only held on _some_ of the code paths that call it.  This
means that autofs can't risk dropping i_mutex from its d_revalidate() function
before it calls the daemon.

The bug could manifest itself as, for example, a process that's trying to
validate an automount dentry that gets made to wait because that dentry is
expired and needs cleaning up:

	mkdir         S ffffffff8014e05a     0 32580  24956
	Call Trace:
	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4

versus the automount daemon which wants to remove that dentry, but can't
because the normal process is holding the i_mutex lock:

	automount     D ffffffff8014e05a     0 32581      1              32561
	Call Trace:
	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
	 [<ffffffff8005d229>] tracesys+0x71/0xe0
	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

which means that the system is deadlocked.

This patch allows autofs to hold up normal processes whilst the daemon goes
ahead and does things to the dentry tree behind the automounter point without
risking a deadlock as almost no locks are held in d_manage() and none in
d_automount().

Signed-off-by: David Howells <dhowells@redhat.com>
Was-Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:31 -05:00
David Howells
9875cf8064 Add a dentry op to handle automounting rather than abusing follow_link()
Add a dentry op (d_automount) to handle automounting directories rather than
abusing the follow_link() inode operation.  The operation is keyed off a new
dentry flag (DCACHE_NEED_AUTOMOUNT).

This also makes it easier to add an AT_ flag to suppress terminal segment
automount during pathwalk and removes the need for the kludge code in the
pathwalk algorithm to handle directories with follow_link() semantics.

The ->d_automount() dentry operation:

	struct vfsmount *(*d_automount)(struct path *mountpoint);

takes a pointer to the directory to be mounted upon, which is expected to
provide sufficient data to determine what should be mounted.  If successful, it
should return the vfsmount struct it creates (which it should also have added
to the namespace using do_add_mount() or similar).  If there's a collision with
another automount attempt, NULL should be returned.  If the directory specified
by the parameter should be used directly rather than being mounted upon,
-EISDIR should be returned.  In any other case, an error code should be
returned.

The ->d_automount() operation is called with no locks held and may sleep.  At
this point the pathwalk algorithm will be in ref-walk mode.

Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is
added to handle mountpoints.  It will return -EREMOTE if the automount flag was
set, but no d_automount() op was supplied, -ELOOP if we've encountered too many
symlinks or mountpoints, -EISDIR if the walk point should be used without
mounting and 0 if successful.  The path will be updated to point to the mounted
filesystem if a successful automount took place.
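
The return codes listed above can be summarised in a small user-space model.
This is only a sketch: struct toy_dentry, the toy_* functions and the
max_crossings limit are hypothetical, and the toy op returns an int where the
real ->d_automount() returns a vfsmount pointer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry; not the kernel's struct dentry. */
struct toy_dentry {
	bool need_automount;		/* models DCACHE_NEED_AUTOMOUNT */
	int (*d_automount)(void);	/* toy op: 0 or a negative errno */
};

/* Two sample ops for experimenting with the sketch. */
int toy_mount_ok(void) { return 0; }
int toy_use_dir(void) { return -EISDIR; }

/*
 * Model of the checks made before and after calling the op.
 * *crossings counts symlinks and mountpoints seen so far in this walk.
 */
int toy_follow_automount(const struct toy_dentry *d,
			 unsigned *crossings, unsigned max_crossings)
{
	if (!d->need_automount)
		return 0;		/* not an automount point */
	if (d->d_automount == NULL)
		return -EREMOTE;	/* flag set but no op supplied */
	if (++*crossings > max_crossings)
		return -ELOOP;		/* too many symlinks or mountpoints */
	return d->d_automount();	/* 0, -EISDIR, or another error */
}
```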

__follow_mount() is replaced by follow_managed() which is more generic
(especially with the patch that adds ->d_manage()).  This handles transits from
directories during pathwalk, including automounting and skipping over
mountpoints (and holding processes with the next patch).

__follow_mount_rcu() will jump out of RCU-walk mode if it encounters an
automount point with nothing mounted on it.

follow_dotdot*() does not handle automounts as you don't want to trigger them
whilst following "..".

I've also extracted the mount/don't-mount logic from autofs4 and included it
here.  It makes the mount go ahead anyway if someone calls open() or creat(),
tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
or sticks a '/' on the end of the pathname.  If they do a stat(), however,
they'll only trigger the automount if they didn't also say O_NOFOLLOW.
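
The mount/don't-mount decision described above reduces to a flag test. In this
sketch the TOY_* flags are hypothetical names, loosely modelled on the lookup
flags the pathwalk code carries; they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical lookup-intent flags for the final path component. */
enum {
	TOY_CONTINUE  = 1 << 0,	/* more components follow (traversal) */
	TOY_OPEN      = 1 << 1,	/* open() or creat() */
	TOY_DIRECTORY = 1 << 2,	/* chdir/chroot-style lookup or trailing '/' */
	TOY_FOLLOW    = 1 << 3	/* caller did not ask for "nofollow" */
};

/*
 * Decide whether reaching an automount point should trigger the mount.
 * Open, traversal and directory-style lookups always mount; a plain
 * stat()-style lookup mounts only when TOY_FOLLOW is set.
 */
bool toy_should_automount(unsigned flags)
{
	if (flags & (TOY_CONTINUE | TOY_OPEN | TOY_DIRECTORY))
		return true;
	return (flags & TOY_FOLLOW) != 0;
}
```

A stat-style lookup that requested "nofollow" carries none of these flags, so
it sees the automount point itself without mounting anything on it.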

I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their
inodes as automount points.  This flag is automatically propagated to the
dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate().  This saves NFS and could
save AFS a private flag bit apiece, but is not strictly necessary.  It would be
preferable to do the propagation in d_set_d_op(), but that doesn't normally
have access to the inode.

[AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu()
succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after
that, rather than just returning with ungrabbed *path]

Signed-off-by: David Howells <dhowells@redhat.com>
Was-Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:05:03 -05:00
Al Viro
1a8edf40e7 do_lookup() fix
do_lookup() has a path leading from LOOKUP_RCU case to non-RCU
crossing of mountpoints, which breaks things badly.  If we
hit need_revalidate: and do nothing in there, we need to come
back into LOOKUP_RCU half of things, not to done: in non-RCU
one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:03:39 -05:00
Linus Torvalds
7cb3920a65 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: prevent NMI timeouts in cmn_err
  xfs: Add log level to assertion printk
  xfs: fix an assignment within an ASSERT()
  xfs: fix error handling for synchronous writes
  xfs: add FITRIM support
  xfs: ensure log covering transactions are synchronous
  xfs: serialise unaligned direct IOs
  xfs: factor common write setup code
  xfs: split buffered IO write path from xfs_file_aio_write
  xfs: split direct IO write path from xfs_file_aio_write
  xfs: introduce xfs_rw_lock() helpers for locking the inode
  xfs: factor post-write newsize updates
  xfs: factor common post-write isize handling code
  xfs: ensure sync write errors are returned
2011-01-14 15:24:17 -08:00
Linus Torvalds
6ab8219649 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: restore multiple bd_link_disk_holder() support
  block cfq: compensate preempted queue even if it has no slice assigned
  block cfq: make queue preempt work for queues from different workload
2011-01-14 13:32:07 -08:00
Linus Torvalds
6f7f7caab2 Turn d_set_d_op() BUG_ON() into WARN_ON_ONCE()
It's indicative of a real problem, and it actually triggers with
autofs4, but the BUG_ON() is excessive.  The autofs4 case is being fixed
(to only set d_op in the ->lookup method) but not merged yet.  In the
meantime this gets the code limping along.

Reported-by: Alex Elder <aelder@sgi.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-14 13:26:18 -08:00
Linus Torvalds
18bce371ae Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits)
  nfsd4: fix callback restarting
  nfsd: break lease on unlink, link, and rename
  nfsd4: break lease on nfsd setattr
  nfsd: don't support msnfs export option
  nfsd4: initialize cb_per_client
  nfsd4: allow restarting callbacks
  nfsd4: simplify nfsd4_cb_prepare
  nfsd4: give out delegations more quickly in 4.1 case
  nfsd4: add helper function to run callbacks
  nfsd4: make sure sequence flags are set after destroy_session
  nfsd4: re-probe callback on connection loss
  nfsd4: set sequence flag when backchannel is down
  nfsd4: keep finer-grained callback status
  rpc: allow xprt_class->setup to return a preexisting xprt
  rpc: keep backchannel xprt as long as server connection
  rpc: move sk_bc_xprt to svc_xprt
  nfsd4: allow backchannel recovery
  nfsd4: support BIND_CONN_TO_SESSION
  nfsd4: modify session list under cl_lock
  Documentation: fl_mylease no longer exists
  ...

Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work.  The
vfs-scale work touched some msnfs cases, and this merge removes support
for that entirely, so the conflict was trivial to resolve.
2011-01-14 13:17:26 -08:00
J. Bruce Fields
a8f2800b4f nfsd4: fix callback restarting
Ensure a new callback is added to the client's list of callbacks at most
once.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-14 14:51:31 -05:00
Jeff Layton
bd76331955 cifs: add cruid= mount option
In commit 3e4b3e1f we separated the "uid" mount option such that it
no longer determined the owner of the credential cache by default. When
we did this, we added a new option to cifs.upcall (--legacy-uid) to
try to make it so that it would behave the same way as it did before.

This ignored a rather important point -- the kernel has no way to know
what options are being passed to cifs.upcall, so it doesn't know what
uid it should use to determine whether to match an existing krb5 session.

The simplest solution is to simply add a new "cruid=" mount option that
only governs the uid owner of the credential cache for the mount.

Unfortunately, this means that the --legacy-uid option in cifs.upcall was
ill-considered and is now useless, but I don't see a better way to deal
with this.

A patch for the mount.cifs manpage will follow once this patch has been
accepted.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-14 18:51:11 +00:00
Jeff Layton
56c24305d1 cifs: cFYI the entire error code in map_smb_to_linux_error
We currently only print the DOS error part.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-14 18:51:11 +00:00
Tejun Heo
49731baa41 block: restore multiple bd_link_disk_holder() support
Commit e09b457b (block: simplify holder symlink handling) incorrectly
assumed that there is only one link at maximum.  dm may use multiple
links and expects block layer to track reference count for each link,
which is different from and unrelated to the exclusive device holder
identified by @holder when the device is opened.

Remove the single holder assumption and automatic removal of the link
and revive the per-link reference count tracking.  The code
essentially behaves the same as before commit e09b457b sans the
unnecessary kobject reference count dancing.

While at it, note that this facility should not be used by anyone other
than its current users.  Sysfs symlinks shouldn't be abused like this
and the whole thing doesn't belong in the block layer at all.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Milan Broz <mbroz@redhat.com>
Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-01-14 18:44:22 +01:00
Tejun Heo
0ad53eeefc afs: add afs_wq and use it instead of the system workqueue
flush_scheduled_work() is going away.  afs needs to make sure all the
work items it has queued have finished before being unloaded and there can
be an arbitrary number of pending work items.  Add afs_wq and use it as the
flush domain instead of the system workqueue.

Also, convert cancel_delayed_work() + flush_scheduled_work() to
cancel_delayed_work_sync() in afs_mntpt_kill_timer().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: linux-afs@lists.infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-14 09:25:11 -08:00
Akshat Aranya
ba28b93a52 FS-Cache: Fix operation handling
fscache_submit_exclusive_op() adds an operation to the pending list if
other operations are pending.  Fix the check for pending ops as n_ops
must be greater than 0 at the point it is checked as it is incremented
immediately before under lock.

Signed-off-by: Akshat Aranya <aranya@nec-labs.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-14 09:23:36 -08:00
Linus Torvalds
acda4721ae Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin:
  kernel: fix hlist_bl again
  cgroups: Fix a lockdep warning at cgroup removal
  fs: namei fix ->put_link on wrong inode in do_filp_open
2011-01-14 09:08:29 -08:00
Nick Piggin
7b9337aaf9 fs: namei fix ->put_link on wrong inode in do_filp_open
J. R. Okajima noticed that ->put_link is being attempted on the
wrong inode, and suggested the way to fix it. I changed it a bit
according to Al's suggestion to keep an explicit link path around.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-14 08:42:43 +00:00
Linus Torvalds
db9effe99a Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin:
  fs: fix do_last error case when need_reval_dot
  nfs: add missing rcu-walk check
  fs: hlist UP debug fixup
  fs: fix dropping of rcu-walk from force_reval_path
  fs: force_reval_path drop rcu-walk before d_invalidate
  fs: small rcu-walk documentation fixes

Fixed up trivial conflicts in Documentation/filesystems/porting
2011-01-13 20:14:13 -08:00
J. R. Okajima
f20877d94a fs: fix do_last error case when need_reval_dot
When open(2) without O_DIRECTORY opens an existing dir, it should return
EISDIR. In do_last(), the variable 'error' is initialized to EISDIR, but it
is changed by d_revalidate(), which returns any positive value to represent
'the target dir is valid.'

We should keep and return the initialized 'error' in this case.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-14 03:56:04 +00:00
Nick Piggin
657e94b673 nfs: add missing rcu-walk check
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-14 02:48:39 +00:00
Nick Piggin
90dbb77ba4 fs: fix dropping of rcu-walk from force_reval_path
As J. R. Okajima noted, force_reval_path passes in the same dentry to
d_revalidate as the one in the nameidata structure (other callers pass in a
child), so the locking breaks. This can oops with a chrooted nfs mount, for
example. Similarly there can be other problems with revalidating a dentry
which is already in nameidata of the path walk.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-14 02:36:19 +00:00
Nick Piggin
bb20c18db6 fs: force_reval_path drop rcu-walk before d_invalidate
d_revalidate can return in rcu-walk mode even when it returns 0.  We can't just
call any old dcache function on rcu-walk dentry (the dentry is unstable, so
even though d_lock can safely be taken, the result may no longer be what we
expect -- careful re-checks would be required). So just drop rcu in this case.

(I missed this conversion when switching to the rcu-walk convention that Linus
suggested)

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-14 02:35:53 +00:00
J. Bruce Fields
4795bb37ef nfsd: break lease on unlink, link, and rename
Any change to any of the links pointing to an entry should also break
delegations.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-13 21:04:09 -05:00
J. Bruce Fields
6a76bebefe nfsd4: break lease on nfsd setattr
Leases (delegations) should really be broken on any metadata change, not
just on size change.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-13 21:04:08 -05:00
J. Bruce Fields
9ce137eee4 nfsd: don't support msnfs export option
We've long had these pointless #ifdef MSNFS's sprinkled throughout the
code--pointless because MSNFS is always defined (and we give no config
option to make that easy to change).  So we could just remove the
ifdef's and compile the resulting code unconditionally.

But as long as we're there: why not just rip out this code entirely?
The only purpose is to implement the "msnfs" export option which turns
on Windows-like behavior in some cases, and:

	- the export option isn't documented anywhere;
	- the userland utilities (which would need to be able to parse
	  "msnfs" in an export file) don't support it;
	- I don't know how to maintain this, as I don't know what the
	  proper behavior is; and
	- google shows no evidence that anyone has ever used this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-13 21:04:07 -05:00
J. Bruce Fields
9ee1ba5402 nfsd4: initialize cb_per_client
Otherwise a callback that is aborted before it runs will result in a
list_del on an uninitialized list head.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-13 21:04:06 -05:00
Stefan Hajnoczi
cb9ef8d5e3 fs/fs-writeback.c: fix sync_inodes_sb() return value kernel-doc
The sync_inodes_sb() function does not have a return value.  Remove the
outdated documentation comment.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:48 -08:00
Andrea Arcangeli
5f24ce5fd3 thp: remove PG_buddy
PG_buddy can be converted to _mapcount == -2.  So the PG_compound_lock can
be added to page->flags without overflowing (because of the sparse section
bits increasing) with CONFIG_X86_PAE=y and CONFIG_X86_PAT=y.  This also
has to move the memory hotplug code from _mapcount to lru.next to avoid
any risk of clashes.  We can't use lru.next for PG_buddy removal, but
memory hotplug can use lru.next even more easily than the mapcount
instead.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:43 -08:00
Andrea Arcangeli
79134171df thp: transparent hugepage vmstat
Add hugepage stat information to /proc/vmstat and /proc/meminfo.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:43 -08:00
Mandeep Singh Baines
dabb16f639 oom: allow a non-CAP_SYS_RESOURCE process to oom_score_adj down
We'd like to be able to oom_score_adj a process up/down as it
enters/leaves the foreground.  Currently, it is not possible to oom_adj
down without CAP_SYS_RESOURCE.  This patch allows a task to decrease its
oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to
or its inherited value at fork.  Assuming the thread that has forked it
has oom_score_adj of 0, each process could decrease it back from 0 upon
activation unless a CAP_SYS_RESOURCE thread elevated it to something
higher.

Alternative considered:

* a setuid binary
* a daemon with CAP_SYS_RESOURCE

Since you don't want all processes to be able to reduce their oom_adj, a
setuid or daemon implementation would be complex.  The alternatives also
have much higher overhead.

This patch updated from original patch based on feedback from David
Rientjes.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:35 -08:00
Nikanth Karthikesan
2d90508f63 mm: smaps: export mlock information
Currently there is no way to find out whether a process has locked its
pages in memory, or which of its memory regions are locked.

Add a new field "Locked" to export this information via the smaps file.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:33 -08:00
Hai Shan
c32b0d4b3f fs/mpage.c: consolidate code
Merge mpage_end_io_read() and mpage_end_io_write() into mpage_end_io() to
eliminate code duplication.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Hai Shan <shan.hai@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:32 -08:00
Andrew Morton
c691b9d983 sync_inode_metadata: fix comment
Use correct function name, remove incorrect apostrophe

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:32 -08:00
Jan Kara
b9543dac5b writeback: avoid livelocking WB_SYNC_ALL writeback
When wb_writeback() is called in WB_SYNC_ALL mode, work->nr_to_write is
usually set to LONG_MAX.  The logic in wb_writeback() then calls
__writeback_inodes_sb() with nr_to_write == MAX_WRITEBACK_PAGES and we
easily end up with non-positive nr_to_write after the function returns, if
the inode has more than MAX_WRITEBACK_PAGES dirty pages at the moment.

When nr_to_write is <= 0 wb_writeback() decides we need another round of
writeback but this is wrong in some cases!  For example when a single
large file is continuously dirtied, we would never finish syncing it
because each pass would be able to write MAX_WRITEBACK_PAGES and inode
dirty timestamp never gets updated (as inode is never completely clean).
Thus __writeback_inodes_sb() would write the redirtied inode again and
again.

Fix the issue by setting nr_to_write to LONG_MAX in WB_SYNC_ALL mode.  We
do not need nr_to_write in WB_SYNC_ALL mode anyway since
write_cache_pages() does livelock avoidance using page tagging in
WB_SYNC_ALL mode.

This makes wb_writeback() call __writeback_inodes_sb() only once on
WB_SYNC_ALL.  The latter function won't livelock because it works on

- a finite set of files by doing queue_io() once at the beginning
- a finite set of pages by PAGECACHE_TAG_TOWRITE page tagging

After this patch, program from http://lkml.org/lkml/2010/10/24/154 is no
longer able to stall sync forever.

[fengguang.wu@intel.com: fix locking comment]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:32 -08:00
Jan Kara
aa373cf550 writeback: stop background/kupdate works from livelocking other works
Background writeback is easily livelockable in a loop in wb_writeback() by
a process continuously re-dirtying pages (or continuously appending to a
file).  This is in fact intended as the target of background writeback is
to write dirty pages it can find as long as we are over
dirty_background_threshold.

But the above behavior gets inconvenient at times because no other work
queued in the flusher thread's queue gets processed.  In particular, since
e.g.  sync(1) relies on flusher thread to do all the IO for it, sync(1)
can hang forever waiting for flusher thread to do the work.

Generally, when a flusher thread has some work queued, someone submitted
the work to achieve a goal more specific than what background writeback
does.  Moreover by working on the specific work, we also reduce amount of
dirty pages which is exactly the target of background writeout.  So it
makes sense to give specific work a priority over a generic page cleaning.

Thus we interrupt background writeback if there is some other work to do.
We return to the background writeback after completing all the queued
work.

This may delay the writeback of expired inodes for a while, however the
expired inodes will eventually be flushed to disk as long as the other
works won't livelock.

[fengguang.wu@intel.com: update comment]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:32 -08:00
Wu Fengguang
71927e84e0 writeback: trace wakeup event for background writeback
This tracks when balance_dirty_pages() tries to wake up the flusher thread
for background writeback (if it was not started already).

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:32 -08:00
Jan Kara
6585027a5e writeback: integrated background writeback work
Check whether background writeback is needed after finishing each work.

When the bdi flusher thread finishes doing some work, check whether any
kind of background writeback needs to be done (either because
dirty_background_ratio is exceeded or because we need to start flushing
old inodes).  If so, just do background writeback.

This way, bdi_start_background_writeback() just needs to wake up the
flusher thread.  It will do background writeback as soon as there is no
other work.

This is a preparatory patch for the next patch which stops background
writeback as soon as there is other work to do.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:32 -08:00
Linus Torvalds
6254b32b57 ecryptfs: fix broken build
Stephen Rothwell reports that the vfs merge broke the build of ecryptfs.
The breakage comes from commit 66cb76666d ("sanitize ecryptfs
->mount()") which was obviously not even build tested. Tssk, tssk, Al.

This is the minimal build fixup for the situation, although I don't have
a filesystem to actually test it with.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:19:38 -08:00
Sage Weil
17db143fc0 ceph: fix xattr rbtree search
Fix xattr name comparison in rbtree search for strings that share a prefix.
The *name argument is null terminated, but the xattr name is not, so we
need to use strncmp, but that means adjusting for the case where name is
a prefix of xattr->name.

The corresponding case in __set_xattr() already handles this properly
(although in that case *name is also not null terminated).
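
A minimal sketch of the comparison described above, in plain C rather than
the actual fs/ceph code.  The function name cmp_name and its parameters are
hypothetical; the point is only that strncmp() over the shared length has to
be followed by a length check, so that a key which is a strict prefix of the
stored name does not compare equal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare a NUL-terminated lookup key against a stored name that is
 * NOT NUL-terminated (its length is kept separately).
 * Returns <0, 0 or >0, like strcmp().
 */
static int cmp_name(const char *key, const char *stored, size_t stored_len)
{
	size_t key_len;
	int c;

	assert(key != NULL && stored != NULL);
	key_len = strlen(key);

	/* Compare only the bytes that both names actually have. */
	c = strncmp(key, stored, key_len < stored_len ? key_len : stored_len);
	if (c != 0)
		return c;

	/* The common prefix is equal: the shorter name sorts first. */
	if (key_len < stored_len)
		return -1;
	if (key_len > stored_len)
		return 1;
	return 0;
}
```

With this ordering, a lookup for "user.foo" sorts before a stored
"user.foobar" instead of matching it.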

Reported-by: Sergiy Kibrik <sakib@meta.ua>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-13 15:50:11 -08:00
Yehuda Sadeh
1c1266bb91 ceph: fix getattr on directory when using norbytes
The norbytes mount option was broken, and when doing getattr
on a directory it returned the rbytes instead of the number of
entities. This commit fixes it.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-13 15:50:06 -08:00
Phillip Lougher
01a678c5a2 Squashfs: simplify CONFIG_SQUASHFS_LZO handling
Get rid of messy repeated #if(n)def CONFIG_SQUASHFS_LZO code
in decompressor.c

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2011-01-13 21:38:46 +00:00
Phillip Lougher
8fcd97216f Squashfs: move squashfs_i() definition from squashfs.h
Move squashfs_i() definition out of squashfs.h, this eliminates
the need to #include squashfs_fs_i.h from numerous files.

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2011-01-13 21:24:15 +00:00
Phillip Lougher
6197fd8678 Squashfs: get rid of default n in Kconfig
As pointed out by Geert Uytterhoeven, "default n" is the default,
no reason to specify it.

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2011-01-13 21:21:52 +00:00
Phillip Lougher
e7ee11f0ec Squashfs: add missing check in zlib_wrapper
On file system corruption zlib can return Z_STREAM_OK with
input buffers remaining, which will not be released.

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2011-01-13 21:21:00 +00:00
Phillip Lougher
170cf02165 Squashfs: remove unnecessary variable in zlib_wrapper
Get rid of unnecessary bytes variable, and remove redundant
initialisation of zlib_err.

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2011-01-13 21:20:52 +00:00
Phillip Lougher
7a43ae5237 Squashfs: Add XZ compression configuration option
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2011-01-13 21:16:52 +00:00
Phillip Lougher
81bb8debd0 Squashfs: add XZ compression support
Add support for reading file systems compressed with the
XZ compression algorithm.

This patch adds the XZ decompressor wrapper code.

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2011-01-13 20:51:20 +00:00
Trond Myklebust
8a0eebf66e NFS: Fix NFSv3 exclusive open semantics
Commit c0204fd2b8 (NFS: Clean up
nfs4_proc_create()) broke NFSv3 exclusive open by removing the code
that passes the O_EXCL flag down to nfs3_proc_create(). This patch
reverts that offending hunk from the original commit.

Reported-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org    [2.6.37]
Tested-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 12:06:29 -08:00
Linus Torvalds
275220f0fc Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
  block: ensure that completion error gets properly traced
  blktrace: add missing probe argument to block_bio_complete
  block cfq: don't use atomic_t for cfq_group
  block cfq: don't use atomic_t for cfq_queue
  block: trace event block fix unassigned field
  block: add internal hd part table references
  block: fix accounting bug on cross partition merges
  kref: add kref_test_and_get
  bio-integrity: mark kintegrityd_wq highpri and CPU intensive
  block: make kblockd_workqueue smarter
  Revert "sd: implement sd_check_events()"
  block: Clean up exit_io_context() source code.
  Fix compile warnings due to missing removal of a 'ret' variable
  fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
  block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
  cfq-iosched: don't check cfqg in choose_service_tree()
  fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
  cdrom: export cdrom_check_events()
  sd: implement sd_check_events()
  sr: implement sr_check_events()
  ...
2011-01-13 10:45:01 -08:00
Linus Torvalds
b2034d474b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits)
  fs: add documentation on fallocate hole punching
  Gfs2: fail if we try to use hole punch
  Btrfs: fail if we try to use hole punch
  Ext4: fail if we try to use hole punch
  Ocfs2: handle hole punching via fallocate properly
  XFS: handle hole punching via fallocate properly
  fs: add hole punching to fallocate
  vfs: pass struct file to do_truncate on O_TRUNC opens (try #2)
  fix signedness mess in rw_verify_area() on 64bit architectures
  fs: fix kernel-doc for dcache::prepend_path
  fs: fix kernel-doc for dcache::d_validate
  sanitize ecryptfs ->mount()
  switch afs
  move internal-only parts of ncpfs headers to fs/ncpfs
  switch ncpfs
  switch 9p
  pass default dentry_operations to mount_pseudo()
  switch hostfs
  switch affs
  switch configfs
  ...
2011-01-13 10:27:28 -08:00
Linus Torvalds
a170315420 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: fix cleanup when trying to mount inexistent image
  net/ceph: make ceph_msgr_wq non-reentrant
  ceph: fsc->*_wq's aren't used in memory reclaim path
  ceph: Always free allocated memory in osdmap_decode()
  ceph: Makefile: Remove unnessary code
  ceph: associate requests with opening sessions
  ceph: drop redundant r_mds field
  ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS
  ceph: add dir_layout to inode
2011-01-13 10:25:24 -08:00
Linus Torvalds
008d23e485 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  Documentation/trace/events.txt: Remove obsolete sched_signal_send.
  writeback: fix global_dirty_limits comment runtime -> real-time
  ppc: fix comment typo singal -> signal
  drivers: fix comment typo diable -> disable.
  m68k: fix comment typo diable -> disable.
  wireless: comment typo fix diable -> disable.
  media: comment typo fix diable -> disable.
  remove doc for obsolete dynamic-printk kernel-parameter
  remove extraneous 'is' from Documentation/iostats.txt
  Fix spelling milisec -> ms in snd_ps3 module parameter description
  Fix spelling mistakes in comments
  Revert conflicting V4L changes
  i7core_edac: fix typos in comments
  mm/rmap.c: fix comment
  sound, ca0106: Fix assignment to 'channel'.
  hrtimer: fix a typo in comment
  init/Kconfig: fix typo
  anon_inodes: fix wrong function name in comment
  fix comment typos concerning "consistent"
  poll: fix a typo in comment
  ...

Fix up trivial conflicts in:
 - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
 - fs/ext4/ext4.h

Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-13 10:05:56 -08:00
Stefani Seibold
6f772fe65c cramfs: generate unique inode number for better inode cache usage
Generate a unique inode number for every entry in the cram file system.
For files which do not contain data (device nodes, fifos and sockets),
the offset of the directory entry inside the cramfs plus 1 will be used as
the inode number.

The + 1 makes it possible to distinguish between a file which contains
no data and files which have data; the latter have an inode value
where the lower two bits are always 0.
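
The numbering scheme can be sketched roughly as follows.  This is a
hypothetical model, not the code from the patch: demo_ino and its parameters
are made-up names, and it assumes that the data offset is stored in 4-byte
units and that directory entries are 4-byte aligned.

```c
#include <assert.h>

/*
 * Hypothetical model of the inode numbering.
 * data_off:   offset of the file's data in 4-byte units, 0 if it has none
 * dirent_off: byte offset of the directory entry (assumed 4-byte aligned)
 */
static unsigned long demo_ino(unsigned long data_off, unsigned long dirent_off)
{
	assert((dirent_off & 3) == 0);
	if (data_off == 0)
		return dirent_off + 1;	/* low two bits are 01 */
	return data_off << 2;		/* low two bits are 00 */
}
```

Under those assumptions the two ranges can never collide, because the low
two bits of the result differ.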

It also reimplements the behavior of setting the size and the number of
blocks to 0 for special files, which is the right value for empty files,
devices, fifos and sockets.

As a small benefit, it is also more compatible with older mkcramfs,
because it never uses cramfs_inode->offset to create an inode
number for special files.

[akpm@linux-foundation.org: trivial comment fix]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:23 -08:00
Jeff Moyer
d3486f8b9e aio: remove unused aio_run_iocbs()
aio_run_iocbs() is not used at all, so get rid of it.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:22 -08:00
Namhyung Kim
2e41025598 aio: remove unnecessary check
'nr >= min_nr >= 0' always satisfies 'nr >= 0' so the check is unnecessary.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:22 -08:00
Namhyung Kim
e6d7202b66 fs/char_dev.c: remove unused cdev_index()
Commit 66fa12c571 ("ieee1394: remove the old IEEE 1394 driver stack")
eliminated the only user of cdev_index().  So it can be removed too.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:17 -08:00
Dave Anderson
ceff1a7709 /proc/kcore: fix seeking
Commit 34aacb2920 ("procfs: Use generic_file_llseek in /proc/kcore") broke
seeking on /proc/kcore.  This changes it back to use default_llseek in
order to restore the original behavior.

The problem with generic_file_llseek is that it only allows seeks up to
inode->i_sb->s_maxbytes, which is 2GB-1 on procfs, where the memory file
offset values in the /proc/kcore PT_LOAD segments may exceed or start
beyond that offset value.

A similar revert was made for /proc/vmcore.

Signed-off-by: Dave Anderson <anderson@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:17 -08:00
Alexey Dobriyan
bf33cbdf8a proc: move proc_console.c to fs/proc/consoles.c
Filename is supposed to match procfile name for random junk.

Add __init while I'm at it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:17 -08:00
Alexey Dobriyan
3740a20c4f proc: less LOCK/UNLOCK in remove_proc_entry()
For the common case where a proc entry is being removed and nobody is in
the process of using it, save a LOCK/UNLOCK pair.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:17 -08:00
Petr Holasek
a6fc86d2b4 kpagecount: add slab page checking because _mapcount is in a union
Add a PageSlab() check before adding the _mapcount value to /proc/kpagecount.
page->_mapcount is in a union with the SLAB structure so for pages
controlled by SLAB, page_mapcount() returns nonsense.

Signed-off-by: Petr Holasek <pholasek@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:17 -08:00
Jovi Zhang
c6a3405846 proc: use single_open() correctly
single_open()'s third argument is for copying into seq_file->private.  Use
that, rather than open-coding it.

Signed-off-by: Jovi Zhang <bookjovi@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:16 -08:00
Alexey Dobriyan
6d1b6e4eff proc: ->low_ino cleanup
- ->low_ino is write-once field -- reading it under locks is unnecessary.

- /proc/$PID stuff never reaches pde_put()/free_proc_entry() --
   PROC_DYNAMIC_FIRST check never triggers.

- in proc_get_inode(), inode number always matches proc dir entry, so
  save one parameter.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:16 -08:00
Alexey Dobriyan
9d6de12f70 proc: use seq_puts()/seq_putc() where possible
For string without format specifiers, use seq_puts().
For seq_printf("\n"), use seq_putc('\n').

   text	   data	    bss	    dec	    hex	filename
  61866	    488	    112	  62466	   f402	fs/proc/proc.o
  61729	    488	    112	  62329	   f379	fs/proc/proc.o
  ----------------------------------------------------
  			   -139

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:16 -08:00
Alexey Dobriyan
a2ade7b6ca proc: use unsigned long inside /proc/*/statm
/proc/*/statm code needlessly truncates data from unsigned long to int.
One needs only 8+ TB of RAM to make truncation visible.
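
The 8 TB figure can be checked with a small calculation.  The sketch below
assumes 4 KiB pages and a 32-bit int; the helper names are hypothetical and
not taken from fs/proc.

```c
#include <assert.h>
#include <limits.h>

/* Pages needed to map `bytes` of memory with pages of 2^page_shift bytes. */
static unsigned long long bytes_to_pages(unsigned long long bytes,
					 unsigned page_shift)
{
	assert(page_shift < 64);
	return bytes >> page_shift;
}

/* Nonzero if a page count survives being stored in an int. */
static int fits_in_int(unsigned long long pages)
{
	return pages <= (unsigned long long)INT_MAX;
}
```

8 TB is 2^43 bytes, which is 2^31 pages of 4 KiB, one more than a 32-bit
INT_MAX can hold.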

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:16 -08:00
Joe Perches
34e49d4f63 fs/proc/base.c, kernel/latencytop.c: convert sprintf_symbol() to %ps
Use temporary lr for struct latency_record for improved readability and
fewer columns used.  Removed trailing space from output.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jiri Kosina <trivial@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:16 -08:00
Jesper Juhl
566538a6cf reiserfs: make sure va_end() is always called after va_start().
A call to va_start() must always be followed by a call to va_end() in the
same function.  In fs/reiserfs/prints.c::print_block() this is not always
the case.  If 'bh' is NULL we'll return without calling va_end().

One could add a call to va_end() before the 'return' statement, but it's
nicer to just move the call to va_start() after the test for 'bh' being
NULL.
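
The pattern can be illustrated in plain C.  This is a sketch under
assumptions, not the reiserfs code: format_into is a hypothetical helper,
and the NULL test on buf stands in for the test on 'bh'.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Format into buf.  Returns -1 without ever starting the va_list when
 * buf is NULL, so no path leaves a va_start() without its va_end().
 */
static int format_into(char *buf, size_t len, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(fmt != NULL);
	if (buf == NULL)
		return -1;	/* early return happens before va_start() */

	va_start(args, fmt);
	n = vsnprintf(buf, len, fmt, args);
	va_end(args);		/* paired with the va_start() above */
	return n;
}
```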

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:15 -08:00
Jesper Juhl
e0e3d32bb4 befs: don't pass huge structs by value
'struct befs_disk_data_stream' is huge (~144 bytes) and it's being passed
by value in fs/befs/endian.h::cpu_to_fsrun().

It would be better to pass a pointer.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: Will Dyson <will_dyson@pobox.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:15 -08:00
Davide Libenzi
e462c448fd pipe: use event aware wakeups
Send the events the wakeup refers to, so that epoll, and even the new poll
code in fs/select.c can avoid wakeups if the events do not match the
requested set.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:15 -08:00
Mikael Pettersson
f670d0ecda binfmt_elf: cleanups
This cleans up a few bits in binfmt_elf.c and binfmts.h:

- the hasvdso field in struct linux_binfmt is unused, so remove it and
  the only initialization of it

- the elf_map CPP symbol is not defined anywhere in the kernel, so
  remove an unnecessary #ifndef elf_map

- reduce excessive indentation in elf_format's initializer

- add missing spaces, remove extraneous spaces

No functional changes, but tested on x86 (32 and 64 bit), powerpc (32 and
64 bit), sparc64, arm, and alpha.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:12 -08:00
Robin Holt
52bd19f769 epoll: convert max_user_watches to long
On a 16TB machine, max_user_watches has an integer overflow.  Convert it
to use a long and handle the associated fallout.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:12 -08:00
Vasiliy Kulikov
65329bf46b fs/select.c: fix information leak to userspace
On some architectures __kernel_suseconds_t is int.  On these archs struct
timeval has padding bytes at the end.  This struct is copied to userspace
with these padding bytes uninitialized.  This leads to leaking of contents
of kernel stack memory.

This bug was added with v2.6.27-rc5-286-gb773ad4.

[akpm@linux-foundation.org: avoid the memset on architectures which don't need it]
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:12 -08:00
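The padding problem in the fs/select.c commit above can be sketched in userspace. This is only an illustration: `struct demo_timeval` is a hypothetical layout chosen to produce trailing padding on most 64-bit ABIs, and the helpers are not kernel functions. It shows the general remedy of zeroing the whole object before filling in its members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative layout only: a 64-bit seconds field followed by a 32-bit
 * microseconds field. On most 64-bit ABIs this leaves 4 trailing padding
 * bytes; on ABIs without padding the check below is trivially satisfied.
 */
struct demo_timeval {
	int64_t tv_sec;
	int32_t tv_usec;
};

/* Zero the whole object first so padding never carries stale data. */
static void demo_fill(struct demo_timeval *tv, int64_t sec, int32_t usec)
{
	memset(tv, 0, sizeof(*tv));
	tv->tv_sec = sec;
	tv->tv_usec = usec;
}

/* Count non-zero bytes located after the last member (the padding). */
static size_t demo_dirty_padding(const struct demo_timeval *tv)
{
	const unsigned char *p = (const unsigned char *)tv;
	size_t i, dirty = 0;

	for (i = offsetof(struct demo_timeval, tv_usec) + sizeof(tv->tv_usec);
	     i < sizeof(*tv); i++)
		dirty += (p[i] != 0);
	return dirty;
}
```

Without the memset(), whatever was previously on the stack in the padding bytes would be copied out along with the two members.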
Andrew Morton
6db26ffc91 fs/ext4/inode.c: use pr_warn_ratelimited()
pr_warning_ratelimited() doesn't exist.

Also include printk.h, which defines these things.

Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 08:03:05 -08:00
Jens Axboe
81c5e2ae33 Merge branch 'for-2.6.38/event-handling' into for-2.6.38/core 2011-01-13 14:47:54 +01:00
Josef Bacik
9ecf639a96 Gfs2: fail if we try to use hole punch
Gfs2 doesn't have the ability to punch holes yet, so make sure we return
EOPNOTSUPP if we try to use hole punching through fallocate.  This support can
be added later.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:16:44 -05:00
Josef Bacik
23a8519b55 Btrfs: fail if we try to use hole punch
Btrfs doesn't have the ability to punch holes yet, so make sure we return
EOPNOTSUPP if we try to use hole punching through fallocate.  This support can
be added later.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:16:44 -05:00
Josef Bacik
d6dc8462f4 Ext4: fail if we try to use hole punch
Ext4 doesn't have the ability to punch holes yet, so make sure we return
EOPNOTSUPP if we try to use hole punching through fallocate.  This support can
be added later.  Thanks,

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:16:44 -05:00
Josef Bacik
db47fef2cd Ocfs2: handle hole punching via fallocate properly
This patch just makes ocfs2 use its UNRESERVP ioctl when we get the hole punch
flag in fallocate.  I didn't test it, but it seems simple enough.  Thanks,

Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:16:43 -05:00
Josef Bacik
c25d246715 XFS: handle hole punching via fallocate properly
This patch simply allows XFS to handle the hole punching flag in fallocate
properly.  I've tested this with a little program that does a bunch of random
hole punching with FL_KEEP_SIZE and without it to make sure it does the right
thing.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:16:43 -05:00
Josef Bacik
79124f18b3 fs: add hole punching to fallocate
Hole punching has already been implemented by XFS and OCFS2, and has the
potential to be implemented on both BTRFS and EXT4 so we need a generic way to
get to this feature.  The simplest way in my mind is to add FALLOC_FL_PUNCH_HOLE
to fallocate() since it already looks like the normal fallocate() operation.
I've tested this patch with XFS and BTRFS to make sure XFS did what it's
supposed to do and that BTRFS failed like it was supposed to.  Thank you,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:16:43 -05:00
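From userspace, the interface added by the hole-punching commits above might be exercised as follows. This is a hedged sketch for Linux with glibc: `punch_hole()` and `demo_punch()` are hypothetical helpers, the flag combination assumes the mainline requirement that FALLOC_FL_PUNCH_HOLE be used together with FALLOC_FL_KEEP_SIZE, and a filesystem without support is expected to fail with EOPNOTSUPP, as the Ext4, Btrfs and GFS2 commits describe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <linux/falloc.h>	/* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */

/* Deallocate [off, off + len) without changing the file size.
 * Returns 0 on success or a negative errno such as -EOPNOTSUPP. */
static int punch_hole(int fd, off_t off, off_t len)
{
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      off, len) < 0)
		return -errno;
	return 0;
}

/* Self-check on a temporary file: 1 = hole punched and reads back as
 * zeros, 0 = the filesystem refused the request, -1 = unexpected failure. */
static int demo_punch(void)
{
	char buf[8192];
	FILE *f = tmpfile();
	int fd, ret = 1;
	size_t i;

	if (!f)
		return -1;
	fd = fileno(f);
	memset(buf, 'x', sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) {
		fclose(f);
		return -1;
	}
	if (punch_hole(fd, 0, 4096) < 0) {
		fclose(f);
		return 0;	/* e.g. -EOPNOTSUPP on this filesystem */
	}
	if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) {
		fclose(f);
		return -1;
	}
	for (i = 0; i < 4096; i++)		/* punched range reads as zeros */
		if (buf[i] != 0)
			ret = -1;
	for (i = 4096; i < sizeof(buf); i++)	/* the rest is untouched */
		if (buf[i] != 'x')
			ret = -1;
	fclose(f);
	return ret;
}
```

Callers that must run on older kernels or on filesystems without support would typically treat -EOPNOTSUPP as "fall back to writing zeros".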
Jeff Layton
e1181ee657 vfs: pass struct file to do_truncate on O_TRUNC opens (try #2)
When a file is opened with O_TRUNC, the truncate processing is handled
by handle_truncate(). This function however doesn't receive any info
about the newly instantiated filp, and therefore can't pass that info
along so that the setattr can use it.

This makes NFSv4 misbehave. The client does an open and gets a valid
stateid, and then doesn't use that stateid on the subsequent truncate.
It uses the zero-stateid instead. Most servers ignore this fact and
just do the truncate anyway, but some don't like it (notably, RHEL4).

It seems more correct that since we have a fully instantiated file at
the time that handle_truncate is called, that we pass that along so
that the truncate operation can properly use it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:06:59 -05:00
Al Viro
cccb5a1e69 fix signedness mess in rw_verify_area() on 64bit architectures
... and clean the unsigned-f_pos code, while we are at it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:06:58 -05:00
Randy Dunlap
208898c17a fs: fix kernel-doc for dcache::prepend_path
Fix function kernel-doc warning for prepend_path():

Warning(fs/dcache.c:1924): missing initial short description on line:

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:06:57 -05:00
Randy Dunlap
1c977540fd fs: fix kernel-doc for dcache::d_validate
Fix function parameter kernel-doc for d_validate():

Warning(fs/dcache.c:1495): No description found for parameter 'parent'
Warning(fs/dcache.c:1495): Excess function parameter 'dparent' description in 'd_validate'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:06:55 -05:00
Al Viro
66cb76666d sanitize ecryptfs ->mount()
kill ecryptfs_read_super(), reorder code allowing to use
normal d_alloc_root() instead of opencoding it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:04:37 -05:00
Al Viro
d61dcce297 switch afs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:04:20 -05:00
Al Viro
32c419d95f move internal-only parts of ncpfs headers to fs/ncpfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:03:43 -05:00
Al Viro
0378c4051a switch ncpfs
merge dentry_operations for root and non-root

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:03:43 -05:00
Al Viro
98cd3fb0a2 switch 9p
here we actually *want* ->d_op for root; setting it allows to get rid
of kludge in v9fs_kill_super() since now we have proper ->d_release()
for root and don't need to call it manually.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:03:43 -05:00
Al Viro
c74a1cbb3c pass default dentry_operations to mount_pseudo()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:03:43 -05:00
Al Viro
f772c4a6a3 switch hostfs
->d_delete() doesn't matter for s_root anyway

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:03:42 -05:00
Al Viro
a129880daf switch affs
either d_op instance would work for root, actually...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:03:42 -05:00
Al Viro
d463a0c4b5 switch configfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:03:12 -05:00
Al Viro
31a203df9c take coda-private headers out of include/linux
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:48 -05:00
Al Viro
9501e4c48e switch coda
Coda ->d_revalidate() actually checks for root, ->d_delete() is irrelevant.
So we can use the same d_op for all coda dentries

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:48 -05:00
Al Viro
43d344d772 switch hpfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:47 -05:00
Al Viro
af53d29ac1 switch btrfs, close races
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:47 -05:00
Al Viro
ba87167c06 switch ocfs2, close races
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:46 -05:00
Al Viro
41ced6dcf3 switch gfs2, close races
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:46 -05:00
Al Viro
1c929cfe6d switch cifs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:46 -05:00
Al Viro
8b244ff2fa switch nfs to ->s_d_op
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:45 -05:00
Al Viro
96e1391414 switch adfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:45 -05:00
Al Viro
eddf790bd4 switch hfsplus
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:45 -05:00
Al Viro
518c79d28e switch hfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:45 -05:00
Al Viro
c6cb412366 minixfs: kill dead code
->d_op of root stays NULL these days on minixfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:44 -05:00
Al Viro
30304aba6a switch sysv
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:44 -05:00
Al Viro
c35eebe993 switch fuse
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:44 -05:00
Al Viro
94b77bd86f switch jfs to ->s_d_op, close exportfs races
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:43 -05:00
Al Viro
3d23985d6c switch fat to ->s_d_op, close exportfs races there
don't bother with lock_super() in fat_fill_super() callers, while
we are at it - there won't be any concurrency anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:43 -05:00
Al Viro
6cc9c1d2c1 fix isofs d_op handling
switch to ->s_d_op; d_obtain_alias() will DTRT now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:43 -05:00
Al Viro
c8aebb0c9f per-superblock default ->d_op
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:34 -05:00
Tejun Heo
01e6acc4ea ceph: fsc->*_wq's aren't used in memory reclaim path
fsc->*_wq's aren't depended upon during memory reclaim.  Convert to
alloc_workqueue() w/o WQ_MEM_RECLAIM.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Sage Weil <sage@newdream.net>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12 15:15:14 -08:00
Tracey Dent
582c86e690 ceph: Makefile: Remove unnessary code
Remove the if/else conditional: the code is in mainline now, so the
conditional is no longer needed.

Also change the Makefile to use <modules>-y instead of <modules>-objs,
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12 15:15:13 -08:00
Sage Weil
dc69e2e9fc ceph: associate requests with opening sessions
Associate requests with sessions that aren't yet open.  This makes the
debugfs mdsc request list more informative.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12 15:15:13 -08:00
Sage Weil
4af25fdda6 ceph: drop redundant r_mds field
The r_mds field is redundant, since we can find the same information at
r_session->s_mds, and when r_session is NULL then r_mds is meaningless.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12 15:15:13 -08:00
Sage Weil
14303d20f3 ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS
This implements the DIRLAYOUTHASH protocol feature, which passes the dir
layout over the wire from the MDS.  This gives the client knowledge
of the correct hash function to use for mapping dentries among dir
fragments.

Note that if this feature is _not_ present on the client but is on the
MDS, the client may misdirect requests.  This will result in a forward
and degrade performance.  It may also result in inaccurate NFS filehandle
generation, which will prevent fh resolution when the inode is not present
in the client cache and the parent directories have been fragmented.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12 15:15:13 -08:00
Sage Weil
6c0f3af72c ceph: add dir_layout to inode
Add a ceph_dir_layout to the inode, and calculate dentry hash values based
on the parent directory's specified dir_hash function.  This is needed
because the old default Linux dcache hash function is extremely weak and
leads to a poor distribution of files among dir fragments.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12 15:15:12 -08:00
Jan Kara
f00c9e44ad quota: Fix deadlock during path resolution
As Al Viro pointed out, path resolution during Q_QUOTAON calls to quotactl
is prone to deadlocks. We hold the s_umount semaphore for reading during
the path resolution, and the resolution itself may need to acquire the
semaphore for writing, e.g. when an autofs mountpoint is passed.

Solve the problem by performing the resolution before we get hold of the
superblock (and thus the s_umount semaphore). The whole thing is
complicated by the fact that some filesystems (OCFS2) ignore the path
argument. So to distinguish between filesystems that want the path and
those that do not, we introduce a new .quota_on_meta callback which does
not get the path. OCFS2 then uses this callback instead of the old
.quota_on.

CC: Al Viro <viro@ZenIV.linux.org.uk>
CC: Christoph Hellwig <hch@lst.de>
CC: Ted Ts'o <tytso@mit.edu>
CC: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-01-12 19:14:55 +01:00
Anton Altaparmakov
2818ef50c4 NTFS: writev() fix and maintenance/contact details update
Fix writev() to not keep writing the first segment over and over again
instead of moving onto subsequent segments and update the NTFS entry in
MAINTAINERS to reflect that Tuxera Inc. now supports the NTFS driver.

Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-12 08:35:53 -08:00
Dave Chinner
73efe4a4dd xfs: prevent NMI timeouts in cmn_err
We currently have a global error message buffer in cmn_err that is
protected by a spin lock that disables interrupts.  Recently there
have been reports of NMI timeouts occurring when the console is
being flooded by SCSI error reports due to cmn_err() getting stuck
trying to print to the console while holding this lock (i.e. with
interrupts disabled). The NMI watchdog is seeing this CPU as
non-responding and so is triggering a panic.  While the trigger for
the reported case is SCSI errors, pretty much anything that spams
the kernel log could cause this to occur.

Realistically the only reason that we have the intermediate message
buffer is to prepend the correct kernel log level prefix to the log
message. The only reason we have the lock is to protect the global
message buffer and the only reason the message buffer is global is
to keep it off the stack. Hence if we can avoid needing a global
message buffer we avoid needing the lock, and we can do this with a
small amount of cleanup and some preprocessor tricks:

	1. clean up xfs_cmn_err() panic mask functionality to avoid
	   needing debug code in xfs_cmn_err()
	2. remove the couple of "!" message prefixes that still exist that
	   the existing cmn_err() code steps over.
	3. redefine CE_* levels directly to KERN_*
	4. redefine cmn_err() and friends to use printk() directly
	   via variable argument length macros.

By doing this, we can completely remove the cmn_err() code and the
lock that is causing the problems, and rely solely on printk()
serialisation to ensure that we don't get garbled messages.

A series of followup patches is really needed to clean up all the
cmn_err() calls and related messages properly, but that results in a
series that is not easily back portable to enterprise kernels. Hence
this initial fix is only to address the direct problem in the lowest
impact way possible.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-01-12 08:46:41 -06:00
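Steps 3 and 4 of the cmn_err commit above rely on a preprocessor property: adjacent string literals are concatenated at compile time, so a level prefix can be attached to the format string without any intermediate buffer. The following is a userspace sketch under assumptions, not the XFS macros themselves: the `DEMO_` names are hypothetical, printf() stands in for printk(), and the prefix strings are assumed stand-ins for the kernel log levels.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for kernel log-level prefixes (assumed values for the demo). */
#define DEMO_CE_ALERT "<1>"
#define DEMO_CE_WARN  "<4>"

/* Adjacent string literals are merged by the compiler, so the level
 * prefix is attached at compile time with no run-time scratch buffer. */
#define DEMO_FMT(level, fmt) level fmt

/* printf() stands in for printk(); ##__VA_ARGS__ is a GNU extension that
 * drops the trailing comma when no variable arguments are given. */
#define demo_cmn_err(level, fmt, ...) \
	printf(DEMO_FMT(level, fmt) "\n", ##__VA_ARGS__)
```

Because nothing is formatted into shared storage, there is no global buffer to protect and therefore no lock to hold while the message is printed.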
Anton Blanchard
65a84a0f75 xfs: Add log level to assertion printk
I received a ppc64 bug report involving xfs but the assertion was
filtered out by the console log level. Use KERN_CRIT to ensure it
makes it out.

Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-01-11 22:29:46 -06:00
Jesper Juhl
1884bd8354 xfs: fix an assignment within an ASSERT()
In fs/xfs/xfs_trans.c::xfs_trans_unreserve_and_mod_sb() at the out:
label we have this:
	ASSERT(error = 0);
I believe a comparison was intended, not an assignment. If I'm
right, the patch below fixes that up.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-01-11 22:29:13 -06:00
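The difference between the two expressions in the ASSERT() commit above is easy to demonstrate. This is a small standalone sketch with hypothetical helper names; it models only the expressions, not the XFS ASSERT() macro.

```c
#include <assert.h>

/* Mimics the buggy expression `error = 0`: it overwrites the error code
 * and evaluates to 0, so the asserted condition is always false. */
static int demo_buggy_check(int *error)
{
	return (*error = 0) != 0;
}

/* Mimics the intended expression `error == 0`: a pure comparison that
 * leaves the error code untouched. */
static int demo_fixed_check(const int *error)
{
	return *error == 0;
}
```

The assignment form is doubly harmful: the condition can never hold, and any real error code is silently replaced by 0.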
Christoph Hellwig
bfc60177f8 xfs: fix error handling for synchronous writes
If we get an IO error on a synchronous superblock write, we attach an
error release function to it so that when the last reference goes away
the release function is called and the buffer is invalidated and
unlocked. The buffer is left locked until the release function is
called so that other concurrent users of the buffer will be locked out
until the buffer error is fully processed.

Unfortunately, for the superblock buffer the filesystem itself holds a
reference to the buffer which prevents the reference count from
dropping to zero and the release function being called. As a result,
once an IO error occurs on a sync write, the buffer will never be
unlocked and all future attempts to lock the buffer will hang.

To make matters worse, this problem is not unique to such buffers;
if there is a concurrent _xfs_buf_find() running, the lookup will grab
a reference to the buffer and then wait on the buffer lock, preventing
the reference count from ever falling to zero and hence unlocking the
buffer.

As such, the whole b_relse function implementation is broken because it
cannot rely on the buffer reference count falling to zero to unlock the
errored buffer. The synchronous write error path is the only path that
uses this callback - it is used to ensure that the synchronous waiter
gets the buffer error before the error state is cleared from the buffer
by the release function.

Given that the only synchronous buffer writes now go through xfs_bwrite
and the error path in question can only occur for a write of a dirty,
logged buffer, we can move most of the b_relse processing to happen
inline in xfs_buf_iodone_callbacks, just like a normal I/O completion.
In addition to that we make sure the error is not cleared in
xfs_buf_iodone_callbacks, so that xfs_bwrite can reliably check it.
Given that xfs_bwrite keeps the buffer locked until it has waited for
it and checked the error, this allows us to reliably propagate the error
to the caller and make sure that the buffer is reliably unlocked.

Given that xfs_buf_iodone_callbacks was the only instance of the
b_relse callback we can remove it entirely.

Based on earlier patches by Dave Chinner and Ajeet Yadav.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ajeet Yadav <ajeet.yadav.77@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-01-11 20:28:42 -06:00
Christoph Hellwig
a46db60834 xfs: add FITRIM support
Allow manual discards from userspace using the FITRIM ioctl.  This is not
intended to be run during normal workloads, as the free space btree walks
can cause large performance degradation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-01-11 20:28:29 -06:00
Dave Chinner
c58efdb442 xfs: ensure log covering transactions are synchronous
To ensure the log is covered and the filesystem idles correctly, we
need to ensure that dummy transactions hit the disk and do not stay
pinned in memory.  If the superblock is pinned in memory, it can't
be flushed so the log covering cannot make progress. The result is
dependent on timing - more often than not we continue to issue a
log covering transaction every 36s rather than idling after ~90s.

Fix this by making the log covering transaction synchronous. To
avoid additional log force from xfssyncd, make the log covering
transaction take the place of the existing log force in the xfssyncd
background sync process.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-01-11 20:28:17 -06:00
Linus Torvalds
b9d919a4ac Merge branch 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (89 commits)
  NFS fix the setting of exchange id flag
  NFS: Don't use vm_map_ram() in readdir
  NFSv4: Ensure continued open and lockowner name uniqueness
  NFS: Move cl_delegations to the nfs_server struct
  NFS: Introduce nfs_detach_delegations()
  NFS: Move cl_state_owners and related fields to the nfs_server struct
  NFS: Allow walking nfs_client.cl_superblocks list outside client.c
  pnfs: layout roc code
  pnfs: update nfs4_callback_recallany to handle layouts
  pnfs: add CB_LAYOUTRECALL handling
  pnfs: CB_LAYOUTRECALL xdr code
  pnfs: change lo refcounting to atomic_t
  pnfs: check that partial LAYOUTGET return is ignored
  pnfs: add layout to client list before sending rpc
  pnfs: serialize LAYOUTGET(openstateid)
  pnfs: layoutget rpc code cleanup
  pnfs: change how lsegs are removed from layout list
  pnfs: change layout state seqlock to a spinlock
  pnfs: add prefix to struct pnfs_layout_hdr fields
  pnfs: add prefix to struct pnfs_layout_segment fields
  ...
2011-01-11 15:11:56 -08:00
Linus Torvalds
7c955fca3e Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
  UDF: Close small mem leak in udf_find_entry()
  udf: Fix directory corruption after extent merging
  udf: Protect udf_file_aio_write from possible races
  udf: Remove unnecessary bkl usages
  udf: Use of s_alloc_mutex to serialize udf_relocate_blocks() execution
  udf: Replace bkl with the UDF_I(inode)->i_data_sem for protect udf_inode_info struct
  udf: Remove BKL from free space counting functions
  udf: Call udf_add_free_space() for more blocks at once in udf_free_blocks()
  udf: Remove BKL from udf_put_super() and udf_remount_fs()
  udf: Protect default inode credentials by rwlock
  udf: Protect all modifications of LVID with s_alloc_mutex
  udf: Move handling of uniqueID into a helper function and protect it by a s_alloc_mutex
  udf: Remove BKL from udf_update_inode
  udf: Convert UDF_SB(sb)->s_flags to use bitops
  fs/udf: Add printf format/argument verification
  fs/udf: Use vzalloc

(Evil merge: this also removes the BKL dependency from the Kconfig file)
2011-01-11 14:45:52 -08:00
Linus Torvalds
e9688f6aca Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits)
  ext4: fix trimming starting with block 0 with small blocksize
  ext4: revert buggy trim overflow patch
  ext4: don't pass entire map to check_eofblocks_fl
  ext4: fix memory leak in ext4_free_branches
  ext4: remove ext4_mb_return_to_preallocation()
  ext4: flush the i_completed_io_list during ext4_truncate
  ext4: add error checking to calls to ext4_handle_dirty_metadata()
  ext4: fix trimming of a single group
  ext4: fix uninitialized variable in ext4_register_li_request
  ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary
  ext4: drop i_state_flags on architectures with 64-bit longs
  ext4: reorder ext4_inode_info structure elements to remove unneeded padding
  ext4: drop ec_type from the ext4_ext_cache structure
  ext4: use ext4_lblk_t instead of sector_t for logical blocks
  ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED
  ext4: fix 32bit overflow in ext4_ext_find_goal()
  ext4: add more error checks to ext4_mkdir()
  ext4: ext4_ext_migrate should use NULL not 0
  ext4: Use ext4_error_file() to print the pathname to the corrupted inode
  ext4: use IS_ERR() to check for errors in ext4_error_file
  ...
2011-01-11 14:37:31 -08:00
Linus Torvalds
40c73abbb3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  ext2: Resolve 'dereferencing pointer to incomplete type' when enabling EXT2_XATTR_DEBUG
  ext3: Remove redundant unlikely()
  ext2: Remove redundant unlikely()
  ext3: speed up file creates by optimizing rec_len functions
  ext2: speed up file creates by optimizing rec_len functions
  ext3: Add more journal error check
  ext3: Add journal error check in resize.c
  quota: Use %pV and __attribute__((format (printf in __quota_error and fix fallout
  ext3: Add FITRIM handling
  ext3: Add batched discard support for ext3
  ext3: Add journal error check into ext3_rename()
  ext3: Use search_dirblock() in ext3_dx_find_entry()
  ext3: Avoid uninitialized memory references with a corrupted htree directory
  ext3: Return error code from generic_check_addressable
  ext3: Add journal error check into ext3_delete_entry()
  ext3: Add error check in ext3_mkdir()
  fs/ext3/super.c: Use printf extension %pV
  fs/ext2/super.c: Use printf extension %pV
  ext3: don't update sb journal_devnum when RO dev
2011-01-11 14:36:55 -08:00
Linus Torvalds
0945f352ce Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: Don't set dentry->d_op in create routines
  fs/9p: fix spelling typo
  fs/9p: TREADLINK bugfix
  net/9p: Use proper data types
  fs/9p: Simplify the .L create operation
  fs/9p: Move dotl inode operations into a seperate file
  fs/9p: fix menu presentation
  fs/9p: Fix the return error on default acl removal
  fs/9p: Remove unnecessary semicolons
2011-01-11 14:36:08 -08:00
Jan Kara
0f0a25bf51 ext4: fix trimming starting with block 0 with small blocksize
When s_first_data_block is not zero (which happens e.g. when block size is 1KB)
and trim ioctl is called to start trimming from block 0, the math in
ext4_get_group_no_and_offset() overflows. The overall result is that ioctl
returns EINVAL which is kind of unexpected and we probably don't want
userspace tools to bother with internal details of filesystem structure.
So just silently increase starting offset (and shorten length) when starting
block is below s_first_data_block.

CC: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-01-11 15:16:31 -05:00
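The adjustment described in the ext4 trim commit above amounts to clamping the requested range. The sketch below is an assumption-laden illustration rather than the actual patch: `demo_clamp_trim()` is a hypothetical helper, and it works in units of filesystem blocks, which may differ from the units used in the real code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, in units of filesystem blocks: if the requested
 * range starts below the first data block, move the start up and shorten
 * the length by the number of blocks skipped.
 */
static void demo_clamp_trim(uint64_t first_data_block,
			    uint64_t *start, uint64_t *len)
{
	if (*start < first_data_block) {
		uint64_t skipped = first_data_block - *start;

		*len = (*len > skipped) ? *len - skipped : 0;
		*start = first_data_block;
	}
}
```

For a 1KB-block filesystem whose first data block is 1, a request to trim 100 blocks from block 0 becomes a request for 99 blocks from block 1 instead of an EINVAL.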
J. Bruce Fields
5ce8ba25d6 nfsd4: allow restarting callbacks
If we lose the backchannel and then the client repairs the problem,
resend any callbacks.

We use a new cb_done flag to track whether there is still work to be
done for the callback or whether it can be destroyed with the rpc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:11 -05:00
J. Bruce Fields
3ff3600e7e nfsd4: simplify nfsd4_cb_prepare
Remove handling for a nonexistent case (status && !-EAGAIN).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:11 -05:00
J. Bruce Fields
14a24e99f4 nfsd4: give out delegations more quickly in 4.1 case
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:11 -05:00
J. Bruce Fields
229b2a0839 nfsd4: add helper function to run callbacks
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:11 -05:00
J. Bruce Fields
84f5f7ccc5 nfsd4: make sure sequence flags are set after destroy_session
If this loses any backchannel, make sure we have a chance to notice that
and set the sequence flags.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:11 -05:00
J. Bruce Fields
eea4980660 nfsd4: re-probe callback on connection loss
This makes sure we set the sequence flag when necessary.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:10 -05:00
J. Bruce Fields
0d7bb71907 nfsd4: set sequence flag when backchannel is down
Implement the SEQ4_STATUS_CB_PATH_DOWN flag.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:10 -05:00
J. Bruce Fields
77a3569d6c nfsd4: keep finer-grained callback status
Distinguish between when the callback channel is known to be down, and
when it is not yet confirmed.  This will be useful in the 4.1 case.

Also, we don't seem to be using the fact that this field is atomic.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2011-01-11 15:04:10 -05:00
J. Bruce Fields
dcbeaa68db nfsd4: allow backchannel recovery
Now that we have a list of connections to choose from, we can teach the
callback code to just pick a suitable connection and use that, instead
of insisting on forever using the connection that the first
create_session was sent with.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2011-01-11 15:04:10 -05:00
J. Bruce Fields
1d1bc8f207 nfsd4: support BIND_CONN_TO_SESSION
Basic xdr and processing for BIND_CONN_TO_SESSION.  This adds a
connection to the list of connections associated with a session.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:09 -05:00
J. Bruce Fields
4c6493785a nfsd4: modify session list under cl_lock
We want to traverse this from the callback code.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2011-01-11 15:04:09 -05:00
J. Bruce Fields
a2c50f6916 Merge commit 'v2.6.37' into for-2.6.38-incoming
I made a slight mess of Documentation/filesystems/Locking; resolve
conflicts with upstream before fixing it up.
2011-01-11 15:02:19 -05:00
Theodore Ts'o
0a2179b169 ext4: revert buggy trim overflow patch
This reverts commit 4f531501e4: ext4: fix possible overflow in
ext4_trim_fs()

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-01-11 14:42:29 -05:00
Linus Torvalds
7bc4a4ce68 Merge branch 'for-linus-merged' of git://oss.sgi.com/xfs/xfs
* 'for-linus-merged' of git://oss.sgi.com/xfs/xfs: (47 commits)
  xfs: convert grant head manipulations to lockless algorithm
  xfs: introduce new locks for the log grant ticket wait queues
  xfs: convert log grant heads to atomic variables
  xfs: convert l_tail_lsn to an atomic variable.
  xfs: convert l_last_sync_lsn to an atomic variable
  xfs: make AIL tail pushing independent of the grant lock
  xfs: use wait queues directly for the log wait queues
  xfs: combine grant heads into a single 64 bit integer
  xfs: rework log grant space calculations
  xfs: fact out common grant head/log tail verification code
  xfs: convert log grant ticket queues to list heads
  xfs: use AIL bulk delete function to implement single delete
  xfs: use AIL bulk update function to implement single updates
  xfs: remove all the inodes on a buffer from the AIL in bulk
  xfs: consume iodone callback items on buffers as they are processed
  xfs: reduce the number of AIL push wakeups
  xfs: bulk AIL insertion during transaction commit
  xfs: clean up xfs_ail_delete()
  xfs: Pull EFI/EFD handling out from under the AIL lock
  xfs: fix EFI transaction cancellation.
  ...
2011-01-11 11:42:06 -08:00
Linus Torvalds
498f7f505d Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (22 commits)
  MAINTAINERS: Update Joel Becker's email address
  ocfs2: Remove unused truncate function from alloc.c
  ocfs2/cluster: dereferencing before checking in nst_seq_show()
  ocfs2: fix build for OCFS2_FS_STATS not enabled
  ocfs2/cluster: Show o2net timing statistics
  ocfs2/cluster: Track process message timing stats for each socket
  ocfs2/cluster: Track send message timing stats for each socket
  ocfs2/cluster: Use ktime instead of timeval in struct o2net_sock_container
  ocfs2/cluster: Replace timeval with ktime in struct o2net_send_tracking
  ocfs2: Add DEBUG_FS dependency
  ocfs2/dlm: Hard code the values for enums
  ocfs2/dlm: Minor cleanup
  ocfs2/dlm: Cleanup dlmdebug.c
  ocfs2: Release buffer_head in case of error in ocfs2_double_lock.
  ocfs2/cluster: Pin the local node when o2hb thread starts
  ocfs2/cluster: Show pin state for each o2hb region
  ocfs2/cluster: Pin/unpin o2hb regions
  ocfs2/cluster: Remove dropped region from o2hb quorum region bitmap
  ocfs2/cluster: Pin the remote node item in configfs
  ocfs2/dlm: make existing conversion precedent over new lock
  ...
2011-01-11 11:28:34 -08:00
Andy Adamson
357f54d6b3 NFS fix the setting of exchange id flag
Indicate support for referrals. Do not set any PNFS roles. Check the flags
returned by the server for validity. Do not use exchange flags from an old
client ID instance when recovering a client ID.

Update the EXCHID4_FLAG_XXX set to RFC 5661.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-11 14:17:09 -05:00
Aneesh Kumar K.V
b8b80cf37c fs/9p: Don't set dentry->d_op in create routines
We set dentry->d_op in lookup even for ENOENT entries.
That implies dentry->d_op is already set when the
create/mkdir/mknod/link/symlink routines are called.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-01-11 09:58:08 -06:00
Eric Van Hensbergen
c25a61f542 fs/9p: fix spelling typo
A typo was introduced somehow during a hand merge.

Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-01-11 09:58:08 -06:00
M. Mohan Kumar
31b6ceac49 fs/9p: TREADLINK bugfix
Remove v9fs_vfs_readlink_dotl function and use generic_readlink. Update
v9fs_vfs_follow_link_dotl function to accommodate this change

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-01-11 09:58:08 -06:00
Aneesh Kumar K.V
af7542fc8a fs/9p: Simplify the .L create operation
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-01-11 09:58:07 -06:00
Aneesh Kumar K.V
53c06f4e0a fs/9p: Move dotl inode operations into a separate file
Source Code Reorganization

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-01-11 09:58:07 -06:00
Randy Dunlap
255614c459 fs/9p: fix menu presentation
Make the 9P_FS kconfig options subordinate to the 9P_FS kconfig symbol
in the menu presentation instead of them all being at the same level.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-01-11 09:58:07 -06:00
Aneesh Kumar K.V
6f81c11574 fs/9p: Fix the return error on default acl removal
If there is no default ACL, then trying to remove the
default ACL on a file should return 0.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
2011-01-11 09:58:07 -06:00
Joe Perches
009ca3897e fs/9p: Remove unnecessary semicolons
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-01-11 09:58:07 -06:00
Alex Elder
92f1c008ae Merge branch 'master' into for-linus-merged
This merge pulls the XFS master branch into the latest Linus master.
This results in a merge conflict whose best fix is not obvious.
I manually fixed the conflict, in "fs/xfs/xfs_iget.c".

Dave Chinner had done work that resulted in RCU freeing of inodes
separate from what Nick Piggin had done, and their results differed
slightly in xfs_inode_free().  The fix updates Nick's call_rcu()
with the use of VFS_I(), while incorporating needed updates to some
XFS inode fields implemented in Dave's series.  Dave's RCU callback
function has also been removed.

Signed-off-by: Alex Elder <aelder@sgi.com>
2011-01-10 21:35:55 -06:00
Linus Torvalds
e54be894ea Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  driver core: Document that device_rename() is only for networking
  sysfs: remove useless test from sysfs_merge_group
  driver-core: merge private parts of class and bus
  driver core: fix whitespace in class_attr_string
2011-01-10 16:10:33 -08:00
Dave Chinner
eda7798272 xfs: serialise unaligned direct IOs
When two concurrent unaligned, non-overlapping direct IOs are issued
to the same block, the direct IO layer will race to zero the block.
The result is that one of the concurrent IOs will overwrite data
written by the other IO with zeros. This is demonstrated by the
xfsqa test 240.

To avoid this problem, serialise all unaligned direct IOs to an
inode with a big hammer. We need a big hammer approach as we need to
serialise AIO as well, so we can't just block writes on locks.
Hence, the big hammer is calling xfs_ioend_wait() while holding out
other unaligned direct IOs from starting.

We don't bother trying to serialise aligned vs unaligned IOs as
they are overlapping IO and the result of concurrent overlapping IOs
is undefined - the result of either IO is a valid result so we let
them race. Hence we only penalise unaligned IO, which already has a
major overhead compared to aligned IO so this isn't a major problem.
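
As a rough illustration of the distinction drawn above, the following
sketch classifies an IO as unaligned when its start or end falls inside
a block, and shows that two non-overlapping byte ranges can still land
in the same block. The helper names and the block size parameter are
assumptions made for this example; this is not the XFS code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the XFS implementation.
 * block_size is assumed to be a non-zero power of two.
 */
static int sketch_is_unaligned(uint64_t pos, uint64_t count,
                               uint64_t block_size)
{
    uint64_t mask = block_size - 1;

    assert(block_size != 0 && (block_size & mask) == 0);

    /* Unaligned if either the start or the end falls inside a block. */
    return (pos & mask) != 0 || ((pos + count) & mask) != 0;
}

/*
 * Hypothetical helper: returns 1 if the two byte ranges touch at least
 * one common block, even when the byte ranges themselves do not overlap.
 */
static int sketch_share_block(uint64_t pos_a, uint64_t count_a,
                              uint64_t pos_b, uint64_t count_b,
                              uint64_t block_size)
{
    assert(block_size != 0 && count_a > 0 && count_b > 0);

    uint64_t first_a = pos_a / block_size;
    uint64_t last_a = (pos_a + count_a - 1) / block_size;
    uint64_t first_b = pos_b / block_size;
    uint64_t last_b = (pos_b + count_b - 1) / block_size;

    return first_a <= last_b && first_b <= last_a;
}
```

With a 4096-byte block, writes of 512 bytes at offsets 0 and 2048 do not
overlap, yet both are unaligned and both touch block 0, which is the
case the serialisation is meant to cover.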

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-01-11 10:22:40 +11:00
Dave Chinner
4d8d15812f xfs: factor common write setup code
The buffered IO and direct IO write paths share a common set of
checks and limiting code prior to issuing the write. Factor that
into a common helper function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-01-11 10:23:42 +11:00
Dave Chinner
637bbc75d9 xfs: split buffered IO write path from xfs_file_aio_write
Complete the split of the different write IO paths by splitting the
buffered IO write path out of xfs_file_aio_write(). This makes the
different mechanisms of the write paths easier to follow.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-01-11 10:17:30 +11:00
Dave Chinner
f0d26e860b xfs: split direct IO write path from xfs_file_aio_write
The current xfs_file_aio_write code is a mess of locking shenanigans
to handle the different locking requirements of buffered and direct
IO. Start to clean this up by disentangling the direct IO path from
the mess.

This also removes the failed direct IO fallback path to buffered IO.
XFS handles all direct IO cases without needing to fall back to
buffered IO, so we can safely remove this unused path. This greatly
simplifies the logic and locking needed in the write path.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-01-11 10:15:36 +11:00
Dave Chinner
487f84f3f8 xfs: introduce xfs_rw_lock() helpers for locking the inode
We need to obtain the i_mutex, i_iolock and i_ilock during the read
and write paths. Add a set of wrapper functions to neatly
encapsulate the lock ordering and shared/exclusive semantics to make
the locking easier to follow and get right.

Note that this changes some of the exclusive locking serialisation in
that serialisation will occur against the i_mutex instead of the
XFS_IOLOCK_EXCL. This does not change any behaviour, and it is
arguably more efficient to use the mutex for such serialisation than
the rw_sem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-01-12 11:37:10 +11:00
Dave Chinner
4c5cfd1b41 xfs: factor post-write newsize updates
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-01-11 10:14:16 +11:00
Dave Chinner
edafb6da9a xfs: factor common post-write isize handling code
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-01-11 10:14:06 +11:00
Dave Chinner
a363f0c203 xfs: ensure sync write errors are returned
xfs_file_aio_write() only returns the error from synchronous
flushing of the data and inode if error == 0. At the point where
error is being checked, it is guaranteed to be > 0. Therefore any
errors returned by the data or fsync flush will never be returned.
Fix the checks so we overwrite the current error once and only if an
error really occurred.
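
The pattern described above can be sketched in isolation. In this
minimal model, ret stands for the result of the write (bytes written,
or a negative errno) and flush_err for the result of the synchronous
flush. Both function names are hypothetical and the code is not taken
from xfs_file_aio_write().

```c
#include <assert.h>

/*
 * Buggy pattern (hypothetical model): after a successful write, ret
 * holds a positive byte count, so the "== 0" test never fires and the
 * flush error is silently dropped.
 */
static long sketch_finish_sync_write_buggy(long ret, int flush_err)
{
    if (ret == 0)
        ret = flush_err;
    return ret;
}

/*
 * Fixed pattern (hypothetical model): overwrite the result once, and
 * only if the write succeeded and the flush really failed.
 */
static long sketch_finish_sync_write_fixed(long ret, int flush_err)
{
    assert(flush_err <= 0); /* 0 on success, negative errno on failure */

    if (ret > 0 && flush_err < 0)
        ret = flush_err;
    return ret;
}
```

The fixed variant leaves an earlier write error untouched, so the
caller still sees the first failure rather than the flush result.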

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2011-01-11 10:13:53 +11:00
Trond Myklebust
68c404b18f Merge branch 'bugfixes' into nfs-for-2.6.38
Conflicts:
	fs/nfs/nfs2xdr.c
	fs/nfs/nfs3xdr.c
	fs/nfs/nfs4xdr.c
2011-01-10 14:48:02 -05:00
Trond Myklebust
6650239a4b NFS: Don't use vm_map_ram() in readdir
vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.

The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.
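
The alternative described above can be sketched in userspace roughly as
follows. The tiny page size, the function name, and the restriction to
reads of at most one page are assumptions made for illustration; this
is not the NFS readdir code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16u /* deliberately tiny, for illustration only */

/*
 * Hypothetical helper: return a pointer to len contiguous bytes that
 * start at byte offset `offset` in a set of separately mapped pages.
 * If the range fits in one page, point directly into that page;
 * otherwise copy both pieces into the caller's scratch buffer.
 * len is assumed to be at most one page, so at most two pages are used.
 */
static const unsigned char *sketch_read(unsigned char *const *pages,
                                        size_t offset, size_t len,
                                        unsigned char *scratch)
{
    size_t idx = offset / SKETCH_PAGE_SIZE;
    size_t in_page = offset % SKETCH_PAGE_SIZE;

    assert(len > 0 && len <= SKETCH_PAGE_SIZE);

    /* Fast path: no page boundary crossed, use the page directly. */
    if (in_page + len <= SKETCH_PAGE_SIZE)
        return pages[idx] + in_page;

    /* Slow path: linearise the two pieces in the scratch buffer. */
    size_t first = SKETCH_PAGE_SIZE - in_page;
    memcpy(scratch, pages[idx] + in_page, first);
    memcpy(scratch + first, pages[idx + 1], len - first);
    return scratch;
}
```

The copy is only paid for ranges that straddle a boundary, so the
common case stays a plain pointer into the page.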

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: stable@kernel.org  [2.6.37]
2011-01-10 14:45:01 -05:00
Josh Hunt
d96336b05d ext2: Resolve 'dereferencing pointer to incomplete type' when enabling EXT2_XATTR_DEBUG
When I enable EXT2_XATTR_DEBUG in fs/ext2/xattr.c I get a build error stating
the following:

  CC      fs/ext2/xattr.o
fs/ext2/xattr.c: In function 'ext2_xattr_cache_insert':
fs/ext2/xattr.c:841: error: dereferencing pointer to incomplete type
fs/ext2/xattr.c:846: error: dereferencing pointer to incomplete type
make[2]: *** [fs/ext2/xattr.o] Error 1
make[1]: *** [fs/ext2] Error 2
make: *** [fs] Error 2

These lines reference ext2_xattr_cache->c_entry_count which is defined
in struct mb_cache. struct mb_cache is currently only defined in fs/mbcache.c.
Move the struct mb_cache definition to include/linux/mbcache.h to resolve
the issue.
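
The error class can be reproduced in a few lines. In this sketch,
struct sketch_cache and its entry_count field are hypothetical
stand-ins for struct mb_cache and c_entry_count; the point is only
that a pointer to a forward-declared struct can be passed around but
not dereferenced until the full definition is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration only: the type is incomplete at this point. */
struct sketch_cache;

/* Passing a pointer to an incomplete type around is fine. */
static const struct sketch_cache *
sketch_pass_through(const struct sketch_cache *c)
{
    return c;
}

/*
 * Writing c->entry_count here would not compile; GCC reports a
 * "dereferencing pointer to incomplete type" error because the
 * members are not yet known.
 */

/* Once the full definition is visible, member access compiles. */
struct sketch_cache {
    int entry_count;
};

static int sketch_entry_count(const struct sketch_cache *c)
{
    assert(c != NULL);
    return c->entry_count;
}
```

Placing the definition in a shared header has the same effect for
every file that includes it, which is the approach the commit takes.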

Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-01-10 19:04:08 +01:00