All callers really want the more logical filemap_fdatawait_range interface,
so convert them to use it and merge wait_on_page_writeback_range into
filemap_fdatawait_range.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
The discard ioctl is used by mkfs utilities to clear a block device
prior to putting metadata down. However, not all devices return zeroed
blocks after a discard. Some drives return stale data, potentially
containing old superblocks. It is therefore important to know whether
discarded blocks are properly zeroed.
Both ATA and SCSI drives have configuration bits that indicate whether
zeroes are returned after a discard operation. Implement a block level
interface that allows this information to be bubbled up the stack and
queried via a new block device ioctl.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
There seems to be a regression in direct write path due to following
commit in for-2.6.33 branch of block tree.
commit 1af60fbd75
Author: Jeff Moyer <jmoyer@redhat.com>
Date: Fri Oct 2 18:56:53 2009 -0400
block: get rid of the WRITE_ODIRECT flag
Marking direct writes as WRITE_SYNC_PLUG instead of WRITE_ODIRECT, sets
the NOIDLE flag in bio and hence in request. This tells CFQ to not expect
more request from the queue and not idle on it (despite the fact that
queue's think time is less and it is not seeky).
So direct writers lose big time when competing with sequential readers.
Using fio, I have run one direct writer and two sequential readers and
following are the results with 2.6.32-rc7 kernel and with for-2.6.33
branch.
Test
====
1 direct writer and 2 sequential reader running simultaneously.
[global]
directory=/mnt/sdc/fio/
runtime=10
[seqwrite]
rw=write
size=4G
direct=1
[seqread]
rw=read
size=2G
numjobs=2
2.6.32-rc7
==========
direct writes: aggrb=2,968KB/s
readers : aggrb=101MB/s
for-2.6.33 branch
=================
direct write: aggrb=19KB/s
readers aggrb=137MB/s
This patch brings back the WRITE_ODIRECT flag, with the difference that we
don't set the BIO_RW_UNPLUG flag so that device is not unplugged after
submission of request and an explicit unplug from submitter is required.
That way we fix the jeff's issue of not enough merging taking place in aio
path as well as make sure direct writes get their fair share.
After the fix
=============
for-2.6.33 + fix
----------------
direct writes: aggrb=2,728KB/s
reads: aggrb=103MB/s
Thanks
Vivek
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Hi,
The WRITE_ODIRECT flag is only used in one place, and that code path
happens to also call blk_run_address_space. The introduction of this
flag, then, could result in the device being unplugged twice for every
I/O.
Further, with the batching changes in the next patch, we don't want an
O_DIRECT write to imply a queue unplug.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (41 commits)
Revert "Seperate read and write statistics of in_flight requests"
cfq-iosched: don't delay async queue if it hasn't dispatched at all
block: Topology ioctls
cfq-iosched: use assigned slice sync value, not default
cfq-iosched: rename 'desktop' sysfs entry to 'low_latency'
cfq-iosched: implement slower async initiate and queue ramp up
cfq-iosched: delay async IO dispatch, if sync IO was just done
cfq-iosched: add a knob for desktop interactiveness
Add a tracepoint for block request remapping
block: allow large discard requests
block: use normal I/O path for discard requests
swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL
fs/bio.c: move EXPORT* macros to line after function
Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
cciss: fix build when !PROC_FS
block: Do not clamp max_hw_sectors for stacking devices
block: Set max_sectors correctly for stacking devices
cciss: cciss_host_attr_groups should be const
cciss: Dynamically allocate the drive_info_struct for each logical drive.
cciss: Add usage_count attribute to each logical drive in /sys
...
Not all users of the topology information want to use libblkid. Provide
the topology information through bdev ioctls.
Also clarify sector size comments for existing BLK ioctls.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
truncate: use new helpers
truncate: new helpers
fs: fix overflow in sys_mount() for in-kernel calls
fs: Make unload_nls() NULL pointer safe
freeze_bdev: grab active reference to frozen superblocks
freeze_bdev: kill bd_mount_sem
exofs: remove BKL from super operations
fs/romfs: correct error-handling code
vfs: seq_file: add helpers for data filling
vfs: remove redundant position check in do_sendfile
vfs: change sb->s_maxbytes to a loff_t
vfs: explicitly cast s_maxbytes in fiemap_check_ranges
libfs: return error code on failed attr set
seq_file: return a negative error code when seq_path_root() fails.
vfs: optimize touch_time() too
vfs: optimization for touch_atime()
vfs: split generic_forget_inode() so that hugetlbfs does not have to copy it
fs/inode.c: add dev-id and inode number for debugging in init_special_inode()
libfs: make simple_read_from_buffer conventional
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
HWPOISON: Enable error_remove_page on btrfs
HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
HWPOISON: Add madvise() based injector for hardware poisoned pages v4
HWPOISON: Enable error_remove_page for NFS
HWPOISON: Enable .remove_error_page for migration aware file systems
HWPOISON: The high level memory error handler in the VM v7
HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
HWPOISON: shmem: call set_page_dirty() with locked page
HWPOISON: Define a new error_remove_page address space op for async truncation
HWPOISON: Add invalidate_inode_page
HWPOISON: Refactor truncate to allow direct truncating of page v2
HWPOISON: check and isolate corrupted free pages v2
HWPOISON: Handle hardware poisoned pages in try_to_unmap
HWPOISON: Use bitmask/action code for try_to_unmap behaviour
HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
HWPOISON: Add poison check to page fault handling
HWPOISON: Add basic support for poisoned pages in fault handler v3
HWPOISON: Add new SIGBUS error codes for hardware poison signals
HWPOISON: Add support for poison swap entries v2
HWPOISON: Export some rmap vma locking to outside world
...
It's unused.
It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.
It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce new truncate helpers truncate_pagecache and inode_newsize_ok.
vmtruncate is also consolidated from mm/memory.c and mm/nommu.c and
into mm/truncate.c.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently we held s_umount while a filesystem is frozen, despite that we
might return to userspace and unlock it from a different process. Instead
grab an active reference to keep the file system busy and add an explicit
check for frozen filesystems in remount and reject the remount instead
of blocking on s_umount.
Add a new get_active_super helper to super.c for use by freeze_bdev that
grabs an active reference to a superblock from a given block device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we have the freeze count there is not much reason for bd_mount_sem
anymore. The actual freeze/thaw operations are serialized using the
bd_fsfreeze_mutex, and the only other place we take bd_mount_sem is
get_sb_bdev which tries to prevent mounting a filesystem while the block
device is frozen. Instead of add a check for bd_fsfreeze_count and
return -EBUSY if a filesystem is frozen. While that is a change in user
visible behaviour a failing mount is much better for this case rather
than having the mount process stuck uninterruptible for a long time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
sb->s_maxbytes is supposed to indicate the maximum size of a file that can
exist on the filesystem. It's declared as an unsigned long long.
Even if a filesystem has no inherent limit that prevents it from using
every bit in that unsigned long long, it's still problematic to set it to
anything larger than MAX_LFS_FILESIZE. There are places in the kernel
that cast s_maxbytes to a signed value. If it's set too large then this
cast makes it a negative number and generally breaks the comparison.
Change s_maxbytes to be loff_t instead. That should help eliminate the
temptation to set it too large by making it a signed value.
Also, add a warning for couple of releases to help catch filesystems that
set s_maxbytes too large. Eventually we can either convert this to a
BUG() or just remove it and in the hope that no one will get it wrong now
that it's a signed value.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Cc: Mandeep Singh Baines <msb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Hugetlbfs needs to do special things instead of truncate_inode_pages().
Currently, it copied generic_forget_inode() except for
truncate_inode_pages() call which is asking for trouble (the code there
isn't trivial). So create a separate function generic_detach_inode()
which does all the list magic done in generic_forget_inode() and call
it from hugetlbfs_forget_inode().
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We do this automatically in get_sb_bdev() from the set_bdev_super()
callback. Filesystems that have their own private backing_dev_info
must assign that in ->fill_super().
Note that ->s_bdi assignment is required for proper writeback!
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It has been unused since it was introduced in:
commit 520808bf20e90fdbdb320264ba7dd5cf9d47dcac
Author: Andrew Morton <akpm@osdl.org>
Date: Fri May 21 00:46:17 2004 -0700
[PATCH] block device layer: separate backing_dev_info infrastructure
So lets just kill it.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Truncating metadata pages is not safe right now before
we haven't audited all file systems.
To enable truncation only for data address space define
a new address_space callback error_remove_page.
This is used for memory_failure.c memory error handling.
This can be then set to truncate_inode_page()
This patch just defines the new operation and adds documentation.
Callers and users come in followon patches.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
* 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits)
block: use blkdev_issue_discard in blk_ioctl_discard
Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads
block: don't assume device has a request list backing in nr_requests store
block: Optimal I/O limit wrapper
cfq: choose a new next_req when a request is dispatched
Seperate read and write statistics of in_flight requests
aoe: end barrier bios with EOPNOTSUPP
block: trace bio queueing trial only when it occurs
block: enable rq CPU completion affinity by default
cfq: fix the log message after dispatched a request
block: use printk_once
cciss: memory leak in cciss_init_one()
splice: update mtime and atime on files
block: make blk_iopoll_prep_sched() follow normal 0/1 return convention
cfq-iosched: get rid of must_alloc flag
block: use interrupts disabled version of raise_softirq_irqoff()
block: fix comment in blk-iopoll.c
block: adjust default budget for blk-iopoll
block: fix long lines in block/blk-iopoll.c
block: add blk-iopoll, a NAPI like approach for block devices
...
Introduce new function for generic inode syncing (vfs_fsync_range) and use
it from fsync() path. Introduce also new helper for syncing after a sync
write (generic_write_sync) using the generic function.
Use these new helpers for syncing from generic VFS functions. This makes
O_SYNC writes to block devices acquire i_mutex for syncing. If we really
care about this, we can make block_fsync() drop the i_mutex and reacquire
it before it returns.
CC: Evgeniy Polyakov <zbr@ioremap.net>
CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
CC: Anton Altaparmakov <aia21@cantab.net>
CC: linux-ntfs-dev@lists.sourceforge.net
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
generic_file_aio_write_nolock() is now used only by block devices and raw
character device. Filesystems should use __generic_file_aio_write() in case
generic_file_aio_write() doesn't suit them. So rename the function to
blkdev_aio_write() and move it to fs/blockdev.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Rename __generic_file_aio_write_nolock() to __generic_file_aio_write(), add
comments to write helpers explaining how they should be used and export
__generic_file_aio_write() since it will be used by some filesystems.
CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
This simple helper saves some filesystems conversion from byte offset
to page numbers and also makes the fdata* interface more complete.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
The commands are conceptually writes, and in the case of IDE and SCSI
commands actually are writes. They were only reads because we thought
that would interact better with the elevators. Now the elevators know
about discard requests, that advantage no longer exists.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This gets rid of pdflush for bdi writeout and kupdated style cleaning.
pdflush writeout suffers from lack of locality and also requires more
threads to handle the same workload, since it has to work in a
non-blocking fashion against each queue. This also introduces lumpy
behaviour and potential request starvation, since pdflush can be starved
for queue access if others are accessing it. A sample ffsb workload that
does random writes to files is about 8% faster here on a simple SATA drive
during the benchmark phase. File layout also seems a LOT more smooth in
vmstat:
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 0 608848 2652 375372 0 0 0 71024 604 24 1 10 48 42
0 1 0 549644 2712 433736 0 0 0 60692 505 27 1 8 48 44
1 0 0 476928 2784 505192 0 0 4 29540 553 24 0 9 53 37
0 1 0 457972 2808 524008 0 0 0 54876 331 16 0 4 38 58
0 1 0 366128 2928 614284 0 0 4 92168 710 58 0 13 53 34
0 1 0 295092 3000 684140 0 0 0 62924 572 23 0 9 53 37
0 1 0 236592 3064 741704 0 0 4 58256 523 17 0 8 48 44
0 1 0 165608 3132 811464 0 0 0 57460 560 21 0 8 54 38
0 1 0 102952 3200 873164 0 0 4 74748 540 29 1 10 48 41
0 1 0 48604 3252 926472 0 0 0 53248 469 29 0 7 47 45
where vanilla tends to fluctuate a lot in the creation phase:
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 678716 5792 303380 0 0 0 74064 565 50 1 11 52 36
1 0 0 662488 5864 319396 0 0 4 352 302 329 0 2 47 51
0 1 0 599312 5924 381468 0 0 0 78164 516 55 0 9 51 40
0 1 0 519952 6008 459516 0 0 4 78156 622 56 1 11 52 37
1 1 0 436640 6092 541632 0 0 0 82244 622 54 0 11 48 41
0 1 0 436640 6092 541660 0 0 0 8 152 39 0 0 51 49
0 1 0 332224 6200 644252 0 0 4 102800 728 46 1 13 49 36
1 0 0 274492 6260 701056 0 0 4 12328 459 49 0 7 50 43
0 1 0 211220 6324 763356 0 0 0 106940 515 37 1 10 51 39
1 0 0 160412 6376 813468 0 0 0 8224 415 43 0 6 49 45
1 1 0 85980 6452 886556 0 0 4 113516 575 39 1 11 54 34
0 2 0 85968 6452 886620 0 0 0 1640 158 211 0 0 46 54
A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
SSD based writeback test on XFS performs over 20% better as well, with
the throughput being very stable around 1GB/sec, where pdflush only
manages 750MB/sec and fluctuates wildly while doing so. Random buffered
writes to many files behave a lot better as well, as does random mmap'ed
writes.
A separate thread is added to sync the super blocks. In the long term,
adding sync_supers_bdi() functionality could get rid of this thread again.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This is a first step at introducing per-bdi flusher threads. We should
have no change in behaviour, although sb_has_dirty_inodes() is now
ridiculously expensive, as there's no easy way to answer that question.
Not a huge problem, since it'll be deleted in subsequent patches.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This adds two new exported functions:
- writeback_inodes_sb(), which only attempts to writeback dirty inodes on
this super_block, for WB_SYNC_NONE writeout.
- sync_inodes_sb(), which writes out all dirty inodes on this super_block
and also waits for the IO to complete.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* topic/soundcore-preclaim:
sound: make OSS device number claiming optional and schedule its removal
sound: request char-major-* module aliases for missing OSS devices
chrdev: implement __[un]register_chrdev()
This is stage one in flattening out the callchains for the common
permission testing. Rather than have most filesystem implement their
own inode->i_op->permission function that just calls back down to the
VFS layers 'generic_permission()' with the per-filesystem ACL checking
function, the filesystem can just expose its 'check_acl' function
directly, and let the VFS layer do everything for it.
This is all just preparatory - no filesystem actually enables this yet.
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
vfs_read() offset is defined as loff_t, but kernel_read()
offset is only defined as unsigned long. Redefine
kernel_read() offset as loff_t.
Cc: stable@kernel.org
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
[un]register_chrdev() assume minor range 0-255. This patch adds __
prefixed versions which take @minorbase and @count explicitly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When we want to tear down an inode that lost the add to the cache race
in XFS we must not call into ->destroy_inode because that would delete
the inode that won the race from the inode cache radix tree.
This patch provides the __destroy_inode helper needed to fix this,
the actual fix will be in th next patch. As XFS was the only reason
destroy_inode was exported we shift the export to the new __destroy_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Currently inode_init_always calls into ->destroy_inode if the additional
initialization fails. That's not only counter-intuitive because
inode_init_always did not allocate the inode structure, but in case of
XFS it's actively harmful as ->destroy_inode might delete the inode from
a radix-tree that has never been added. This in turn might end up
deleting the inode for the same inum that has been instanciated by
another process and cause lots of cause subtile problems.
Also in the case of re-initializing a reclaimable inode in XFS it would
free an inode we still want to keep alive.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Create bdgrab(). This function copies an existing reference to a
block_device. It is safe to call from any context.
Hibernation code wishes to copy a reference to the active swap device.
Right now it calls bdget() under a spinlock, but this is wrong because
bdget() can sleep. It doesn't need a full bdget() because we already
hold a reference to active swap devices (and the spinlock protects
against swapoff).
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=13827
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
This patch adds ioctls to vfs for compatibility with legacy XFS
pre-allocation ioctls (XFS_IOC_*RESVP*). The implementation
effectively invokes sys_fallocate for the new ioctls.
Also handles the compat_ioctl case.
Note: These legacy ioctls are also implemented by OCFS2.
[AV: folded fixes from hch]
Signed-off-by: Ankit Jain <me@ankitjain.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits)
SUNRPC: Fix the TCP server's send buffer accounting
nfsd41: Backchannel: minorversion support for the back channel
nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
nfsd41: Remove ip address collision detection case
nfsd: optimise the starting of zero threads when none are running.
nfsd: don't take nfsd_mutex twice when setting number of threads.
nfsd41: sanity check client drc maxreqs
nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct
NFS: kill off complicated macro 'PROC'
sunrpc: potential memory leak in function rdma_read_xdr
nfsd: minor nfsd_vfs_write cleanup
nfsd: Pull write-gathering code out of nfsd_vfs_write
nfsd: track last inode only in use_wgather case
sunrpc: align cache_clean work's timer
nfsd: Use write gathering only with NFSv2
NFSv4: kill off complicated macro 'PROC'
NFSv4: do exact check about attribute specified
knfsd: remove unreported filehandle stats counters
knfsd: fix reply cache memory corruption
knfsd: reply cache cleanups
...
* akpm: (182 commits)
fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset
fbdev: *bfin*: fix __dev{init,exit} markings
fbdev: *bfin*: drop unnecessary calls to memset
fbdev: bfin-t350mcqb-fb: drop unused local variables
fbdev: blackfin has __raw I/O accessors, so use them in fb.h
fbdev: s1d13xxxfb: add accelerated bitblt functions
tcx: use standard fields for framebuffer physical address and length
fbdev: add support for handoff from firmware to hw framebuffers
intelfb: fix a bug when changing video timing
fbdev: use framebuffer_release() for freeing fb_info structures
radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb?
s3c-fb: CPUFREQ frequency scaling support
s3c-fb: fix resource releasing on error during probing
carminefb: fix possible access beyond end of carmine_modedb[]
acornfb: remove fb_mmap function
mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF
mb862xxfb: restrict compliation of platform driver to PPC
Samsung SoC Framebuffer driver: add Alpha Channel support
atmel-lcdc: fix pixclock upper bound detection
offb: use framebuffer_alloc() to allocate fb_info struct
...
Manually fix up conflicts due to kmemcheck in mm/slab.c
Remove __invalidate_mapping_pages atomic variant now that its sole caller
can sleep (fixed in eccb95cee4 ("vfs: fix
lock inversion in drop_pagecache_sb()")).
This fixes softlockups that can occur while in the drop_caches path.
Signed-off-by: Mike Waychison <mikew@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes the performance impact of possible mmap_miss wrap around to be
temporary and tolerable: i.e. MMAP_LOTSAMISS=100 extra readarounds.
Otherwise if ever mmap_miss wraps around to negative, it takes INT_MAX
cache misses to bring it back to normal state. During the time mmap
readaround will be _enabled_ for whatever wild random workload. That's
almost permanent performance impact.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The purpose of this change is to allow __getname() users to pass a
custom GFP mask to kmem_cache_alloc(). This is needed for annotating
a certain kmemcheck false positive.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
writes associated buffers, then does sync_inode() to write
the inode itself (and to make it clean). Depends on
->write_inode() honouring the second argument.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The only user of the i_cindex element in the inode structure is used
is by the firewire drivers. As part of an attempt to slim down the
inode structure to save memory --- since a typical Linux system will
have hundreds of thousands if not millions of inodes cached, a
reduction in the size inode has high leverage.
The firewire driver does not need i_cindex in any fast path, so it's
simple enough to calculate when it is needed, instead of wasting space
in the inode structure.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: krh@redhat.com
Cc: stefanr@s5r6.in-berlin.de
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Rename the function so that it better describe what it really does. Also
remove the unnecessary include of buffer_head.h.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move sync_filesystems(), __fsync_super(), fsync_super() from
super.c to sync.c where it fits better.
[build fixes folded]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It is unnecessarily fragile to have two places (fsync_super() and do_sync())
doing data integrity sync of the filesystem. Alter __fsync_super() to
accommodate needs of both callers and use it. So after this patch
__fsync_super() is the only place where we gather all the calls needed to
properly send all data on a filesystem to disk.
Nice bonus is that we get a complete livelock avoidance and write_supers()
is now only used for periodic writeback of superblocks.
sync_blockdevs() introduced a couple of patches ago is gone now.
[build fixes folded]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
__fsync_super() does the same thing as fsync_super(). So change the only
caller to use fsync_super() and make __fsync_super() static. This removes
unnecessarily duplicated call to sync_blockdev() and prepares ground
for the changes to __fsync_super() in the following patches.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Remove the unused s_async_list in the superblock, a leftover of the
broken async inode deletion code that leaked into mainline. Having this
in the middle of the sync/unmount path is not helpful for the following
cleanups.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reimplement dnotify using fsnotify.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
This patch creates a way for fsnotify groups to attach marks to inodes.
These marks have little meaning to the generic fsnotify infrastructure
and thus their meaning should be interpreted by the group that attached
them to the inode's list.
dnotify and inotify will make use of these markings to indicate which
inodes are of interest to their respective groups. But this implementation
has the useful property that in the future other listeners could actually
use the marks for the exact opposite reason, aka to indicate which inodes
it had NO interest in.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
If f_op->splice_read() is not implemented, fall back to a plain read.
Use vfs_readv() to read into previously allocated pages.
This will allow splice and functions using splice, such as the loop
device, to work on all filesystems. This includes "direct_io" files
in fuse which bypass the page cache.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Put generic_show_options read access to s_options under rcu_read_lock,
split save_mount_options() into "we are setting it the first time"
(uses in foo_fill_super()) and "we are relacing and freeing the old one",
synchronize_rcu() before kfree() in the latter.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Does equivalent of up_write(&s->s_umount); deactivate_super(s);
However, it does not does not unlock it until it's all over.
As the result, it's safe to use to dispose of new superblock on ->get_sb()
failure exits - nobody will see the sucker until it's all over.
Equivalent using up_write/deactivate_super is safe for that purpose
if superblock is either safe to use or has NULL ->s_root when we unlock.
Normally filesystems take the required precautions, but
a) we do have bugs in that area in some of them.
b) up_write/deactivate_super sequence is extremely common,
so the helper makes sense anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
For every lock request lockd creates a new file_lock object
in nlmsvc_setgrantargs() by copying the passed in file_lock with
locks_copy_lock(). A filesystem can attach it's own lock_operations
vector to the file_lock. It has to be cleaned up at the end of the
file_lock's life. However, lockd doesn't do it today, yet it
asserts in nlmclnt_release_lockargs() that the per-filesystem
state is clean.
This patch fixes it by exporting locks_release_private() and adding
it to nlmsvc_freegrantargs(), to be symmetrical to creating a
file_lock in nlmsvc_setgrantargs().
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
"int get_filesystem_list(char * buf)" is called by only
"static void __init get_fs_names(char *page)".
We can mark get_filesystem_list() as "__init".
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There's really no reason to keep vfs_stat_fd and vfs_lstat_fd with
Oleg's vfs_fstatat. Use vfs_fstatat for the few cases having the
directory fd, and switch all others to vfs_stat / vfs_lstat.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This is a version incorporating Christoph's suggestion.
Separate out common *fstatat functionality into a single function
instead of duplicating it all over the code.
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There are lots of sequences like this, especially in splice code:
if (pipe->inode)
mutex_lock(&pipe->inode->i_mutex);
/* do something */
if (pipe->inode)
mutex_unlock(&pipe->inode->i_mutex);
so introduce helpers which do the conditional locking and unlocking.
Also replace the inode_double_lock() call with a pipe_double_lock()
helper to avoid spreading the use of this functionality beyond the
pipe code.
This patch is just a cleanup, and should cause no behavioral changes.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Remove the now unused generic_file_splice_write_nolock() function.
It's conceptually broken anyway, because splice may need to wait for
pipe events so holding locks across the whole operation is wrong.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's a somewhat twisty maze of hints and behavioural modifiers, try
and clear it up a bit with some documentation.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The mqueuefs filesystem will use this helper as well. Proc's main get_sb
could also be made to use it, but that will require a bit more rework.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'kmemtrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
kmemtrace: trace kfree() calls with NULL or zero-length objects
kmemtrace: small cleanups
kmemtrace: restore original tracing data binary format, improve ABI
kmemtrace: kmemtrace_alloc() must fill type_id
kmemtrace: use tracepoints
kmemtrace, rcu: don't include unnecessary headers, allow kmemtrace w/ tracepoints
kmemtrace, rcu: fix rcupreempt.c data structure dependencies
kmemtrace, rcu: fix rcu_tree_trace.c data structure dependencies
kmemtrace, rcu: fix linux/rcutree.h and linux/rcuclassic.h dependencies
kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c
kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_unlzma.c
kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_bunzip2.c
kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_inflate.c
kmemtrace, squashfs: fix slab.h dependency problem in squasfs
kmemtrace, befs: fix slab.h dependency problem
kmemtrace, security: fix linux/key.h header file dependencies
kmemtrace, fs: fix linux/fdtable.h header file dependencies
kmemtrace, fs: uninline simple_transaction_set()
kmemtrace, fs, security: move alloc_secdata() and free_secdata() to linux/security.h
By default, CFQ will anticipate more IO from a given io context if the
previously completed IO was sync. This used to be fine, since the only
sync IO was reads and O_DIRECT writes. But with more "normal" sync writes
being used now, we don't want to anticipate for those.
Add a bio/request flag that informs the IO scheduler that this is a sync
request that we should not idle for. Introduce WRITE_ODIRECT specifically
for O_DIRECT writes, and make sure that the other sync writes set this
flag.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(S)WRITE_SYNC always unplugs the device right after IO submission.
Sometimes we want to build up a queue before doing so, so add
variants that explicitly DON'T unplug the queue. The caller must
then do that after submitting all the IO.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: cleanup
We want to remove percpu.h from rcupdate.h (for upcoming kmemtrace
changes), but this is not possible currently without breaking the
build because fs.h has an implicit include file depedency: it
uses PAGE_SIZE but does not include asm/page.h which defines it.
This problem gets masked in practice because most fs.h using sites
use rcupreempt.h (and other headers) which includes percpu.h which
brings in asm/page.h indirectly.
We cannot add asm/page.h to asm/fs.h because page.h is not an
exported header.
Move simple_transaction_set() to the other simple-transaction
file helpers in fs/libfs.c.
This removes the include file hell and also reduces
kernel size a bit.
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: paulmck@linux.vnet.ibm.com
LKML-Reference: <1237898630.25315.83.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
We want to remove percpu.h from rcupdate.h (for upcoming kmemtrace
changes), but this is not possible currently without breaking the
build because fs.h has implicit include file depedencies: it uses
GFP_* types in inlines but does not include gfp.h.
In practice most fs.h using .c files get gfp.h included implicitly,
via an indirect route: via rcupdate.h inclusion - so this underlying
problem gets masked in practice.
So we want to solve fs.h's dependency on gfp.h.
gfp.h can not be included here directly because it is not exported and it
would break the build the following way:
/home/mingo/tip/usr/include/linux/bsg.h:11: found __[us]{8,16,32,64} type without #include <linux/types.h>
/home/mingo/tip/usr/include/linux/fs.h:11: included file 'linux/gfp.h' is not exported
make[3]: *** [/home/mingo/tip/usr/include/linux/.check] Error 1
make[2]: *** [linux] Error 2
As suggested by Alexey Dobriyan, move alloc_secdata() and free_secdata()
to linux/security.h - they belong there. This also cleans fs.h of GFP_*
usage.
Suggested-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
LKML-Reference: <1237906803.25315.96.camel@penberg-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
Remove two unneeded exports and make two symbols static in fs/mpage.c
Cleanup after commit 585d3bc06f
Trim includes of fdtable.h
Don't crap into descriptor table in binfmt_som
Trim includes in binfmt_elf
Don't mess with descriptor table in load_elf_binary()
Get rid of indirect include of fs_struct.h
New helper - current_umask()
check_unsafe_exec() doesn't care about signal handlers sharing
New locking/refcounting for fs_struct
Take fs_struct handling to new file (fs/fs_struct.c)
Get rid of bumping fs_struct refcount in pivot_root(2)
Kill unsharing fs_struct in __set_personality()
Now that the filesystem freeze operation has been elevated to the VFS, and
is just an ioctl away, some sort of safety net for unintentionally frozen
root filesystems may be in order.
The timeout thaw originally proposed did not get merged, but perhaps
something like this would be useful in emergencies.
For example, freeze /path/to/mountpoint may freeze your root filesystem if
you forgot that you had that unmounted.
I chose 'j' as the last remaining character other than 'h' which is sort
of reserved for help (because help is generated on any unknown character).
I've tested this on a non-root fs with multiple (nested) freezers, as well
as on a system rendered unresponsive due to a frozen root fs.
[randy.dunlap@oracle.com: emergency thaw only if CONFIG_BLOCK enabled]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: Takashi Sato <t-sato@yk.jp.nec.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lockdep gripes if file->f_lock is taken in a no-IRQ situation, since that
is not always the case. We don't really want to disable IRQs for every
acquisition of f_lock; instead, just move it outside of fasync_lock.
Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
simple_set_mnt() is defined as returning 'int' but always returns 0.
Callers assume simple_set_mnt() never fails and don't properly cleanup if
it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev()
should:
up_write(sb->s_unmount);
deactivate_super(sb);
if simple_set_mnt() fails.
Since simple_set_mnt() never fails, would be cleaner if it did not
return anything.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Move some block device related code out from buffer.c and put it in
block_dev.c. I'm trying to move non-buffer_head code out of buffer.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There is a second set of macros for when CONFIG_FILE_LOCKING is
not set. This patch updates those to become inline functions
as well.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This avoids various issues which might give rise to compiler warnings
about missing functions and/or unused variable with the previous
macros. This also fixes a bug where one of the macros was returning
0, but it should have been void.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The last user of do_pipe is in arch/alpha/, after replacing it with
do_pipe_flags, the do_pipe can be totally dropped.
Signed-off-by: Cheng Renquan <crquan@gmail.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'bkl-removal' of git://git.lwn.net/linux-2.6:
Rationalize fasync return values
Move FASYNC bit handling to f_op->fasync()
Use f_lock to protect f_flags
Rename struct file->f_ep_lock
Add support for explicitly requesting full atime updates. This makes it
possible for kernels to default to relatime but still allow userspace to
override it.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Traditionally, changes to struct file->f_flags have been done under BKL
protection, or with no protection at all. This patch causes all f_flags
changes after file open/creation time to be done under protection of
f_lock. This allows the removal of some BKL usage and fixes a number of
longstanding (if microscopic) races.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This lock moves out of the CONFIG_EPOLL ifdef and becomes f_lock. For now,
epoll remains the only user, but a future patch will use it to protect
f_flags as well.
Cc: Davide Libenzi <davidel@xmailserver.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list
block: fix booting from partitioned md array
block: revert part of 18ce3751cc
cciss: PCI power management reset for kexec
paride/pg.c: xs(): &&/|| confusion
fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free
block: fix bad definition of BIO_RW_SYNC
bsg: Fix sense buffer bug in SG_IO
Separate FMODE_PREAD and FMODE_PWRITE into separate flags to reflect the
reality that the read and write paths may have independent restrictions.
A git grep verifies that these flags are always cleared together so this
new behavior will only apply to interfaces that change to clear flags
individually.
This is required for "seq_file: properly cope with pread", a post-2.6.25
regression fix.
[akpm@linux-foundation.org: add comment]
Signed-off-by: Paul Turner <pjt@google.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can't OR shift values, so get rid of BIO_RW_SYNC and use BIO_RW_SYNCIO
and BIO_RW_UNPLUG explicitly. This brings back the behaviour from before
213d9417fe.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The ioctls for the generic freeze feature are below.
o Freeze the filesystem
int ioctl(int fd, int FIFREEZE, arg)
fd: The file descriptor of the mountpoint
FIFREEZE: request code for the freeze
arg: Ignored
Return value: 0 if the operation succeeds. Otherwise, -1
o Unfreeze the filesystem
int ioctl(int fd, int FITHAW, arg)
fd: The file descriptor of the mountpoint
FITHAW: request code for unfreeze
arg: Ignored
Return value: 0 if the operation succeeds. Otherwise, -1
Error number: If the filesystem has already been unfrozen,
errno is set to EINVAL.
[akpm@linux-foundation.org: fix CONFIG_BLOCK=n]
Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com>
Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
Cc: <xfs-masters@oss.sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, ext3 in mainline Linux doesn't have the freeze feature which
suspends write requests. So, we cannot take a backup which keeps the
filesystem's consistency with the storage device's features (snapshot and
replication) while it is mounted.
In many case, a commercial filesystem (e.g. VxFS) has the freeze feature
and it would be used to get the consistent backup.
If Linux's standard filesystem ext3 has the freeze feature, we can do it
without a commercial filesystem.
So I have implemented the ioctls of the freeze feature.
I think we can take the consistent backup with the following steps.
1. Freeze the filesystem with the freeze ioctl.
2. Separate the replication volume or create the snapshot
with the storage device's feature.
3. Unfreeze the filesystem with the unfreeze ioctl.
4. Take the backup from the separated replication volume
or the snapshot.
This patch:
VFS:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they can return an error.
Rename write_super_lockfs and unlockfs of the super block operation
freeze_fs and unfreeze_fs to avoid a confusion.
ext3, ext4, xfs, gfs2, jfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that write_super_lockfs returns an error if needed,
and unlockfs always returns 0.
reiserfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they always return 0 (success) to keep a current behavior.
Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com>
Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
Cc: <xfs-masters@oss.sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
ext4: Remove "extents" mount option
block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
ext4: Make printk's consistently prefixed with "EXT4-fs: "
ext4: Add sanity checks for the superblock before mounting the filesystem
ext4: Add mount option to set kjournald's I/O priority
jbd2: Submit writes to the journal using WRITE_SYNC
jbd2: Add pid and journal device name to the "kjournald2 starting" message
ext4: Add markers for better debuggability
ext4: Remove code to create the journal inode
ext4: provide function to release metadata pages under memory pressure
ext3: provide function to release metadata pages under memory pressure
add releasepage hooks to block devices which can be used by file systems
ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
ext4: Init the complete page while building buddy cache
ext4: Don't allow new groups to be added during block allocation
ext4: mark the blocks/inode bitmap beyond end of group as used
ext4: Use new buffer_head flag to check uninit group bitmaps initialization
ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
ext4: code cleanup
...