* git://oss.sgi.com:8090/xfs-2.6: (43 commits)
[XFS] Remove files from the build that are now unused.
[XFS] Fix a Makefile issue related to exports.o handling.
[XFS] Remove version 1 directory code. Never functioned on Linux, just
[XFS] Map EFSCORRUPTED to an actual error code, not just a made up one
[XFS] Kill direct access to ->count in valusema(); all we ever use it for
[XFS] Remove unneeded conditional code on NFS export interface related
[XFS] Remove an incorrect use of unlikely() on a relatively likely code
[XFS] Push some common code out of write path into core XFS code for
[XFS] Remove unnecessary local from open_exec dmapi path.
[XFS] Minor XFS documentation updates.
[XFS] Fix broken const use inside local suffix_strtoul routine.
[XFS] Fix nused counter. It's currently getting set to -1 rather than
[XFS] Fix mismerge of the fs_writable cleanup patch causing a freeze/thaw
[XFS] Fix up debug code so that bulkstat wont generate thousands of
[XFS] Remove unused parameter from di2xflags routine.
[XFS] Cleanup a missed porting conversion, and freezing.
[XFS] Resolve a namespace collision on remaining vtypes for FreeBSD
[XFS] Resolve a namespace collision on vnode/vnodeops for FreeBSD porters.
[XFS] Resolve a namespace collision on vfs/vfsops for FreeBSD porters.
[XFS] statvfs component of directory/project quota support, code
...
Like the SUBSYTEM= key we find in the environment of the uevent, this
creates a generic "subsystem" link in sysfs for every device. Userspace
usually doesn't care at all if its a "class" or a "bus" device. This
provides an unified way to determine the subsytem of a device, regardless
of the way the driver core has created it.
Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.infradead.org/~dwmw2/rbtree-2.6:
[RBTREE] Switch rb_colour() et al to en_US spelling of 'color' for consistency
Update UML kernel/physmem.c to use rb_parent() accessor macro
[RBTREE] Update hrtimers to use rb_parent() accessor macro.
[RBTREE] Add explicit alignment to sizeof(long) for struct rb_node.
[RBTREE] Merge colour and parent fields of struct rb_node.
[RBTREE] Remove dead code in rb_erase()
[RBTREE] Update JFFS2 to use rb_parent() accessor macro.
[RBTREE] Update eventpoll.c to use rb_parent() accessor macro.
[RBTREE] Update key.c to use rb_parent() accessor macro.
[RBTREE] Update ext3 to use rb_parent() accessor macro.
[RBTREE] Change rbtree off-tree marking in I/O schedulers.
[RBTREE] Add accessor macros for colour and parent fields of rb_node
* git://git.infradead.org/mtd-2.6: (199 commits)
[MTD] NAND: Fix breakage all over the place
[PATCH] NAND: fix remaining OOB length calculation
[MTD] NAND Fixup NDFC merge brokeness
[MTD NAND] S3C2410 driver cleanup
[MTD NAND] s3c24x0 board: Fix clock handling, ensure proper initialisation.
[JFFS2] Check CRC32 on dirent and data nodes each time they're read
[JFFS2] When retiring nextblock, allocate a node_ref for the wasted space
[JFFS2] Mark XATTR support as experimental, for now
[JFFS2] Don't trust node headers before the CRC is checked.
[MTD] Restore MTD_ROM and MTD_RAM types
[MTD] assume mtd->writesize is 1 for NOR flashes
[MTD NAND] Fix s3c2410 NAND driver so it at least _looks_ like it compiles
[MTD] Prepare physmap for 64-bit-resources
[JFFS2] Fix more breakage caused by janitorial meddling.
[JFFS2] Remove stray __exit from jffs2_compressors_exit()
[MTD] Allow alternate JFFS2 mount variant for root filesystem.
[MTD] Disconnect struct mtd_info from ABI
[MTD] replace MTD_RAM with MTD_GENERIC_TYPE
[MTD] replace MTD_ROM with MTD_GENERIC_TYPE
[MTD] remove a forgotten MTD_XIP
...
Allow callers to remove watches from their event handler via
inotify_remove_watch_locked(). This functionality can be used to
achieve IN_ONESHOT-like functionality for a subset of events in the
mask.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Acked-by: Robert Love <rml@novell.com>
Acked-by: John McCutchan <john@johnmccutchan.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add inotify_init_watch() so caller can use inotify_watch refcounts
before calling inotify_add_watch().
Add inotify_find_watch() to find an existing watch for an (ih,inode)
pair. This is similar to inotify_find_update_watch(), but does not
update the watch's mask if one is found.
Add inotify_rm_watch() to remove a watch via the watch pointer instead
of the watch descriptor.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Acked-by: Robert Love <rml@novell.com>
Acked-by: John McCutchan <john@johnmccutchan.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When an inotify event includes a dentry name, also include the inode
associated with that name.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Acked-by: Robert Love <rml@novell.com>
Acked-by: John McCutchan <john@johnmccutchan.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The following series of patches introduces a kernel API for inotify,
making it possible for kernel modules to benefit from inotify's
mechanism for watching inodes. With these patches, inotify will
maintain for each caller a list of watches (via an embedded struct
inotify_watch), where each inotify_watch is associated with a
corresponding struct inode. The caller registers an event handler and
specifies for which filesystem events their event handler should be
called per inotify_watch.
Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Acked-by: Robert Love <rml@novell.com>
Acked-by: John McCutchan <john@johnmccutchan.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(990). Turns out some ye-olde unices used EUCLEAN as
Filesystem-needs-cleaning, so now we use that too.
SGI-PV: 953954
SGI-Modid: xfs-linux-melb:xfs-kern:26286a
Signed-off-by: Nathan Scott <nathans@sgi.com>
is check if semaphore is actually locked, which can be trivially done in
portable way. Code gets more reabable, while we are at it...
SGI-PV: 953915
SGI-Modid: xfs-linux-melb:xfs-kern:26274a
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Nathan Scott <nathans@sgi.com>
Also, make sure dirents are marked REF_UNCHECKED when we 'discover' them
through eraseblock summary.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Failing to do so makes the calculated length of the last node incorrect,
when we're not using eraseblock summaries.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Especially when summary code is used, we can have in-memory data
structures referencing certain nodes without them actually being readable
on the flash. Discard the nodes gracefully in that case, rather than
triggering a BUG().
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
If get_user_pages() returns less pages than what we asked for, we jump
to out_unmap which will return ERR_PTR(ret). But ret can contain a
positive number just smaller than local_nr_pages, so be sure to set it
to -EFAULT always.
Problem found and diagnosed by Damien Le Moal <damien@sdl.hitachi.co.jp>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If flock_lock_file() failed to allocate flock with locks_alloc_lock()
then "error = 0" is returned. Need to return some non-zero.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
jffs2_zlib_exit() and free_workspaces() shouldn't be marked __exit because
they get called in the error case from the init functions.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Currently it is possible for a task to remove its locks at the same time as
the NLM recovery thread is trying to recover them. This quickly leads to an
Oops.
Protect the locks using an rw semaphore while they are being recovered.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
As fs/nfs/inode.c is rather large, heterogenous and unwieldy, the attached
patch splits it up into a number of files:
(*) fs/nfs/inode.c
Strictly inode specific functions.
(*) fs/nfs/super.c
Superblock management functions for NFS and NFS4, normal access, clones
and referrals. The NFS4 superblock functions _could_ move out into a
separate conditionally compiled file, but it's probably not worth it as
there're so many common bits.
(*) fs/nfs/namespace.c
Some namespace-specific functions have been moved here.
(*) fs/nfs/nfs4namespace.c
NFS4-specific namespace functions (this could be merged into the previous
file). This file is conditionally compiled.
(*) fs/nfs/internal.h
Inter-file declarations, plus a few simple utility functions moved from
fs/nfs/inode.c.
Additionally, all the in-.c-file externs have been moved here, and those
files they were moved from now includes this file.
For the most part, the functions have not been changed, only some multiplexor
functions have changed significantly.
I've also:
(*) Added some extra banner comments above some functions.
(*) Rearranged the function order within the files to be more logical and
better grouped (IMO), though someone may prefer a different order.
(*) Reduced the number of #ifdefs in .c files.
(*) Added missing __init and __exit directives.
Signed-Off-By: David Howells <dhowells@redhat.com>
Respond to a moved error on NFS lookup by setting up the referral.
Note: We don't actually follow the referral during lookup/getattr, but
later when we detect fsid mismatch in inode revalidation (similar to the
processing done for cloning submounts). Referrals will have fake attributes
until they are actually followed or traversed.
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Set up mountpoint when hitting a referral on moved error by getting
fs_locations.
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move existing code into a separate function so that it can be also used by
referral code.
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This is (similar to getattr bitmap) but includes fs_locations and
mounted_on_fileid attributes. Use this bitmap for encoding in fs_locations
requests.
Note: We can probably do better by requesting locations as part of fsinfo
itself.
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Per referral draft, only fs_locations, fsid, and mounted_on_fileid can be
requested in a GETATTR on referrals.
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It is ignored if fileid is also requested. This will be used on referrals
(fs_locations).
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use component4-style formats for decoding list of servers and pathnames in
fs_locations.
Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv4 allows for the fact that filesystems may be replicated across
several servers or that they may be migrated to a backup server in case of
failure of the primary server.
fs_locations is an NFSv4 operation for retrieving information about the
location of migrated and/or replicated filesystems.
Based on an initial implementation by Jiaying Zhang <jiayingz@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make automounted partitions expire using the mark_mounts_for_expiry()
function. The timeout is controlled via a sysctl.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This should enable us to detect if we are crossing a mountpoint in the
case where the server is exporting "nohide" mounts.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Allow filesystems to decide to perform pre-umount processing whether or not
MNT_FORCE is set.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Allow a submount to be marked as being 'shrinkable' by means of the
vfsmount->mnt_flags, and then add a function 'shrink_submounts()' which
attempts to recursively unmount these submounts.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace all module uses with the new vfs_kern_mount() interface, and fix up
simple_pin_fs().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
do_kern_mount() does not allow the kernel to use private mount interfaces
without exposing the same interfaces to userland. The problem is that the
filesystem is referenced by name, thus meaning that it and its mount
interface must be registered in the global filesystem list.
vfs_kern_mount() passes the struct file_system_type as an explicit
parameter in order to overcome this limitation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that we have a real nfs_invalidate_page() to ensure that
truncate_inode_pages() does the right thing when there are pending dirty
pages, we can get rid of nfs_delete_inode().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In the case of a call to truncate_inode_pages(), we should really try to
cancel any pending writes on the page.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We just set *acl_len to zero, and attrlen is unsigned, so this comparison
is clearly bogus. I have no idea what I was thinking.
Fixes a bug that caused getacl to fail over krb5p.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix two errors in the client-side acl cache: First, when nfs3_proc_getacl
requests only the default acl of a file and the access acl is not cached
already, a NULL access acl entry is cached instead of ERR_PTR(-EAGAIN)
("not cached").
Second, update the cached acls in nfs3_proc_setacls: nfs_refresh_inode does
not always invalidate the cached acls, and when it does not, the cached acls
get out of sync.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, we are accounting for all calls to nfs_revalidate_inode(), but not
to nfs_revalidate_mapping(), or nfs_lookup_verify_inode(), etc...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Separate out the function of revalidating the inode metadata, and
revalidating the mapping. The former may be called by lookup(),
and only really needs to check that permissions, ctime, etc haven't changed
whereas the latter needs only done when we want to read data from the page
cache, and may need to sync and then invalidate the mapping.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Whenever the directory changes, we want to make sure that we always
invalidate its page cache. Fix up update_changeattr() and
nfs_mark_for_revalidate() so that they do so.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix up a bug in the handling of NFS_INO_REVAL_PAGECACHE: make sure that
nfs_update_inode() clears it when we're sure we're not racing with other
updates.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up use of page_array, and fix an off-by-one error noticed by Tom
Talpey which causes kmalloc calls in cases where using the page_array
is sufficient.
Test plan:
Normal client functional testing with r/wsize=32768.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The Linux NFSv4 server violates RFC3530 in that the change attribute is not
guaranteed to be updated for every change to the inode. Our optimisation
for checking whether or not the inode metadata has changed or not is broken
too. Grr....
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The code that is supposed to zero the uninitialised partial pages when the
server returns a short read is currently broken: it looks at the nfs_page
wb_pgbase and wb_bytes fields instead of the equivalent nfs_read_data
values when deciding where to start truncating the page.
Also ensure that we are more careful about setting PG_uptodate
before retrying a short read: the retry will change the nfs_read_data
args.pgbase and args.count.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
getting decremented by 1. Since nused never reaches 0, the "if
(!free->hdr.nused)" check in xfs_dir2_leafn_remove() fails every time and
xfs_dir2_shrink_inode() doesn't get called when it should. This causes
extra blocks to be left on an empty directory and the directory in unable
to be converted back to inline extent mode.
SGI-PV: 951958
SGI-Modid: xfs-linux-melb:xfs-kern:211382a
Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
truncate down followed by delayed allocation (buffered writes) - worst
case scenario for the notorious NULL files problem. This reduces the
window where we are exposed to that problem significantly.
SGI-PV: 917976
SGI-Modid: xfs-linux-melb:xfs-kern:26100a
Signed-off-by: Nathan Scott <nathans@sgi.com>
init_rwsem() has no return value. This is not a problem if init_rwsem()
is a function, but it's a problem if it's a do { ... } while (0) macro.
(which lockdep introduces)
SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:26082a
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Nathan Scott <nathans@sgi.com>
logged version of di_next_unlinked which is actually always stored in the
correct ondisk format. This was pointed out to us by Shailendra Tripathi.
And is evident in the xfs qa test of 121.
SGI-PV: 953263
SGI-Modid: xfs-linux-melb:xfs-kern:26044a
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
transaction completion from marking the inode dirty while it is being
cleaned up on it's way out of the system.
SGI-PV: 952967
SGI-Modid: xfs-linux-melb:xfs-kern:26040a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
64bit kernels allow recovery to handle both versions and do the necessary
decoding
SGI-PV: 952214
SGI-Modid: xfs-linux-melb:xfs-kern:26011a
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
transaction within each such operation may involve multiple locking of AGF
buffer. While the freeing extent function has sorted the extents based on
AGF number before entering into transaction, however, when the file system
space is very limited, the allocation of space would try every AGF to get
space allocated, this could potentially cause out-of-order locking, thus
deadlock could happen. This fix mitigates the scarce space for allocation
by setting aside a few blocks without reservation, and avoid deadlock by
maintaining ascending order of AGF locking.
SGI-PV: 947395
SGI-Modid: xfs-linux-melb:xfs-kern:210801a
Signed-off-by: Yingping Lu <yingping@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
ATTR_NOLOCK flag, but this was split off some time ago, as ATTR_DMI needed
to be used separately. Two asserts were added to guard correctness of the
code during the transition. These are no longer required.
SGI-PV: 952145
SGI-Modid: xfs-linux-melb:xfs-kern:209633a
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
the range spanned by modifications to the in-core extent map. Add
XFS_BUNMAPI() and XFS_SWAP_EXTENTS() macros that call xfs_bunmapi() and
xfs_swap_extents() via the ioops vector. Change all calls that may modify
the in-core extent map for the data fork to go through the ioops vector.
This allows a cache of extent map data to be kept in sync.
SGI-PV: 947615
SGI-Modid: xfs-linux-melb:xfs-kern:209226a
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
Looking at the reiser4 crash, I found a leak in debugfs. In
debugfs_mknod(), we create the inode before checking if the dentry
already has one attached. We don't free it if that is the case.
These bugs happen quite often, I'm starting to think we should disallow
such coding in CodingStyle.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Trond Myklebust <Trond.Myklebust@netapp.com>
We're presently running lock_kernel() under fs_lock via nfs's ->permission
handler. That's a ranking bug and sometimes a sleep-in-spinlock bug. This
problem was introduced in the openat() patchset.
We should not need to hold the current->fs->lock for a codepath that doesn't
use current->fs.
[vsu@altlinux.ru: fix error path]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's used from the initfunc in case of failure too. We could actually do
with an '__initexit' for this kind of thing -- when built in to the
kernel, it could do with being dropped with the init text. We _could_
actually just use __init for it, but that would break if/when we start
dropping init text from modules. So let's just leave it as it was for now,
and mutter a little more about random 'janitorial' fixes from people who
aren't paying attention to what they're doing.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
From: Andrew Morton <akpm@osdl.org>
Spotted by Jan Capek <jca@sysgo.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Andreas Dilger <adilger@clusterfs.com>
Cc: Jan Capek <jca@sysgo.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Wasn't able to reproduce a hard hang, but was able to get an oops if
suspended the machine during a copy to the cifs mount. This led to some
things hanging, including a "sync". Also got I/O errors when trying to
access the mount afterwards (even when didn't see the oops), and had
to unmount and remount in order to access the filesystem.
This patch fixed the oops.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Return -EUCLEAN on read when a bitflip was detected and corrected, so the
clients can react and eventually copy the affected block to a spare one.
Make all in kernel users aware of the change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Hopefully the last iteration on this!
The handling of out of band data on NAND was accompanied by tons of fruitless
discussions and halfarsed patches to make it work for a particular
problem. Sufficiently annoyed by I all those "I know it better" mails and the
resonable amount of discarded "it solves my problem" patches, I finally decided
to go for the big rework. After removing the _ecc variants of mtd read/write
functions the solution to satisfy the various requirements was to refactor the
read/write _oob functions in mtd.
The major change is that read/write_oob now takes a pointer to an operation
descriptor structure "struct mtd_oob_ops".instead of having a function with at
least seven arguments.
read/write_oob which should probably renamed to a more descriptive name, can do
the following tasks:
- read/write out of band data
- read/write data content and out of band data
- read/write raw data content and out of band data (ecc disabled)
struct mtd_oob_ops has a mode field, which determines the oob handling mode.
Aside of the MTD_OOB_RAW mode, which is intended to be especially for
diagnostic purposes and some internal functions e.g. bad block table creation,
the other two modes are for mtd clients:
MTD_OOB_PLACE puts/gets the given oob data exactly to/from the place which is
described by the ooboffs and ooblen fields of the mtd_oob_ops strcuture. It's
up to the caller to make sure that the byte positions are not used by the ECC
placement algorithms.
MTD_OOB_AUTO puts/gets the given oob data automaticaly to/from the places in
the out of band area which are described by the oobfree tuples in the ecclayout
data structre which is associated to the devicee.
The decision whether data plus oob or oob only handling is done depends on the
setting of the datbuf member of the data structure. When datbuf == NULL then
the internal read/write_oob functions are selected, otherwise the read/write
data routines are invoked.
Tested on a few platforms with all variants. Please be aware of possible
regressions for your particular device / application scenario
Disclaimer: Any whining will be ignored from those who just contributed "hot
air blurb" and never sat down to tackle the underlying problem of the mess in
the NAND driver grown over time and the big chunk of work to fix up the
existing users. The problem was not the holiness of the existing MTD
interfaces. The problems was the lack of time to go for the big overhaul. It's
easy to add more mess to the existing one, but it takes alot of effort to go
for a real solution.
Improvements and bugfixes are welcome!
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Most of those macros are unused and the used ones just obfuscate
the code. Remove them and fixup all users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The nand_oobinfo structure is not fitting the newer error correction
demands anymore. Replace it by struct nand_ecclayout and fixup the users
all over the place. Keep the nand_oobinfo based ioctl for user space
compability reasons.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The info structure for out of band data was copied into
the mtd structure. Make it a pointer and remove the ability
to set it from userspace. The position of ecc bytes is
defined by the hardware and should not be changed by software.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This allows us to drop another pointer from the struct jffs2_raw_node_ref,
shrinking it to 8 bytes on 32-bit machines (if the TEST_TOTLEN) paranoia
check is turned off, which will be committed soon).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Print wasted_size in scanned eraseblocks, print range correctly for
summary dirent and inode entries.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Preallocation of refs is shortly going to be a per-eraseblock thing,
rather than per-filesystem. Add the required argument to the function.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
One more place where we were changing the accounting info without
actually allocating a ref for the lost space...
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Random unthinking 'cleanup' caused debug messages like this:
Obsoleting node at 0x0006daf4 of len 0x3a4: <7>Dirtying
If messages are continuation of an existing line, they don't need
to be prefixed with KERN_DEBUG.
THINK. Or you will be replaced by a small shell script.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
It looks like metapage_releasepage was making in invalid assumption that
the releasepage method would not be called on a dirty page. Instead of
issuing a warning and releasing the metapage, it should return 0, indicating
that the private data for the page cannot be released.
I also realized that metapage_releasepage had the return code all wrong. If
it is successful in releasing the private data, it should return 1, otherwise
it needs to return 0.
Lastly, there is no need to call wait_on_page_writeback, since
try_to_release_page will not call us with a page in writback state.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
As the first step towards eliminating the ref->next_phys member and saving
memory by using an _array_ of struct jffs2_raw_node_ref per eraseblock,
stop the write functions from allocating their own refs; have them just
_reserve_ the appropriate number instead. Then jffs2_link_node_ref() can
just fill them in.
Use a linked list of pre-allocated refs in the superblock, for now. Once
we switch to an array, it'll just be a case of extending that array.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Else a subsequent bio_clone might make a mess.
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: "Don Dupuis" <dondster@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Both cause the 'entries' count in the export cache to be non-zero at module
removal time, so unregistering that cache fails and results in an oops.
1/ exp_pseudoroot (used for NFSv4 only) leaks a reference to an export
entry.
2/ sunrpc_cache_update doesn't increment the entries count when it adds
an entry.
Thanks to "david m. richter" <richterd@citi.umich.edu> for triggering the
problem and finding one of the bugs.
Cc: "david m. richter" <richterd@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
MTD clients are agnostic of FLASH which needs ECC suppport.
Remove the functions and fixup the callers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The writev based write buffer implementation was far to complex as
in most use cases the write buffer had to be handled anyway.
Simplify the write buffer handling and use mtd->write instead.
From extensive testing no performance impact has been noted.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We don't need the upper layers to deal with the physical offset. It's
_always_ c->nextblock->offset + c->sector_size - c->nextblock->free_size
so we might as well just let the actual write functions deal with that.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
o Add a flag MTD_BIT_WRITEABLE for devices that allow single bits to be
cleared.
o Replace MTD_PROGRAM_REGIONS with a cleared MTD_BIT_WRITEABLE flag for
STMicro and Intel Sibley flashes with internal ECC. Those flashes
disallow clearing of single bits, unlike regular NOR flashes, so the
new flag models their behaviour better.
o Remove MTD_ECC. After the STMicro/Sibley merge, this flag is only set
and never checked.
Signed-off-by: Joern Engel <joern@wh.fh-wedel.de>
In 2002, STMicro started producing NOR flashes with internal ECC protection
for small blocks (8 or 16 bytes). Support for those flashes was added by me.
In 2005, Intel Sibley flashes copied this strategy and Nico added support for
those. Merge the code for both.
Signed-off-by: Joern Engel <joern@wh.fh-wedel.de>
At least two flashes exists that have the concept of a minimum write unit,
similar to NAND pages, but no other NAND characteristics. Therefore, rename
the minimum write unit to "writesize" for all flashes, including NAND.
Signed-off-by: Joern Engel <joern@wh.fh-wedel.de>
We'll be using a proper list of nodes in the jffs2_xattr_datum and
jffs2_xattr_ref structures, because the existing code to overwrite
them is just broken. Put it in the common part at the front of the
structure which is shared with the jffs2_inode_cache, so that the
jffs2_link_node_ref() function can do the right thing.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
In a couple of places, we assume that what's at the end of the
->next_in_ino list is a struct jffs2_inode_cache. Let's check
for that, since we expect it to change soon.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Let's avoid the potential for forgetting to set ref->next_in_ino, by doing
it within jffs2_link_node_ref() instead.
This highlights the ugliness of what we're currently doing with
xattr_datum and xattr_ref structures -- we should find a nicer way of
dealing with that.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
When filing REF_OBSOLETE nodes, we'd add their size to the global
'dirty_size' count, but then to the eraseblock's 'used_size' count.
That's not clever.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>