aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2008-02-14One less parameter to __d_pathJan Blunck
All callers to __d_path pass the dentry and vfsmount of a struct path to __d_path. Pass the struct path directly, instead. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14Make set_fs_{root,pwd} take a struct pathJan Blunck
In nearly all cases the set_fs_{root,pwd}() calls work on a struct path. Change the function to reflect this and use path_get() here. Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14Use struct path in fs_structJan Blunck
* Use struct path in fs_struct. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14Introduce path_get()Jan Blunck
This introduces the symmetric function to path_put() for getting a reference to the dentry and vfsmount of a struct path in the right order. Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14Use path_put() in a few places instead of {mnt,d}put()Jan Blunck
Use path_put() in a few places instead of {mnt,d}put() Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14Introduce path_put()Jan Blunck
* Add path_put() functions for releasing a reference to the dentry and vfsmount of a struct path in the right order * Switch from path_release(nd) to path_put(&nd->path) * Rename dput_path() to path_put_conditional() [akpm@linux-foundation.org: fix cifs] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: <linux-fsdevel@vger.kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14Embed a struct path into struct nameidata instead of nd->{dentry,mnt}Jan Blunck
This is the central patch of a cleanup series. In most cases there is no good reason why someone would want to use a dentry for itself. This series reflects that fact and embeds a struct path into nameidata. Together with the other patches of this series - it enforced the correct order of getting/releasing the reference count on <dentry,vfsmount> pairs - it prepares the VFS for stacking support since it is essential to have a struct path in every place where the stack can be traversed - it reduces the overall code size: without patch series: text data bss dec hex filename 5321639 858418 715768 6895825 6938d1 vmlinux with patch series: text data bss dec hex filename 5320026 858418 715768 6894212 693284 vmlinux This patch: Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix cifs] [akpm@linux-foundation.org: fix smack] Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14Remove path_release_on_umount()Jan Blunck
path_release_on_umount() should only be called from sys_umount(). I merged the function into sys_umount() instead of having in in namei.c. Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14FLAT binaries: drop BINFMT_FLAT bad header magic warningMike Frysinger
The warning issued by fs/binfmt_flat.c when the format handler is given a non-FLAT and non-script executable is annoying to say the least when working with FDPIC ELF objects. If you build a kernel that supports both FLAT and FDPIC ELFs on no-mmu, every time you execute an FDPIC ELF, the kernel spits out this message. While I understand a lot of newcomers to the no-mmu world screw up generation of FLAT binaries, this warning is not usable for systems that support more than just FLAT. Signed-off-by: Jie Zhang <jie.zhang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Bernd Schmidt <bernds_cb1@t-online.de> Acked-by: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14inotify: make variables static in inotify_user.cHarvey Harrison
inotify_max_user_instances, inotify_max_user_watches, inotify_max_queued_events can all be made static. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13NFS: add missing spkm3 strings to mount option parserOlga Kornievskaia
This patch adds previous missing spkm3 string values that are needed to parse mount options in the kernel.
2008-02-13NFS: remove error field from nfs_readdir_descriptor_tJeff Layton
The error field in nfs_readdir_descriptor_t is never used outside of the function in which it is set. Remove the field and change the place that does use it to use an existing local variable. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13NFS: missing spaces in KERN_WARNINGDan Muntz
The warning message for a v4 server returning various bad sequence-ids is missing spaces. Signed-off-by: Dan Muntz <dmuntz@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13NFS: Allow text-based mounts via compat_sys_mountChuck Lever
The compat_sys_mount() system call throws EINVAL for text-based NFSv4 mounts. The text-based mount interface assumes that any mount option blob that doesn't set the version field to "1" is a C string (ie not a legacy mount request). The compat_sys_mount() call treats blobs that don't set the version field to "1" as an error. We just relax the check in compat_sys_mount() a bit to allow C strings to be passed down to the NFSv4 client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13NFS: fix reference counting for NFSv4 callback threadJeff Layton
The reference counting for the NFSv4 callback thread stays artificially high. When this thread comes down, it doesn't properly tear down the svc_serv, causing a memory leak. In my testing on an older kernel on x86_64, memory would leak out of the 8k kmalloc slab. So, we're leaking at least a page of memory every time the thread comes down. svc_create() creates the svc_serv with a sv_nrthreads count of 1, and then svc_create_thread() increments that count. Whenever the callback thread is started it has a sv_nrthreads count of 2. When coming down, it calls svc_exit_thread() which decrements that count and if it hits 0, it tears everything down. That never happens here since the count is always at 2 when the thread exits. The problem is that nfs_callback_up() should be calling svc_destroy() on the svc_serv on both success and failure. This is how lockd_up_proto() handles the reference counting, and doing that here fixes the leak. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13udf: fix udf_add_free_spaceMarcin Slusarz
In commit 742ba02a51c8d0bf5446b154531179760c1ed0a2 (udf: create common function for changing free space counter) by accident I reversed safety condition which lead to null pointer dereference in case of media error and wrong counting of free space in normal situation Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Jan Kara <jack@suse.cz> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13udf: fix directory offset handlingJan Kara
Patch cleaning up UDF directory offset handling missed modifications in dir.c (because I've submitted an old version :(). Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com> Tested-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13fs/smbfs/inode.c: fix warning message deprecating smbfsSergio Luis
Fix the warning message regarding smbfs to "smbfs is deprecated and will be removed from the 2.6.27 kernel. Please migrate to cifs" instead of "smbfs is deprecated and will be removedfrom the 2.6.27 kernel. Please migrate to cifs" Signed-off-by: Sergio Luis <sergio@uece.br> Screwed-up-by: Andrew Morton <akpm@linux-foundation.org> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13xfs: convert beX_add to beX_add_cpu (new common API)Marcin Slusarz
remove beX_add functions and replace all uses with beX_add_cpu Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Dave Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13kernel-doc: fix fs/pipe.c notationRandy Dunlap
Fix several kernel-doc notation errors in fs/pipe.c. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-11Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-linus' of git://linux-nfs.org/~bfields/linux: SUNPRC: Fix printk format warning nfsd: clean up svc_reserve_auth() NLM: don't requeue block if it was invalidated while GRANT_MSG was in flight NLM: don't reattempt GRANT_MSG when there is already an RPC in flight NLM: have server-side RPC clients default to soft RPC tasks NLM: set RPC_CLNT_CREATE_NOPING for NLM RPC clients
2008-02-10NLM: don't requeue block if it was invalidated while GRANT_MSG was in flightJeff Layton
It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback is in flight. If this happens we don't want lockd to insert the block back into the nlm_blocked list. This helps that situation, but there's still a possible race. Fixing that will mean adding real locking for nlm_blocked. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-10NLM: don't reattempt GRANT_MSG when there is already an RPC in flightJeff Layton
With the current scheme in nlmsvc_grant_blocked, we can end up with more than one GRANT_MSG callback for a block in flight. Right now, we requeue the block unconditionally so that a GRANT_MSG callback is done again in 30s. If the client is unresponsive, it can take more than 30s for the call already in flight to time out. There's no benefit to having more than one GRANT_MSG RPC queued up at a time, so put it on the list with a timeout of NLM_NEVER before doing the RPC call. If the RPC call submission fails, we requeue it with a short timeout. If it works, then nlmsvc_grant_callback will end up requeueing it with a shorter timeout after it completes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-10NLM: have server-side RPC clients default to soft RPC tasksJeff Layton
Now that it no longer does an RPC ping, lockd always ends up queueing an RPC task for the GRANT_MSG callback. But, it also requeues the block for later attempts. Since these are hard RPC tasks, if the client we're calling back goes unresponsive the GRANT_MSG callbacks can stack up in the RPC queue. Fix this by making server-side RPC clients default to soft RPC tasks. lockd requeues the block anyway, so this should be OK. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-10NLM: set RPC_CLNT_CREATE_NOPING for NLM RPC clientsJeff Layton
It's currently possible for an unresponsive NLM client to completely lock up a server's lockd. The scenario is something like this: 1) client1 (or a process on the server) takes a lock on a file 2) client2 tries to take a blocking lock on the same file and awaits the callback 3) client2 goes unresponsive (plug pulled, network partition, etc) 4) client1 releases the lock ...at that point the server's lockd will try to queue up a GRANT_MSG callback for client2, but first it requeues the block with a timeout of 30s. nlm_async_call will attempt to bind the RPC client to client2 and will call rpc_ping. rpc_ping entails a sync RPC call and if client2 is unresponsive it will take around 60s for that to time out. Once it times out, it's already time to retry the block and the whole process repeats. Once in this situation, nlmsvc_retry_blocked will never return until the host starts responding again. lockd won't service new calls. Fix this by skipping the RPC ping on NLM RPC clients. This makes nlm_async_call return quickly when called. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-10splice: fix user pointer access in get_iovec_page_array()Bastian Blank
Commit 8811930dc74a503415b35c4a79d14fb0b408a361 ("splice: missing user pointer access verification") added the proper access_ok() calls to copy_from_user_mmap_sem() which ensures we can copy the struct iovecs from userspace to the kernel. But we also must check whether we can access the actual memory region pointed to by the struct iovec to fix the access checks properly. Signed-off-by: Bastian Blank <waldi@debian.org> Acked-by: Oliver Pinter <oliver.pntr@gmail.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-10ext4: Add new "development flag" to the ext4 filesystemTheodore Tso
This flag is simply a generic "this is a crash/burn test filesystem" marker. If it is set, then filesystem code which is "in development" will be allowed to mount the filesystem. Filesystem code which is not considered ready for prime-time will check for this flag, and if it is not set, it will refuse to touch the filesystem. As we start rolling ext4 out to distro's like Fedora, et. al, this makes it less likely that a user might accidentally start using ext4 on a production filesystem; a bad thing, since that will essentially make it be unfsckable until e2fsprogs catches up. Signed-off-by: Theodore Tso <tytso@MIT.EDU> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2008-02-10ext4: Don't panic in case of corrupt bitmapAneesh Kumar K.V
Multiblock allocator calls BUG_ON in many case if the free and used blocks count obtained looking at the bitmap is different from what the allocator internally accounted for. Use ext4_error in such case and don't panic the system. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-10ext4: allocate struct ext4_allocation_context from a kmem cacheEric Sandeen
struct ext4_allocation_context is rather large, and this bloats the stack of many functions which use it. Allocating it from a named slab cache will alleviate this. For example, with this change (on top of the noinline patch sent earlier): -ext4_mb_new_blocks 200 +ext4_mb_new_blocks 40 -ext4_mb_free_blocks 344 +ext4_mb_free_blocks 168 -ext4_mb_release_inode_pa 216 +ext4_mb_release_inode_pa 40 -ext4_mb_release_group_pa 192 +ext4_mb_release_group_pa 24 Most of these stack-allocated structs are actually used only for mballoc history; and in those cases often a smaller struct would do. So changing that may be another way around it, at least for those functions, if preferred. For now, in those cases where the ac is only for history, an allocation failure simply skips the history recording, and does not cause any other failures. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-10JBD2: Clear buffer_ordered flag for barried IO request on successDave Kleikamp
In JBD2 jbd2_journal_write_commit_record(), clear the buffer_ordered flag for the bh after barried IO has succeed. This prevents later, if the same buffer head were submitted to the underlying device, which has been reconfigured to not support barrier request, the JBD2 commit code could treat it as a normal IO (without barrier). This is a port from JBD/ext3 fix from Neil Brown. More details from Neil: Some devices - notably dm and md - can change their behaviour in response to BIO_RW_BARRIER requests. They might start out accepting such requests but on reconfiguration, they find out that they cannot any more. JBD2 deal with this by always testing if BIO_RW_BARRIER requests fail with EOPNOTSUPP, and retrying the write requests without the barrier (probably after waiting for any pending writes to complete). However there is a bug in the handling this in JBD2 for ext4 . When ext4/JBD2 to submit a BIO_RW_BARRIER request, it sets the buffer_ordered flag on the buffer head. If the request completes successfully, the flag STAYS SET. Other code might then write the same buffer_head after the device has been reconfigured to not accept barriers. This write will then fail, but the "other code" is not ready to handle EOPNOTSUPP errors and the error will be treated as fatal. Cc: Neil Brown <neilb@suse.de> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-10ext4: Fix Direct I/O lockingJan Kara
We cannot start transaction in ext4_direct_IO() and just let it last during the whole write because dio_get_page() acquires mmap_sem which ranks above transaction start (e.g. because we have dependency chain mmap_sem->PageLock->journal_start, or because we update atime while holding mmap_sem) and thus deadlocks could happen. We solve the problem by starting a transaction separately for each ext4_get_block() call. We *could* have a problem that we allocate a block and before its data are written out the machine crashes and thus we expose stale data. But that does not happen because for hole-filling generic code falls back to buffered writes and for file extension, we add inode to orphan list and thus in case of crash, journal replay will truncate inode back to the original size. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-10ext4: Fix circular locking dependency with migrate and rm.Aneesh Kumar K.V
In order to prevent a circular locking dependency when an unlink operation is racing with an ext4 migration, we delay taking i_data_sem until just before switch the inode format, and use i_mutex to prevent writes and truncates during the first part of the migration operation. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-07[XFS] add __init/__exit mark to specific init/cleanup functionsLachlan McIlroy
SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30459a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Denis Cheng <crquan@gmail.com>
2008-02-07[XFS] Fix oops in xfs_file_readdir()David Chinner
When xfs_file_readdir() exactly fills a buffer, it can move it's index past the end of the buffer and dereference it even though the result of the dereference is never used. On some platforms this causes an oops. SGI-PV: 976923 SGI-Modid: xfs-linux-melb:xfs-kern:30458a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] kill xfs_rootChristoph Hellwig
The only caller (xfs_fs_fill_super) can simplify call igrab on the root inode. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30393a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] keep i_nlink updated and use proper accessorsChristoph Hellwig
To get the read-only bind mounts in -mm to work correctly with XFS we need to call the drop_nlink and inc_nlink helpers to monitor the link count. Add calls to these to xfs_bumplink and xfs_droplink and stop copying over di_nlink to i_nlink in xfs_validate_fields and vn_revalidate. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30392a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] stop updating inode->i_blocksChristoph Hellwig
The VFS doesn't use i_blocks, it's only used by generic_fillattr and the generic quota code which XFS doesn't use. In XFS there is one use to check whether we have an inline or out of line sumlink, but we can replace that with a check of the XFS_IFINLINE inode flag. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30391a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] Make xfs_ail_check check less by defaultDavid Chinner
Checking the entire AIL on every insert and remove is prohibitively expensive - the sustained sequntial create rate on a single disk drops from about 1800/s to 60/s because of this checking resulting in the xfslogd becoming cpu bound. By default on debug builds, only check the next and previous entries in the list to ensure they are ordered correctly. If you really want, define XFS_TRANS_DEBUG to use the old behaviour. SGI-PV: 972759 SGI-Modid: xfs-linux-melb:xfs-kern:30372a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] Move AIL pushing into it's own threadDavid Chinner
When many hundreds to thousands of threads all try to do simultaneous transactions and the log is in a tail-pushing situation (i.e. full), we can get multiple threads walking the AIL list and contending on the AIL lock. The AIL push is, in effect, a simple I/O dispatch algorithm complicated by the ordering constraints placed on it by the transaction subsystem. It really does not need multiple threads to push on it - even when only a single CPU is pushing the AIL, it can push the I/O out far faster that pretty much any disk subsystem can handle. So, to avoid contention problems stemming from multiple list walkers, move the list walk off into another thread and simply provide a "target" to push to. When a thread requires a push, it sets the target and wakes the push thread, then goes to sleep waiting for the required amount of space to become available in the log. This mechanism should also be a lot fairer under heavy load as the waiters will queue in arrival order, rather than queuing in "who completed a push first" order. Also, by moving the pushing to a separate thread we can do more effectively overload detection and prevention as we can keep context from loop iteration to loop iteration. That is, we can push only part of the list each loop and not have to loop back to the start of the list every time we run. This should also help by reducing the number of items we try to lock and/or push items that we cannot move. Note that this patch is not intended to solve the inefficiencies in the AIL structure and the associated issues with extremely large list contents. That needs to be addresses separately; parallel access would cause problems to any new structure as well, so I'm only aiming to isolate the structure from unbounded parallelism here. SGI-PV: 972759 SGI-Modid: xfs-linux-melb:xfs-kern:30371a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] use generic_permissionChristoph Hellwig
Now that all direct caller of xfs_iaccess are gone we can kill xfs_iaccess and xfs_access and just use generic_permission with a check_acl callback. This is required for the per-mount read-only patchset in -mm to work properly with XFS. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30370a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] stop re-checking permissions in xfs_swapextChristoph Hellwig
xfs_swapext should simplify check if we have a writeable file descriptor instead of re-checking the permissions using xfs_iaccess. Add an additional check to refuse O_APPEND file descriptors because swapext is not an append-only write operation. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30369a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] clean up xfs_swapextChristoph Hellwig
- stop using vnodes - use proper multiple label goto unwinding - give the struct file * variables saner names SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30366a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] remove permission check from xfs_change_file_spaceChristoph Hellwig
Both callers of xfs_change_file_space alreaedy do the file->f_mode & FMODE_WRITE check to ensure we have a file descriptor that has been opened for write mode, so there is no need to re-check that with xfs_iaccess. Especially as the later might wrongly deny it for corner cases like file descriptor passing through unix domain sockets. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30365a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] prevent panic during log recovery due to bogus op_hdr lengthLachlan McIlroy
A problem was reported where a system panicked in log recovery due to a corrupt log record. The cause of the corruption is not known but this change will at least prevent a crash for this specific scenario. Log recovery definitely needs some more work in this area. SGI-PV: 974151 SGI-Modid: xfs-linux-melb:xfs-kern:30318a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
2008-02-07[XFS] Cleanup various fid related bits:Christoph Hellwig
- merge xfs_fid2 into it's only caller xfs_dm_inode_to_fh. - remove xfs_vget and opencode it in the two callers, simplifying both of them by avoiding the awkward calling convetion. - assign directly to the dm_fid_t members in various places in the dmapi code instead of casting them to xfs_fid_t first (which is identical to dm_fid_t) SGI-PV: 974747 SGI-Modid: xfs-linux-melb:xfs-kern:30258a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] Fix xfs_lowbit64David Chinner
xfs_lowbit64 was broken on 32 bit platforms in a recent cleanup of the xfs bitops. Fix it back up again. SGI-PV: 974005 SGI-Modid: xfs-linux-melb:xfs-kern:30202a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] Remove CFORK macros and use code directly in IFORK and DFORK macros.Christoph Hellwig
Currently XFS_IFORK_* and XFS_DFORK* are implemented by means of XFS_CFORK* macros. But given that XFS_IFORK_* operates on an xfs_inode that embedds and xfs_icdinode_core and XFS_DFORK_* operates on an xfs_dinode that embedds a xfs_dinode_core one will have to do endian swapping while the other doesn't. Instead of having the current mess with the CFORK macros that have byteswapping and non-byteswapping version (which are inconsistantly named while we're at it) just define each family of the macros to stand by itself and simplify the whole matter. A few direct references to the CFORK variants were cleaned up to use IFORK or DFORK to make this possible. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30163a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] kill superflous buffer locking (2nd attempt)Christoph Hellwig
There is no need to lock any page in xfs_buf.c because we operate on our own address_space and all locking is covered by the buffer semaphore. If we ever switch back to main blockdeive address_space as suggested e.g. for fsblock with a similar scheme the locking will have to be totally revised anyway because the current scheme is neither correct nor coherent with itself. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30156a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] Use kernel-supplied "roundup_pow_of_two" for simplicityRobert P. J. Day
SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30098a Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-02-07[XFS] Remove the BPCSHIFT and NB* based macros from XFS.Tim Shimmin
The BPCSHIFT based macros, btoc*, ctob*, offtoc* and ctooff are either not used or don't need to be used. The NDPP, NDPP, NBBY macros don't need to be used but instead are replaced directly by PAGE_SIZE and PAGE_CACHE_SIZE where appropriate. Initial patch and motivation from Nicolas Kaiser. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30096a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>