aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2006-03-20NFS: remove support for multi-segment iovs in the direct read pathChuck Lever
Eliminate the persistent use of automatic storage in all parts of the NFS client's direct read path to pave the way for introducing support for aio against files opened with the O_DIRECT flag. Test plan: Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Millions of fsx-odirect ops. OraSim. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: use size_t type for holding rsize bytes in NFS O_DIRECT read pathChuck Lever
size_t is used for holding byte counts, so use it for variables storing rsize. Note that the write path will be updated as we add support for async O_DIRECT writes. Test plan: Need to verify that existing comparisons against new size_t variables behave correctly. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: update comments and function definitions in fs/nfs/direct.cChuck Lever
Update to latest coding style standards. Remove block comments on statically defined functions, and place function definitions all on one line. Test plan: Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: clean up NFS client's a_ops->direct_IO methodChuck Lever
The NFS client's a_ops->direct_IO method, nfs_direct_IO, is required to be present to allow NFS files to be opened with O_DIRECT, but is never called because the NFS client shunts reads and writes to files opened with O_DIRECT directly to its own routines. Gut the nfs_direct_IO function. This eliminates the only part of the NFS client's direct I/O path that requires support for multi-segment iovs, allowing further simplification in subsequent patches. Test plan: Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Millions of fsx-odirect ops. OraSim. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Cleanup of NFS read codeTrond Myklebust
Same callback hierarchy inversion as for the NFS write calls. This patch is not strictly speaking needed by the O_DIRECT code, but avoids confusing differences between the asynchronous read and write code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Cleanup of NFS write code in preparation for asynchronous o_directTrond Myklebust
This patch inverts the callback hierarchy for NFS write calls. Instead of having the NFSv2/v3/v4-specific code set up the RPC callback ops, we allow the original caller to do so. This allows for more flexibility w.r.t. how to set up and tear down the nfs_write_data structure while still allowing the NFSv3/v4 code to perform error handling. The greater flexibility is needed by the asynchronous O_DIRECT code, which wants to be able to hold on to the original nfs_write_data structures after the WRITE RPC call has completed in order to be able to replay them if the COMMIT call determines that the server has rebooted. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20lockd: Remove FL_LOCKD flagJ. Bruce Fields
Currently lockd identifies its own locks using the FL_LOCKD flag. This doesn't scale well to multiple lock managers--if we did this in nfsv4 too, for example, we'd be left with only one free flag bit. Instead, we just check whether the file manager ops (fl_lmops) set on this lock are our own. The only use for this is in nlm_traverse_locks, which uses it to find locks that need cleaning up when freeing a host or a file. In the long run it might be nice to do reference counting instead of traversing all the locks like this.... Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20locks,lockd: fix race in nlmsvc_testlockAndy Adamson
posix_test_lock() returns a pointer to a struct file_lock which is unprotected and can be removed while in use by the caller. Move the conflicting lock from the return to a parameter, and copy the conflicting lock. In most cases the caller ends up putting the copy of the conflicting lock on the stack. On i386, sizeof(struct file_lock) appears to be about 100 bytes. We're assuming that's reasonable. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20locks: remove unused posix_block_lockAndy Adamson
posix_lock_file() is used to add a blocked lock to Lockd's block, so posix_block_lock() is no longer needed. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20lockd: make nlmsvc_lock use only posix_lock_fileAndy Adamson
Reorganize nlmsvc_lock() to make full use of posix_lock_file(), which does eveything nlmsvc_lock() needs - no need to call posix_test_lock(), posix_locks_deadlock(), or posix_block_lock() separately. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20lockd: simplify nlmsvc_grant_blockedAndy Adamson
Reorganize nlmsvc_grant_blocked() to make full use of posix_lock_file(). Note that there's no need for separate calls to posix_test_lock(), posix_locks_deadlock(), or posix_block_lock(). Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20lockd: clean up nlmsvc_lockAndy Adamson
Slightly more consistent dprintk error reporting, consolidate some up()'s. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: directory trace messagesChuck Lever
Reuse NFSDBG_DIRCACHE and NFSDBG_LOOKUPCACHE to provide additional diagnostic messages that trace the operation of the NFS client's directory behavior. A few new messages are now generated when NFSDBG_VFS is active, as well, to trace normal VFS activity. This compromise provides better trace debugging for those who use pre-built kernels, without adding a lot of extra noise to the standard debug settings. Test-plan: Enable NFS trace debugging with flags 1, 2, or 4. You should be able to see different types of trace messages with each flag setting. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20SUNRPC: eliminate rpc_call()Chuck Lever
Clean-up: replace rpc_call() helper with direct call to rpc_call_sync. This makes NFSv2 and NFSv3 synchronous calls more computationally efficient, and reduces stack consumption in functions that used to invoke rpc_call more than once. Test plan: Compile kernel with CONFIG_NFS enabled. Connectathon on NFS version 2, version 3, and version 4 mount points. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20SUNRPC: display human-readable procedure name in rpc_iostats outputChuck Lever
Add fields to the rpc_procinfo struct that allow the display of a human-readable name for each procedure in the rpc_iostats output. Also fix it so that the NFSv4 stats are broken up correctly by sub-procedure number. NFSv4 uses only two real RPC procedures: NULL, and COMPOUND. Test plan: Mount with NFSv2, NFSv3, and NFSv4, and do "cat /proc/self/mountstats". Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: add RPC I/O statistics to /proc/self/mountstatsChuck Lever
NFS client now shows various RPC I/O metrics in /proc/self/mountstats. Test plan: Mount/umount while doing "cat /proc/self/mountstats", multiple iterations of connectathon locking suite. Test with NFS version 2, 3, and 4. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: report how long an NFS file system has been mountedChuck Lever
Add a field in nfs_server to record a timestamp when a mount succeeds. Report the number of seconds the file system has been mounted via nfs_show_stats(). Test plan: Mount an NFS file system, watch the mountstats reports and compare with clock time. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: add hooks to account for NFSERR_JUKEBOX errorsChuck Lever
Make an inode or an nfs_server struct available in the logic that handles JUKEBOX/DELAY type errors so the NFS client can account for them. This patch is split out from the main nfs iostat patch to highlight minor architectural changes required to support this statistic. Test plan: None. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: add I/O performance countersChuck Lever
Invoke the byte and event counter macros where we want to count bytes and events. Clean-up: fix a possible NULL dereference in nfs_lock, and simplify nfs_file_open. Test-plan: fsx and iozone on UP and SMP systems, with and without pre-emption. Watch for memory overwrite bugs, and performance loss (significantly more CPU required per op). Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: introduce mechanism for tracking NFS client metricsChuck Lever
Add a per-superblock performance counter facility to the NFS client. This facility mimics the counters available for block devices and for networking. Expose these new counters via the new /proc/self/mountstats interface. Thanks to Andrew Morton and Trond Myklebust for their review and comments. Test plan: fsx and iozone on UP and SMP systems, with and without pre-emption. Watch for memory overwrite bugs, and performance loss (significantly more CPU required per op). Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: clean up some mount optionsChuck Lever
Get rid of "lock" and "posix", and spell out "vers=". Test plan: None. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: show retransmit settings when displaying mount optionsChuck Lever
Sometimes it's important to know the exact RPC retransmit settings the kernel is using for an NFS mount point. Add this facility to the NFS client's show_options method. Test plan: Set various retransmit settings via the mount command, and check that the settings are reflected in /proc/mounts. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20VFS: New /proc file /proc/self/mountstatsChuck Lever
Create a new file under /proc/self, called mountstats, where mounted file systems can export information (configuration options, performance counters, and so on). Use a mechanism similar to /proc/mounts and s_ops->show_options. This mechanism does not violate namespace security, and is safe to use while other processes are unmounting file systems. Thanks to Mike Waychison for his review and comments. Test-plan: Test concurrent mount/unmount operations while cat'ing /proc/self/mountstats. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: sem2mutex idmap.cIngo Molnar
semaphore to mutex conversion. the conversion was generated via scripts, and the result was validated automatically via a script as well. build and boot tested. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: kzalloc conversion in fs/nfsEric Sesterhenn
this converts fs/nfs to kzalloc() usage. compile tested with make allyesconfig Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFSv4: Kill braindead gcc warningsTrond Myklebust
nfs4_open_revalidate: 'res' may be used uninitialized nfs4_callback_compound: ‘hdr_res.nops’ may be used uninitialized 'op_nr’ may be used uninitialized encode_getattr_res: ‘savep’ may be used uninitialized Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFSv4: Do not call rpciod_down() before call to destroy_nfsv4_state()Trond Myklebust
The reason is that the idmapper cleanup may call flush_workqueue() on rpciod_workqueue. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20SUNRPC: Ensure that rpc_mkpipe returns a refcounted dentryTrond Myklebust
If not, we cannot guarantee that idmap->idmap_dentry, gss_auth->dentry and clnt->cl_dentry are valid dentries. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: reduce the number of false cache invalidations.Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: "const static" vs "static const" in nfs4Jesper Juhl
My previous "const static" vs "static const" cleanup missed a single case, patch below takes care of it. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFSv4: Don't invalidate cached attributes if change attribute is unchangedTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: writes should not clobber utimes() callsTrond Myklebust
Ensure that we flush out writes in the case when someone calls utimes() in order to set the file times. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20lockd: Don't expose the process pid to the NLM serverTrond Myklebust
Instead we use the nlm_lockowner->pid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NLM: nlm_alloc_call should not immediately fail on signalTrond Myklebust
Currently, nlm_alloc_call tests for a signal before it even tries to allocate memory. Fix it so that it tries at least once. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20VFS: Fix __posix_lock_file() copy of private lock areaTrond Myklebust
The struct file_lock->fl_u area must be copied using the fl_copy_lock() operation. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Fix buglet in fs/nfs/write.cNeil Brown
I've been reading through fs/nfs/write.c trying to track down a bug that seems to be related to pages loosing a refcount and getting freed too early (you interested in detail??) and I spotted a little bug which the following patch should fix. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Avoid races between writebacks and truncationTrond Myklebust
Currently, there is no serialisation between NFS asynchronous writebacks and truncation at the page level due to the fact that nfs_sync_inode() cannot lock the pages that it is about to write out. This means that it is possible to be flushing out data (and calling something like set_page_writeback()) while the page cache is busy evicting the page. Oops... Use the hooks provided in try_to_release_page() to ensure that dirty pages are always written back to storage before we evict them. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Fix a busy inodes issue...Trond Myklebust
The nfs_open_context may live longer than the file descriptor that spawned it, so it needs to carry a reference to the vfsmount. If not, then generic_shutdown_super() may end up being called before reads and writes have been flushed out. Make a couple of functions static while we're at it... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-17[PATCH] nfsservctl(): remove user-triggerable printkPeter Staubach
A user can use nfsservctl() to spam the logs. This can happen because the arguments to the nfsservctl() system call are versioned. This is a good thing. However, when a bad version is detected, the kernel prints a message and then returns an error. Signed-off-by: Peter Staubach <staubach@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-17[PATCH] v9fs: fix overzealous dropping of dentry which breaks dcacheEric Van Hensbergen
There is a d_drop in dir_release which caused problems as it invalidates dcache entries too soon. This was likely a part of the wierd cwd behavior folks were seeing. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-15[PATCH] Fix ext2 readdir f_pos re-validation logicAl Viro
This fixes not one, but _two_, silly (but admittedly hard to hit) bugs in the ext2 filesystem "readdir()" function. It also cleans up the code to avoid the unnecessary goto mess. The bugs were related to re-valiating the f_pos value after somebody had either done an "lseek()" on the directory to an invalid offset, or when the offset had become invalid due to a file being unlinked in the directory. The code would not only set the f_version too eagerly, it would also not update f_pos appropriately for when the offset fixup took place. When that happened, we'd occasionally subsequently fail the readdir() even when we shouldn't (no real harm done, but an ugly printk, and obviously you would end up not necessarily seeing all entries). Thanks to Masoud Sharbiani <masouds@google.com> who noticed the problem and had a test-case for it, and also fixed up a thinko in the first version of this patch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Masoud Sharbiani <masouds@google.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-15[PATCH] fs/namespace.c:dup_namespace(): fix a use after freeAdrian Bunk
The Coverity checker spotted the following bug in dup_namespace(): <-- snip --> if (!new_ns->root) { up_write(&namespace_sem); kfree(new_ns); goto out; } ... out: return new_ns; <-- snip --> Callers expect a non-NULL result to not be freed. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-14[PATCH] page migration: fail if page is in a vma flagged VM_LOCKEDChristoph Lameter
page migration currently simply retries a couple of times if try_to_unmap() fails without inspecting the return code. However, SWAP_FAIL indicates that the page is in a vma that has the VM_LOCKED flag set (if ignore_refs ==1). We can check for that return code and avoid retrying the migration. migrate_page_remove_references() now needs to return a reason why the failure occured. So switch migrate_page_remove_references to use -Exx style error messages. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-14Merge git://oss.sgi.com:8090/oss/git/rc-fixesLinus Torvalds
* git://oss.sgi.com:8090/oss/git/rc-fixes: Fix a direct I/O locking issue revealed by the new mutex code.
2006-03-15Fix a direct I/O locking issue revealed by the new mutex code.Nathan Scott
Affects only XFS (i.e. DIO_OWN_LOCKING case) - currently it is not possible to get i_mutex locking correct when using DIO_OWN direct I/O locking in a filesystem due to indeterminism in the possible return code/lock/unlock combinations. This can cause a direct read to attempt a double i_mutex unlock inside XFS. We're now ensuring __blockdev_direct_IO always exits with the inode i_mutex (still) held for a direct reader. Tested with the three different locking modes (via direct block device access, ext3 and XFS) - both reading and writing; cannot find any regressions resulting from this change, and it clearly fixes the mutex_unlock warning originally reported here: http://marc.theaimsgroup.com/?l=linux-kernel&m=114189068126253&w=2 Signed-off-by: Nathan Scott <nathans@sgi.com> Acked-by: Christoph Hellwig <hch@lst.de>
2006-03-14[PATCH] JFS: Take logsync lock before testing mp->lsnDave Kleikamp
This fixes a race where lsn could be cleared before taking the lock Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-14[PATCH] NLM: Ensure we do not Oops in the case of an unlockTrond Myklebust
In theory, NLM specs assure us that the server will only reply LCK_GRANTED or LCK_DENIED_GRACE_PERIOD to our NLM_UNLOCK request. In practice, we should not assume this to be the case, and the code will currently Oops if we do. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-14[PATCH] NFSv4: fix mount segfault on errors returned that are < -1000Trond Myklebust
It turns out that nfs4_proc_get_root() may return raw NFSv4 errors instead of mapping them to kernel errors. Problem spotted by Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-14[PATCH] NFS: Fix a potential panic in O_DIRECTTrond Myklebust
Based on an original patch by Mike O'Connor and Greg Banks of SGI. Mike states: A normal user can panic an NFS client and cause a local DoS with 'judicious'(?) use of O_DIRECT. Any O_DIRECT write to an NFS file where the user buffer starts with a valid mapped page and contains an unmapped page, will crash in this way. I haven't followed the code, but O_DIRECT reads with similar user buffers will probably also crash albeit in different ways. Details: when nfs_get_user_pages() calls get_user_pages(), it detects and correctly handles get_user_pages() returning an error, which happens if the first page covered by the user buffer's address range is unmapped. However, if the first page is mapped but some subsequent page isn't, get_user_pages() will return a positive number which is less than the number of pages requested (this behaviour is sort of analagous to a short write() call and appears to be intentional). nfs_get_user_pages() doesn't detect this and hands off the array of pages (whose last few elements are random rubbish from the newly allocated array memory) to it's caller, whence they go to nfs_direct_write_seg(), which then totally ignores the nr_pages it's given, and calculates its own idea of how many pages are in the array from the user buffer length. Needless to say, when it comes to transmit those uninitialised page* pointers, we see a crash in the network stack. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-11[PATCH] ext3: fix nobh mode for chattr +j inodesBadari Pulavarty
One can do "chattr +j" on a file to change its journalling mode. Fix writeback mode with "nobh" handling for it. Even though, we mount ext3 filesystem in writeback mode with "nobh" option, some one can do "chattr +j" on a single file to force it to do journalled mode. In order to do journaling, ext3_block_truncate_page() need to fallback to default case of creating buffers and adding them to transaction etc. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>