aboutsummaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)Author
2010-04-12NFSv4: fix delegated lockingTrond Myklebust
Arnaud Giersch reports that NFSv4 locking is broken when we hold a delegation since commit 8e469ebd6dc32cbaf620e134d79f740bf0ebab79 (NFSv4: Don't allow posix locking against servers that don't support it). According to Arnaud, the lock succeeds the first time he opens the file (since we cannot do a delegated open) but then fails after we start using delegated opens. The following patch fixes it by ensuring that locking behaviour is governed by a per-filesystem capability flag that is initially set, but gets cleared if the server ever returns an OPEN without the NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set. Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2010-04-09NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptibleTrond Myklebust
We always want to ensure that WRITE and COMMIT completes, whether or not the user presses ^C. Do this by making the call asynchronous, and allowing the user to do an interruptible wait for rpc_task completion. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-09NFS: Fix a race with the new commit codeTrond Myklebust
This patch fixes a race which occurs due to the fact that we release the PG_writeback flag while still holding the nfs_page locked. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-09NFS: Ensure that writeback_single_inode() calls write_inode() when syncingTrond Myklebust
Since writeback_single_inode() checks the inode->i_state flags _before_ it flushes out the data, we need to ensure that the I_DIRTY_DATASYNC flag is already set. Otherwise we risk not seeing a call to write_inode(), which again means that we break fsync() et al... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-09NFS: Fix the mode calculation in nfs_find_open_contextTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-04-09NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIRTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2010-04-07Have nfs ->d_revalidate() report errors properlyAl Viro
If nfs atomic open implementation ends up doing open request from ->d_revalidate() codepath and gets an error from server, return that error to caller explicitly and don't bother with lookup_instantiate_filp() at all. ->d_revalidate() can return an error itself just fine... See http://bugzilla.kernel.org/show_bug.cgi?id=15674 http://marc.info/?l=linux-kernel&m=126988782722711&w=2 for original report. Reported-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-22NFS: don't try to decode GETATTR if DELEGRETURN returned errorJeff Layton
The reply parsing code attempts to decode the GETATTR response even if the DELEGRETURN portion of the compound returned an error. The GETATTR response won't actually exist if that's the case and we're asking the parser to read past the end of the response. This bug is fairly benign. The parser catches this without reading past the end of the response and decode_getfattr returns -EIO. Earlier kernels however had decode_op_hdr using the READ_BUF macro, and this bug would make this printk pop any time the client got an error from a delegreturn: kernel: decode_op_hdr: reply buffer overflowed in line XXXX More recent kernels seem to have replaced this printk with a dprintk. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-19NFS: Prevent another deadlock in nfs_release_page()Trond Myklebust
We should not attempt to free the page if __GFP_FS is not set. Otherwise we can deadlock as per http://bugzilla.kernel.org/show_bug.cgi?id=15578 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2010-03-15NFS: ensure bdi_unregister is called on mount failure.NeilBrown
bdi_unregister is called by nfs_put_super which is only called by generic_shutdown_super if ->s_root is not NULL. So if we error out in a circumstance where we called nfs_bdi_register (i.e. server != NULL) but have not set s_root, then we need to call bdi_unregister explicitly in nfs_get_sb and various other *_get_sb() functions. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-11NFS: Avoid a deadlock in nfs_release_pageTrond Myklebust
J.R. Okajima reports the following deadlock: INFO: task kswapd0:305 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kswapd0 D 0000000000000001 0 305 2 0x00000000 ffff88001f21d4f0 0000000000000046 ffff88001fdea680 ffff88001f21c000 ffff88001f21dfd8 ffff88001f21c000 ffff88001f21dfd8 ffff88001f21dfd8 ffff88001fdea040 0000000000014c00 0000000000000001 ffff88001fdea040 Call Trace: [<ffffffff8146155d>] io_schedule+0x4d/0x70 [<ffffffff810d2be5>] sync_page+0x65/0xa0 [<ffffffff81461b12>] __wait_on_bit_lock+0x52/0xb0 [<ffffffff810d2b80>] ? sync_page+0x0/0xa0 [<ffffffff810d2b64>] __lock_page+0x64/0x70 [<ffffffff81070ce0>] ? wake_bit_function+0x0/0x40 [<ffffffff810df1d4>] truncate_inode_pages_range+0x344/0x4a0 [<ffffffff810df340>] truncate_inode_pages+0x10/0x20 [<ffffffff8112cbfe>] generic_delete_inode+0x15e/0x190 [<ffffffff8112cc8d>] generic_drop_inode+0x5d/0x80 [<ffffffff8112bb88>] iput+0x78/0x80 [<ffffffff811bc908>] nfs_dentry_iput+0x38/0x50 [<ffffffff811285f4>] dentry_iput+0x84/0x110 [<ffffffff811286ae>] d_kill+0x2e/0x60 [<ffffffff8112912a>] dput+0x7a/0x170 [<ffffffff8111e925>] path_put+0x15/0x40 [<ffffffff811c3a44>] __put_nfs_open_context+0xa4/0xb0 [<ffffffff811cb5d0>] ? nfs_free_request+0x0/0x50 [<ffffffff811c3b0b>] put_nfs_open_context+0xb/0x10 [<ffffffff811cb5f9>] nfs_free_request+0x29/0x50 [<ffffffff81234b7e>] kref_put+0x8e/0xe0 [<ffffffff811cb594>] nfs_release_request+0x14/0x20 [<ffffffff811cf769>] nfs_find_and_lock_request+0x89/0xa0 [<ffffffff811d1180>] nfs_wb_page+0x80/0x110 [<ffffffff811c0770>] nfs_release_page+0x70/0x90 [<ffffffff810d18ee>] try_to_release_page+0x5e/0x80 [<ffffffff810e1178>] shrink_page_list+0x638/0x860 [<ffffffff810e19de>] shrink_zone+0x63e/0xc40 We can fix this by making the call to put_nfs_open_context() happen when we actually remove the write request from the inode (which is done by the nfsiod thread in this case). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2010-03-10NFSv4: Don't ignore the NFS_INO_REVAL_FORCED flag in nfs_revalidate_inode()Trond Myklebust
If the NFS_INO_REVAL_FORCED flag is set, that means that we don't yet have an up to date attribute cache. Even if we hold a delegation, we must put a GETATTR on the wire. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2010-03-08nfs4: Make the v4 callback service hiddenSteve Dickson
To avoid hangs in the svc_unregister(), on version 4 mounts (and unmounts), when rpcbind is not running, make the nfs4 callback program an 'hidden' service by setting the 'vs_hidden' flag in the nfs4_callback_version structure. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-08nfs: fix unlikely memory leakDan Carpenter
I'll admit that it's unlikely for the first allocation to fail and the second one to succeed. I won't be offended if you ignore this patch. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-06Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits) nfsd4: fix minor memory leak svcrpc: treat uid's as unsigned nfsd: ensure sockets are closed on error Revert "sunrpc: move the close processing after do recvfrom method" Revert "sunrpc: fix peername failed on closed listener" sunrpc: remove unnecessary svc_xprt_put NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN xfs_export_operations.commit_metadata commit_metadata export operation replacing nfsd_sync_dir lockd: don't clear sm_monitored on nsm_reboot_lookup lockd: release reference to nsm_handle in nlm_host_rebooted nfsd: Use vfs_fsync_range() in nfsd_commit NFSD: Create PF_INET6 listener in write_ports SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found" SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt() NFSD: Support AF_INET6 in svc_addsock() function SUNRPC: Use rpc_pton() in ip_map_parse() nfsd: 4.1 has an rfc number nfsd41: Create the recovery entry for the NFSv4.1 client nfsd: use vfs_fsync for non-directories ...
2010-03-05Merge branch 'writeback-for-2.6.34' into nfs-for-2.6.34Trond Myklebust
2010-03-05NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mappingTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05NFS: Clean up nfs_sync_mappingTrond Myklebust
Remove the redundant call to filemap_write_and_wait(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05NFS: Simplify nfs_wb_page()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05NFS: Replace __nfs_write_mapping with sync_inode()Trond Myklebust
Now that we have correct COMMIT semantics in writeback_single_inode, we can reduce and simplify nfs_wb_all(). Also replace nfs_wb_nocommit() with a call to filemap_write_and_wait(), which doesn't need to hold the inode->i_mutex. With that done, we can eliminate nfs_write_mapping() altogether. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05NFS: Simplify nfs_wb_page_cancel()Trond Myklebust
In all cases we should be able to just remove the request and call cancel_dirty_page(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pagesTrond Myklebust
Since nfs_scan_list() doesn't wait for locked pages, we have a race in which it is possible to end up with an inode that needs to send a COMMIT, but which does not have the I_DIRTY_DATASYNC flag set. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is setTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Wu Fengguang <fengguang.wu@intel.com>
2010-03-05NFS: Reduce the number of unnecessary COMMIT callsTrond Myklebust
If the caller is doing a non-blocking flush, and there are still writebacks pending on the wire, we can usually defer the COMMIT call until those writes are done. Also ensure that we honour the wbc->nonblocking flag. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05NFS: Add a count of the number of unstable writes carried by an inodeTrond Myklebust
In order to know when we should do opportunistic commits of the unstable writes, when the VM is doing a background flush, we add a field to count the number of unstable writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.cTrond Myklebust
The sole purpose of nfs_write_inode is to commit unstable writes, so move it into fs/nfs/write.c, and make nfs_commit_inode static. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-05pass writeback_control to ->write_inodeChristoph Hellwig
This gives the filesystem more information about the writeback that is happening. Trond requested this for the NFS unstable write handling, and other filesystems might benefit from this too by beeing able to distinguish between the different callers in more detail. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-05make sure data is on disk before calling ->write_inodeChristoph Hellwig
Similar to the fsync issue fixed a while ago in commit 2daea67e966dc0c42067ebea015ddac6834cef88 we need to write for data to actually hit the disk before writing out the metadata to guarantee data integrity for filesystems that modify the inode in the data I/O completion path. Currently XFS and NFS handle this manually, and AFS has a write_inode method that does nothing but waiting for data, while others are possibly missing out on this. Fortunately this change has a lot less impact than the fsync change as none of the write_inode methods starts data writeout of any form by itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-04Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs into for-2.6.34-incomingJ. Bruce Fields
Resolve merge conflict in fs/xfs/linux-2.6/xfs_export.c.
2010-03-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) init: Open /dev/console from rootfs mqueue: fix typo "failues" -> "failures" mqueue: only set error codes if they are really necessary mqueue: simplify do_open() error handling mqueue: apply mathematics distributivity on mq_bytes calculation mqueue: remove unneeded info->messages initialization mqueue: fix mq_open() file descriptor leak on user-space processes fix race in d_splice_alias() set S_DEAD on unlink() and non-directory rename() victims vfs: add NOFOLLOW flag to umount(2) get rid of ->mnt_parent in tomoyo/realpath hppfs can use existing proc_mnt, no need for do_kern_mount() in there Mirror MS_KERNMOUNT in ->mnt_flags get rid of useless vfsmount_lock use in put_mnt_ns() Take vfsmount_lock to fs/internal.h get rid of insanity with namespace roots in tomoyo take check for new events in namespace (guts of mounts_poll()) to namespace.c Don't mess with generic_permission() under ->d_lock in hpfs sanitize const/signedness for udf nilfs: sanitize const/signedness in dealing with ->d_name.name ... Fix up fairly trivial (famous last words...) conflicts in drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
2010-03-03a couple of mntget+dget -> path_get in nfs4procAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03Switch alloc_nfs_open_context() to struct pathAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: add __percpu sparse annotations to what's left percpu: add __percpu sparse annotations to fs percpu: add __percpu sparse annotations to core kernel subsystems local_t: Remove leftover local.h this_cpu: Remove pageset_notifier this_cpu: Page allocator conversion percpu, x86: Generic inc / dec percpu instructions local_t: Move local.h include to ringbuffer.c and ring_buffer_benchmark.c module: Use this_cpu_xx to dynamically allocate counters local_t: Remove cpu_local_xx macros percpu: refactor the code in pcpu_[de]populate_chunk() percpu: remove compile warnings caused by __verify_pcpu_ptr() percpu: make accessors check for percpu pointer in sparse percpu: add __percpu for sparse. percpu: make access macros universal percpu: remove per_cpu__ prefix.
2010-03-02nfs41 fix NFS4ERR_CLID_INUSE for exchange idAndy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02NFS: Fix an allocation-under-spinlock bugTrond Myklebust
sunrpc_cache_update() will always call detail->update() from inside the detail->hash_lock, so it cannot allocate memory. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2010-03-02NFSv4.1: Various fixes to the sequence flag error handlingTrond Myklebust
Ensure that we change the EXCHANGE_ID verifier (i.e. clp->cl_boot_time) when we want to reset all state. This is mainly needed when the server tells us that it is revoking our open or lock stateids. Handle revoking of recallable state by expiring the delegations. Handle callback path issues by expiring the delegations and then resetting the session. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02nfs4: renewd renew operations should take/put a client referenceAlexandros Batsakis
renewd sends RENEW requests to the NFS server in order to renew state. As the request is asynchronous, renewd should take a reference to the nfs_client to prevent concurrent umounts from freeing the client Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02nfs41: renewd sequence operations should take/put client referenceAlexandros Batsakis
renewd sends SEQUENCE requests to the NFS server in order to renew state. As the request is asynchronous, renewd should take a reference to the nfs_client to prevent concurrent umounts from freeing the session/client Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02nfs: prevent backlogging of renewd requestsAlexandros Batsakis
If the renewd send queue gets backlogged (e.g., if the server goes down), we will keep filling the queue with periodic RENEW/SEQUENCE requests. This patch schedules a new renewd request if and only if the previous one returns (either success or failure) Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> [Trond.Myklebust@netapp.com: moved nfs4_schedule_state_renewal() into separate nfs4_renew_release() and nfs41_sequence_release() callbacks to ensure correct behaviour on call setup failure] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-02nfs: kill renewd before clearing client minor versionAlexandros Batsakis
renewd should be synchronously killed before we destroy the session in nfs4_clear_minor_version Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> [Trond.Myklebust@netapp.com: clean up to remove 'unused function warning when !CONFIG_NFS_V4] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-26Remove EXPERIMENTAL from NFS_FSCACHEChristian Kujau
There's currently an open Ubuntu bug[0], with the intent to compile NFS_FSCACHE (and possibly AFS_FSCACHE, 9P_FSCACHE) into the standard Ubuntu kernel. However, since *_FSCACHE still depends on EXPERIMENTAL, this won't happen. As Arjan van de Ven pointed out[1], the EXPERIMENTAL flag doesn't mean that much any more, I propose the following patch to fs/nfs/Kconfig. I'd do the same for fs/9p/Kconfig and fs/afs/Kconfig, but as I did not test 9p or AFS, I feel it would not be appropriate for me to remove the flag. [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/440522/comments/5 [1] http://lkml.org/lkml/2010/1/23/145 Signed-off-by: Christian Kujau <lists@nerdbynature.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-17percpu: add __percpu sparse annotations to fsTejun Heo
Add __percpu sparse annotations to fs. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Alex Elder <aelder@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2010-02-15NFS: Too many GETATTR and ACCESS calls after direct I/OChuck Lever
The cached read and write paths initialize fattr->time_start in their setup procedures. The value of fattr->time_start is propagated to read_cache_jiffies by nfs_update_inode(). Subsequent calls to nfs_attribute_timeout() will then use a good time stamp when computing the attribute cache timeout, and squelch unneeded GETATTR calls. Since the direct I/O paths erroneously leave the inode's fattr->time_start field set to zero, read_cache_jiffies for that inode is set to zero after any direct read or write operation. This triggers an otw GETATTR or ACCESS call to update the file's attribute and access caches properly, even when the NFS READ or WRITE replies have usable post-op attributes. Make sure the direct read and write setup code performs the same fattr initialization as the cached I/O paths to prevent unnecessary GETATTR calls. This was likely introduced by commit 0e574af1 in 2.6.15, which appears to add new nfs_fattr_init() call sites in the cached read and write paths, but not in the equivalent places in fs/nfs/direct.c. A subsequent commit in the same series, 33801147, introduces the fattr->time_start field. Interestingly, the direct write reschedule path already has a call to nfs_fattr_init() in the right place. Reported-by: Quentin Barnes <qbarnes@yahoo-inc.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-10NFS: Make close(2) asynchronous when closing NFS O_DIRECT filesChuck Lever
For NFSv2 and v3: O_DIRECT writes are always synchronous, and aren't cached, so nothing should be flushed when closing an NFS O_DIRECT file descriptor. Thus there are no write errors to report on close(2). In addition, there's no cached data to verify on the next open(2), so we don't need clean GETATTR results at close time to compare with. Thus, there's no need for the nfs_revalidate_inode() call when closing an NFS O_DIRECT file. This reduces the number of synchronous on-the-wire requests for a simple open-write-close of an NFS O_DIRECT file by roughly 20%. For NFSv4: Call nfs4_do_close() with wait set to zero when closing an NFS O_DIRECT file. The CLOSE will go on the wire, but the application won't wait for it to complete. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10NFS: Improve NFS iostat byte count accuracy for writesChuck Lever
The bytes counted by the performance counters for NFS writes should reflect write and sync errors. If the write(2) system call reports an error, the bytes should not be counted. And, if the write is short, the actual number of bytes that was written should be counted, not the number of bytes that was requested. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10NFS: Account for NFS bytes read via the splice APIChuck Lever
Bytes read via the splice API should be accounted for in the NFS performance statistics. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10NFS: Fix byte accounting for generic NFS readsChuck Lever
Currently, the NFS I/O counters count the number of bytes requested by applications, rather than the number of bytes actually read by the system calls. The number of bytes requested for reads is actually not that useful, because the value is usually a buffer size for reads. That is, that requested number is usually a maximum, and frequently doesn't reflect the actual number of bytes read. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10NFS: Proper accounting for NFS VFS callsChuck Lever
Nit: The VFSOPEN and VFSFLUSH counters are function call counters. Count every call to these routines. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10nfs41: cleanup callback code to use __be32 typeAndy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>