aboutsummaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)Author
2009-01-04fs: symlink write_begin allocation context fixNick Piggin
With the write_begin/write_end aops, page_symlink was broken because it could no longer pass a GFP_NOFS type mask into the point where the allocations happened. They are done in write_begin, which would always assume that the filesystem can be entered from reclaim. This bug could cause filesystem deadlocks. The funny thing with having a gfp_t mask there is that it doesn't really allow the caller to arbitrarily tinker with the context in which it can be called. It couldn't ever be GFP_ATOMIC, for example, because it needs to take the page lock. The only thing any callers care about is __GFP_FS anyway, so turn that into a single flag. Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on this flag in their write_begin function. Change __grab_cache_page to accept a nofs argument as well, to honour that flag (while we're there, change the name to grab_cache_page_write_begin which is more instructive and does away with random leading underscores). This is really a more flexible way to go in the end anyway -- if a filesystem happens to want any extra allocations aside from the pagecache ones in ints write_begin function, it may now use GFP_KERNEL (rather than GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a random example). [kosaki.motohiro@jp.fujitsu.com: fix ubifs] [kosaki.motohiro@jp.fujitsu.com: fix fuse] Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> [2.6.28.x] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Cleaned up the calling convention: just pass in the AOP flags untouched to the grab_cache_page_write_begin() function. That just simplifies everybody, and may even allow future expansion of the logic. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-30Merge branch 'devel' into nextTrond Myklebust
2008-12-30fs/nfs/nfs4proc.c: make nfs4_map_errors() staticWANG Cong
nfs4_map_errors() can become static. Signed-off-by: WANG Cong <wangcong@zeuux.org> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23rpc: allow gss callbacks to clientOlga Kornievskaia
This patch adds client-side support to allow for callbacks other than AUTH_SYS. Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: remove unused status from encode routinesAndy Adamson
Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: increment number of operations in each encode routineAndy Adamson
Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: fix comment placement in nfs4xdr.cBenny Halevy
Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: fix tabs in nfs4xdr.cAndy Adamson
Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: remove white space from nfs4xdr.cAndy Adamson
Clean-up Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23nfs: remove incorrect usage of nfs4 compound response hdr.statusBenny Halevy
3 call sites look at hdr.status before returning success. hdr.status must be zero in this case so there's no point in this. Currently, hdr.status is correctly processed at decode_op_hdr time if the op status cannot be decoded. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23nfs: return compound hdr.status when there are no op repliesBenny Halevy
When there are no op replies encoded in the compound reply hdr.status still contains the overall status of the compound rpc. This can happen, e.g., when the server returns a NFS4ERR_MINOR_VERS_MISMATCH error. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Fix an infinite loop in the NFS state recovery codeTrond Myklebust
Marten Gajda <marten.gajda@fernuni-hagen.de> states: I tracked the problem down to the function nfs4_do_open_expired. Within this function _nfs4_open_expired is called and may return -NFS4ERR_DELAY. When a further call to _nfs4_open_expired is executed and does not return -NFS4ERR_DELAY the "exception.retry" variable is not reset to 0, causing the loop to iterate again (and as long as err != -NFS4ERR_DELAY, probably forever) Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23optimize attribute timeouts for "noac" and "actimeo=0"Peter Staubach
Hi. I've been looking at a bugzilla which describes a problem where a customer was advised to use either the "noac" or "actimeo=0" mount options to solve a consistency problem that they were seeing in the file attributes. It turned out that this solution did not work reliably for them because sometimes, the local attribute cache was believed to be valid and not timed out. (With an attribute cache timeout of 0, the cache should always appear to be timed out.) In looking at this situation, it appears to me that the problem is that the attribute cache timeout code has an off-by-one error in it. It is assuming that the cache is valid in the region, [read_cache_jiffies, read_cache_jiffies + attrtimeo]. The cache should be considered valid only in the region, [read_cache_jiffies, read_cache_jiffies + attrtimeo). With this change, the options, "noac" and "actimeo=0", work as originally expected. This problem was previously addressed by special casing the attrtimeo == 0 case. However, since the problem is only an off- by-one error, the cleaner solution is address the off-by-one error and thus, not require the special case. Thanx... ps Signed-off-by: Peter Staubach <staubach@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Convert the open and close ops to use fmodeTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: Use delegations to optimise ACCESS callsTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Ensure that we set the verifier when revalidating delegated dentriesTrond Myklebust
This ensures that we don't have to look up the dentry again after we return the delegation if we know that the directory didn't change. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Clean up is_atomic_open()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Convert delegation->type field to fmode_tTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Fix up delegation callbacksTrond Myklebust
Currently, the callback server is listening on IPv6 if it is enabled. This means that IPv4 addresses will always be mapped. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Return unreferenced delegations more promptlyTrond Myklebust
If the client is not using a delegation, the right thing to do is to return it as soon as possible. This helps reduce the amount of state the server has to track, as well as reducing the potential for conflicts with other clients. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Clean up the asynchronous delegation returnTrond Myklebust
Reuse the state management thread in order to return delegations when we get a callback. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Clean up nfs_expire_all_delegations()Trond Myklebust
Let the actual delegreturn stuff be run in the state manager thread rather than allocating a separate kthread. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Fix a BAD_SEQUENCEID condition.Trond Myklebust
We really shouldn't be resetting the sequence ids when doing state expiration recovery, since we don't know if the server still remembers our previous state owners. There are servers out there that do attempt to preserve client state even if the lease has expired. Such a server would only release that state if a conflicting OPEN request occurs. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Don't exit the state management if there are still tasks to doTrond Myklebust
Fix up a potential race... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Rename the state reclaimer threadTrond Myklebust
It is really a more general purpose state management thread at this point. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Clean up NFS4ERR_CB_PATH_DOWN error management...Trond Myklebust
Add a delegation cleanup phase to the state management loop, and do the NFS4ERR_CB_PATH_DOWN recovery there. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Clean up the support for returning multiple delegationsTrond Myklebust
Add a flag to mark delegations as requiring return, then run a garbage collector. In the future, this will allow for more flexible delegation management, where delegations may be marked for return if it turns out that they are not being referenced. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Add recovery for individual stateidsTrond Myklebust
NFSv4 defines a number of state errors which the client does not currently handle. Among those we should worry about are: NFS4ERR_ADMIN_REVOKED - the server's administrator revoked our locks and/or delegations. NFS4ERR_BAD_STATEID - the client and server are out of sync, possibly due to a delegation return racing with an OPEN request. NFS4ERR_OPENMODE - the client attempted to do something not sanctioned by the open mode of the stateid. Should normally just occur as a result of a delegation return race. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Remove nfs_client->cl_semTrond Myklebust
Now that we're using the flags to indicate state that needs to be recovered, as well as having implemented proper refcounting and spinlocking on the state and open_owners, we can get rid of nfs_client->cl_sem. The only remaining case that was dubious was the file locking, and that case is now covered by the nfsi->rwsem. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Ensure that file unlock requests don't conflict with state recoveryTrond Myklebust
The unlock path is currently failing to take the nfs_client->cl_sem read lock, and hence the recovery path may see locks disappear from underneath it. Also ensure that it takes the nfs_inode->rwsem read lock so that it there is no conflict with delegation recalls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: Remove the unnecessary argument to nfs4_wait_clnt_recover()Trond Myklebust
...and move some code around in order to clear out an unnecessary forward declaration. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Ensure that nfs4_reclaim_open_state() doesn't depend on cl_semTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Add a recovery marking scheme for state ownersTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Don't tell server we rebooted when not necessaryTrond Myklebust
Instead of doing a full setclientid, try doing a RENEW call first. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Remove redundant RENEW calls if we know the lease has expiredTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Fix state recovery when the client runs over the grace periodTrond Myklebust
If the client for some reason is not able to recover all its state within the time allotted for the grace period, and the server reboots again, the client is not allowed to recover the state that was 'lost' using reboot recovery. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Callers to nfs4_get_renew_cred() need to hold nfs_client->cl_lockTrond Myklebust
Ditto for nfs4_get_setclientid_cred(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Clean up for the state loss reclaimerTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: Use atomic bitops when changing struct nfs_delegation->flagsTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Fix up the dereferencing of delegation->inodeTrond Myklebust
Without an extra lock, we cannot just assume that the delegation->inode is valid when we're traversing the rcu-protected nfs_client lists. Use the delegation->lock to ensure that it is truly valid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFSv4: Fix up another delegation related raceTrond Myklebust
When we can update_open_stateid(), we need to be certain that we don't race with a delegation return. While we could do this by grabbing the nfs_client->cl_lock, a dedicated spin lock in the delegation structure will scale better. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NLM: allow lockd requests from an unprivileged portChuck Lever
If the admin has specified the "noresvport" option for an NFS mount point, the kernel's NFS client uses an unprivileged source port for the main NFS transport. The kernel's lockd client should use an unprivileged port in this case as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: "[no]resvport" mount option changes mountd client tooChuck Lever
If the admin has specified the "noresvport" option for an NFS mount point, the kernel's NFS client uses an unprivileged source port for the main NFS transport. The kernel's mountd client should use an unprivileged port in this case as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: add "[no]resvport" mount optionChuck Lever
The standard default security setting for NFS is AUTH_SYS. An NFS client connects to NFS servers via a privileged source port and a fixed standard destination port (2049). The client sends raw uid and gid numbers to identify users making NFS requests, and the server assumes an appropriate authority on the client has vetted these values because the source port is privileged. On Linux, by default in-kernel RPC services use a privileged port in the range between 650 and 1023 to avoid using source ports of well- known IP services. Using such a small range limits the number of NFS mount points and the number of unique NFS servers to which a client can connect concurrently. An NFS client can use unprivileged source ports to expand the range of source port numbers, allowing more concurrent server connections and more NFS mount points. Servers must explicitly allow NFS connections from unprivileged ports for this to work. In the past, bumping the value of the sunrpc.max_resvport sysctl on the client would permit the NFS client to use unprivileged ports. Bumping this setting also changes the maximum port number used by other in-kernel RPC services, some of which still required a port number less than 1023. This is exacerbated by the way source port numbers are chosen by the Linux RPC client, which starts at the top of the range and works downwards. It means that bumping the maximum means all RPC services requesting a source port will likely get an unprivileged port instead of a privileged one. Changing this setting effects all NFS mount points on a client. A sysadmin could not selectively choose which mount points would use non-privileged ports and which could not. Lastly, this mechanism of expanding the limit on the number of NFS mount points was entirely undocumented. To address the need for the NFS client to use a large range of source ports without interfering with the activity of other in-kernel RPC services, we introduce a new NFS mount option. This option explicitly tells only the NFS client to use a non-privileged source port when communicating with the NFS server for one specific mount point. This new mount option is called "resvport," like the similar NFS mount option on FreeBSD and Mac OS X. A sister patch for nfs-utils will be submitted that documents this new option in nfs(5). The default setting for this new mount option requires the NFS client to use a privileged port, as before. Explicitly specifying the "noresvport" mount option allows the NFS client to use an unprivileged source port for this mount point when connecting to the NFS server port. This mount option is supported only for text-based NFS mounts. [ Sidebar: it is widely known that security mechanisms based on the use of privileged source ports are ineffective. However, the NFS client can combine the use of unprivileged ports with the use of secure authentication mechanisms, such as Kerberos. This allows a large number of connections and mount points while ensuring a useful level of security. Eventually we may change the default setting for this option depending on the security flavor used for the mount. For example, if the mount is using only AUTH_SYS, then the default setting will be "resvport;" if the mount is using a strong security flavor such as krb5, the default setting will be "noresvport." ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [Trond.Myklebust@netapp.com: Fixed a bug whereby nfs4_init_client() was being called with incorrect arguments.] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: move nfs_server flag initializationChuck Lever
Make it possible for the NFSv4 mount set up logic to pass mount option flags down the stack to nfs_create_rpc_client(). This is immediately useful if we want NFS mount options to modulate settings of the underlying RPC transport, but it may be useful at some later point if other parts of the NFSv4 mount initialization logic want to know what the mount options are. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: expand flags passed to nfs_create_rpc_client()Chuck Lever
The nfs_create_rpc_client() function sets up an RPC client for an NFS mount point. Add an option that allows it to set up an RPC transport from an unprivileged port. Instead of having nfs_create_rpc_client()'s callers retain local knowledge about how to set up an RPC client, create a couple of flag arguments to control the use of RPC_CLNT_CREATE flags. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: introduce nfs_mount_info struct for calling nfs_mount()Chuck Lever
Clean up: convert nfs_mount() to take a single data structure argument to make it simpler to add more arguments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: Move declaration of nfs_mount() to fs/nfs/internal.hChuck Lever
Clean up: The nfs_mount() function is not to be used outside of the NFS client. Move its public declaration to fs/nfs/internal.h. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23NFS: rename nfs_path variableChuck Lever
Clean up: I'm about to move the declaration of nfs_mount into fs/nfs/internal.h and include it in fs/nfs/nfsroot.c. There's a conflicting definition of nfs_path in fs/nfs/internal.h and fs/nfs/nfsroot.c, so rename the private one. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23nfs: remove redundant tests on reading new pagesWu Fengguang
aops->readpages() and its NFS helper readpage_async_filler() will only be called to do readahead I/O for newly allocated pages. So it's not necessary to test for the always 0 dirty/uptodate page flags. The removal of nfs_wb_page() call also fixes a readahead bug: the NFS readahead has been synchronous since 2.6.23, because that call will clear PG_readahead, which is the reminder for asynchronous readahead. More background: the PG_readahead page flag is shared with PG_reclaim, one for read path and the other for write path. clear_page_dirty_for_io() unconditionally clears PG_readahead to prevent possible readahead residuals, assuming itself to be always called in the write path. However, NFS is one and the only exception in that it _always_ calls clear_page_dirty_for_io() in the read path, i.e. for readpages()/readpage(). Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Wu Fengguang <wfg@linux.intel.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>