aboutsummaryrefslogtreecommitdiff
path: root/fs/nfs/write.c
AgeCommit message (Collapse)Author
2006-10-20[PATCH] NFS: Fix oops in nfs_cancel_commit_listTrond Myklebust
Fix two bugs: - nfs_inode_remove_request will call nfs_clear_request, so we cannot reference req->wb_page after it. Move the call to dec_zone_page_state so that it occurs while req->wb_page is still valid. - Calling nfs_clear_page_writeback is unnecessary since the radix tree tags will have been cleared by the call to nfs_inode_remove_request. Replace with a simple call to nfs_unlock_request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-20[PATCH] separate bdi congestion functions from queue congestion functionsAndrew Morton
Separate out the concept of "queue congestion" from "backing-dev congestion". Congestion is a backing-dev concept, not a queue concept. The blk_* congestion functions are retained, as wrappers around the core backing-dev congestion functions. This proper layering is needed so that NFS can cleanly use the congestion functions, and so that CONFIG_BLOCK=n actually links. Cc: "Thomas Maier" <balagi@justmail.de> Cc: "Jens Axboe" <jens.axboe@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: David Howells <dhowells@redhat.com> Cc: Peter Osterlund <petero2@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-30[PATCH] BLOCK: Remove no-longer necessary linux/mpage.h inclusions [try #6]David Howells
Remove inclusions of linux/mpage.h that are no longer necessary due to the transfer of generic_writepages(). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-27[PATCH] Really ignore kmem_cache_destroy return valueAlexey Dobriyan
* Rougly half of callers already do it by not checking return value * Code in drivers/acpi/osl.c does the following to be sure: (void)kmem_cache_destroy(cache); * Those who check it printk something, however, slab_error already printed the name of failed cache. * XFS BUGs on failed kmem_cache_destroy which is not the decision low-level filesystem driver should make. Converted to ignore. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-22NFS: add comments clarifying the use of nfs_post_op_update()Chuck Lever
Comments-only change to clarify a detail of the NFS protocol and how it is implemented in Linux. Test plan: None. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22Add a real API for dealing with blk_congestion_wait()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22NFS: Share NFS superblocks per-protocol per-server per-FSIDDavid Howells
The attached patch makes NFS share superblocks between mounts from the same server and FSID over the same protocol. It does this by creating each superblock with a false root and returning the real root dentry in the vfsmount presented by get_sb(). The root dentry set starts off as an anonymous dentry if we don't already have the dentry for its inode, otherwise it simply returns the dentry we already have. We may thus end up with several trees of dentries in the superblock, and if at some later point one of anonymous tree roots is discovered by normal filesystem activity to be located in another tree within the superblock, the anonymous root is named and materialises attached to the second tree at the appropriate point. Why do it this way? Why not pass an extra argument to the mount() syscall to indicate the subpath and then pathwalk from the server root to the desired directory? You can't guarantee this will work for two reasons: (1) The root and intervening nodes may not be accessible to the client. With NFS2 and NFS3, for instance, mountd is called on the server to get the filehandle for the tip of a path. mountd won't give us handles for anything we don't have permission to access, and so we can't set up NFS inodes for such nodes, and so can't easily set up dentries (we'd have to have ghost inodes or something). With this patch we don't actually create dentries until we get handles from the server that we can use to set up their inodes, and we don't actually bind them into the tree until we know for sure where they go. (2) Inaccessible symbolic links. If we're asked to mount two exports from the server, eg: mount warthog:/warthog/aaa/xxx /mmm mount warthog:/warthog/bbb/yyy /nnn We may not be able to access anything nearer the root than xxx and yyy, but we may find out later that /mmm/www/yyy, say, is actually the same directory as the one mounted on /nnn. What we might then find out, for example, is that /warthog/bbb was actually a symbolic link to /warthog/aaa/xxx/www, but we can't actually determine that by talking to the server until /warthog is made available by NFS. This would lead to having constructed an errneous dentry tree which we can't easily fix. We can end up with a dentry marked as a directory when it should actually be a symlink, or we could end up with an apparently hardlinked directory. With this patch we need not make assumptions about the type of a dentry for which we can't retrieve information, nor need we assume we know its place in the grand scheme of things until we actually see that place. This patch reduces the possibility of aliasing in the inode and page caches for inodes that may be accessed by more than one NFS export. It also reduces the number of superblocks required for NFS where there are many NFS exports being used from a server (home directory server + autofs for example). This in turn makes it simpler to do local caching of network filesystems, as it can then be guaranteed that there won't be links from multiple inodes in separate superblocks to the same cache file. Obviously, cache aliasing between different levels of NFS protocol could still be a problem, but at least that gives us another key to use when indexing the cache. This patch makes the following changes: (1) The server record construction/destruction has been abstracted out into its own set of functions to make things easier to get right. These have been moved into fs/nfs/client.c. All the code in fs/nfs/client.c has to do with the management of connections to servers, and doesn't touch superblocks in any way; the remaining code in fs/nfs/super.c has to do with VFS superblock management. (2) The sequence of events undertaken by NFS mount is now reordered: (a) A volume representation (struct nfs_server) is allocated. (b) A server representation (struct nfs_client) is acquired. This may be allocated or shared, and is keyed on server address, port and NFS version. (c) If allocated, the client representation is initialised. The state member variable of nfs_client is used to prevent a race during initialisation from two mounts. (d) For NFS4 a simple pathwalk is performed, walking from FH to FH to find the root filehandle for the mount (fs/nfs/getroot.c). For NFS2/3 we are given the root FH in advance. (e) The volume FSID is probed for on the root FH. (f) The volume representation is initialised from the FSINFO record retrieved on the root FH. (g) sget() is called to acquire a superblock. This may be allocated or shared, keyed on client pointer and FSID. (h) If allocated, the superblock is initialised. (i) If the superblock is shared, then the new nfs_server record is discarded. (j) The root dentry for this mount is looked up from the root FH. (k) The root dentry for this mount is assigned to the vfsmount. (3) nfs_readdir_lookup() creates dentries for each of the entries readdir() returns; this function now attaches disconnected trees from alternate roots that happen to be discovered attached to a directory being read (in the same way nfs_lookup() is made to do for lookup ops). The new d_materialise_unique() function is now used to do this, thus permitting the whole thing to be done under one set of locks, and thus avoiding any race between mount and lookup operations on the same directory. (4) The client management code uses a new debug facility: NFSDBG_CLIENT which is set by echoing 1024 to /proc/net/sunrpc/nfs_debug. (5) Clone mounts are now called xdev mounts. (6) Use the dentry passed to the statfs() op as the handle for retrieving fs statistics rather than the root dentry of the superblock (which is now a dummy). Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-19NFS: Fix nfs_page use after free issues in fs/nfs/write.cTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-08[PATCH] NFS: large non-page-aligned direct I/O clobbers memoryTrond Myklebust
The logic in nfs_direct_read_schedule and nfs_direct_write_schedule can allow data->npages to be one larger than rpages. This causes a page pointer to be written beyond the end of the pagevec in nfs_read_data (or nfs_write_data). Fix this by making nfs_(read|write)_alloc() calculate the size of the pagevec array, and initialise data->npages. Also get rid of the redundant argument to nfs_commit_alloc(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-03NFS: make 2 functions staticAdrian Bunk
nfs_writedata_free() and nfs_readdata_free() can now become static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> (cherry picked from 5e1ce40f0c3c8f67591aff17756930d7a18ceb1a commit)
2006-07-05NFS: Fix NFS page_state usageTrond Myklebust
The introduction of the FLUSH_INVALIDATE argument to nfs_sync_inode_wait() does not clear the nr_unstable page state counter for pages that are being released. Also fix a longstanding similar bug when nfs_commit_list() fails. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: Remove obsolete #include <linux/config.h> remove obsolete swsusp_encrypt arch/arm26/Kconfig typos Documentation/IPMI typos Kconfig: Typos in net/sched/Kconfig v9fs: do not include linux/version.h Documentation/DocBook/mtdnand.tmpl: typo fixes typo fixes: specfic -> specific typo fixes in Documentation/networking/pktgen.txt typo fixes: occuring -> occurring typo fixes: infomation -> information typo fixes: disadvantadge -> disadvantage typo fixes: aquire -> acquire typo fixes: mecanism -> mechanism typo fixes: bandwith -> bandwidth fix a typo in the RTC_CLASS help text smb is no longer maintained Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
2006-06-30[PATCH] zoned vm counters: conversion of nr_unstable to per zone counterChristoph Lameter
Conversion of nr_unstable to a per zone counter We need to do some special modifications to the nfs code since there are multiple cases of disposition and we need to have a page ref for proper accounting. This converts the last critical page state of the VM and therefore we need to remove several functions that were depending on GET_PAGE_STATE_LAST in order to make the kernel compile again. We are only left with event type counters in page state. [akpm@osdl.org: bugfixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30[PATCH] zoned vm counters: conversion of nr_dirty to per zone counterChristoph Lameter
This makes nr_dirty a per zone counter. Looping over all processors is avoided during writeback state determination. The counter aggregation for nr_dirty had to be undone in the NFS layer since we summed up the page counts from multiple zones. Someone more familiar with NFS should probably review what I have done. [akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30Remove obsolete #include <linux/config.h>Jörn Engel
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-27[PATCH] fix static linking of NFSDavid Brownell
Builds on ARM report link problems with common configurations like statically linked NFS (for nfsroot). The symptom is that __init section code references __exit section code; that won't work since the exit sections are discarded (since they can never be called). The best fix for these particular cases would be an "__init_or_exit" section annotation. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-09NFS: Split fs/nfs/inode.cDavid Howells
As fs/nfs/inode.c is rather large, heterogenous and unwieldy, the attached patch splits it up into a number of files: (*) fs/nfs/inode.c Strictly inode specific functions. (*) fs/nfs/super.c Superblock management functions for NFS and NFS4, normal access, clones and referrals. The NFS4 superblock functions _could_ move out into a separate conditionally compiled file, but it's probably not worth it as there're so many common bits. (*) fs/nfs/namespace.c Some namespace-specific functions have been moved here. (*) fs/nfs/nfs4namespace.c NFS4-specific namespace functions (this could be merged into the previous file). This file is conditionally compiled. (*) fs/nfs/internal.h Inter-file declarations, plus a few simple utility functions moved from fs/nfs/inode.c. Additionally, all the in-.c-file externs have been moved here, and those files they were moved from now includes this file. For the most part, the functions have not been changed, only some multiplexor functions have changed significantly. I've also: (*) Added some extra banner comments above some functions. (*) Rearranged the function order within the files to be more logical and better grouped (IMO), though someone may prefer a different order. (*) Reduced the number of #ifdefs in .c files. (*) Added missing __init and __exit directives. Signed-Off-By: David Howells <dhowells@redhat.com>
2006-06-09NFS: Flesh out nfs_invalidate_page()Trond Myklebust
In the case of a call to truncate_inode_pages(), we should really try to cancel any pending writes on the page. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-09NFS: Optimize allocation of nfs_read/write_data structuresChuck Lever
Clean up use of page_array, and fix an off-by-one error noticed by Tom Talpey which causes kmalloc calls in cases where using the page_array is sufficient. Test plan: Normal client functional testing with r/wsize=32768. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-26[PATCH] mempool: use mempool_create_slab_pool()Matthew Dobson
Modify well over a dozen mempool users to call mempool_create_slab_pool() rather than calling mempool_create() with extra arguments, saving about 30 lines of code and increasing readability. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-20NFS: Fix a race in nfs_sync_inode()Trond Myklebust
Kudos to Neil Brown for spotting the problem: "in nfs_sync_inode, there is effectively the sequence: nfs_wait_on_requests nfs_flush_inode nfs_commit_inode This seems a bit racy to me as if the only requests are on the ->commit list, and nfs_commit_inode is called separately after nfs_wait_on_requests completes, and before nfs_commit_inode start (say: by nfs_write_inode) then none of these function will return >0, yet there will be some pending request that aren't waited for." The solution is to search for requests to wait upon, search for dirty requests, and search for uncommitted requests while holding the nfsi->req_lock The patch also cleans up nfs_sync_inode(), getting rid of the redundant FLUSH_WAIT flag. It turns out that we were always setting it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Clean up nfs_flush_list()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Fix a race with PG_private and nfs_release_page()Trond Myklebust
We don't need to set PG_private for readahead pages, since they never get unlocked while I/O is in progress. However there is a small race in nfs_readpage_release() whereby the page may be unlocked, and have PG_private set. Fix is to have PG_private set only for the case of writes... Also fix a bug in nfs_clear_page_writeback(): Don't attempt to clear the radix_tree tag if we've already deleted the radix tree entry. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Uninline nfs_writedata_(alloc|free) and nfs_readdata_(alloc|free)Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Make nfs_commit_alloc() externTrond Myklebust
We need to use nfs_commit_alloc() in fs/nfs/direct.c. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: make direct write path generate write requests concurrentlyChuck Lever
Duplicate infrastructure from direct read path that will allow write path to generate multiple write requests concurrently. This will enable us to add support for aio in this path. Temporarily we will lose the ability to do UNSTABLE writes followed by a COMMIT in the direct write path. However, all applications I am aware of that use NFS O_DIRECT currently write in relatively small chunks, so this should not be inconvenient in any way. Test plan: Millions of fsx-odirect ops. OraSim. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Cleanup of NFS write code in preparation for asynchronous o_directTrond Myklebust
This patch inverts the callback hierarchy for NFS write calls. Instead of having the NFSv2/v3/v4-specific code set up the RPC callback ops, we allow the original caller to do so. This allows for more flexibility w.r.t. how to set up and tear down the nfs_write_data structure while still allowing the NFSv3/v4 code to perform error handling. The greater flexibility is needed by the asynchronous O_DIRECT code, which wants to be able to hold on to the original nfs_write_data structures after the WRITE RPC call has completed in order to be able to replay them if the COMMIT call determines that the server has rebooted. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: add I/O performance countersChuck Lever
Invoke the byte and event counter macros where we want to count bytes and events. Clean-up: fix a possible NULL dereference in nfs_lock, and simplify nfs_file_open. Test-plan: fsx and iozone on UP and SMP systems, with and without pre-emption. Watch for memory overwrite bugs, and performance loss (significantly more CPU required per op). Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: kzalloc conversion in fs/nfsEric Sesterhenn
this converts fs/nfs to kzalloc() usage. compile tested with make allyesconfig Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Fix buglet in fs/nfs/write.cNeil Brown
I've been reading through fs/nfs/write.c trying to track down a bug that seems to be related to pages loosing a refcount and getting freed too early (you interested in detail??) and I spotted a little bug which the following patch should fix. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFS: Make stat() return updated mtimes after a write()Trond Myklebust
The SuS states that a call to write() will cause mtime to be updated on the file. In order to satisfy that requirement, we need to flush out any cached writes in nfs_getattr(). Speed things up slightly by not committing the writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFS: support large reads and writes on the wireChuck Lever
Most NFS server implementations allow up to 64KB reads and writes on the wire. The Solaris NFS server allows up to a megabyte, for instance. Now the Linux NFS client supports transfer sizes up to 1MB, too. This will help reduce protocol and context switch overhead on read/write intensive NFS workloads, and support larger atomic read and write operations on servers that support them. Test-plan: Connectathon and iozone on mount point with wsize=rsize>32768 over TCP. Tests with NFS over UDP to verify the maximum RPC payload size cap. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06RPC: Clean up RPC task structureTrond Myklebust
Shrink the RPC task structure. Instead of storing separate pointers for task->tk_exit and task->tk_release, put them in a structure. Also pass the user data pointer as a parameter instead of passing it via task->tk_calldata. This enables us to nest callbacks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFS: Work correctly with single-page ->writepage() callsTrond Myklebust
Ensure that we always initiate flushing of data before we exit a single-page ->writepage() call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-03NFS: use set_page_writeback() in the appropriate placesTrond Myklebust
Ensure that we use set_page_writeback() in the appropriate places to help the VM in keeping its page radix_tree in sync. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-11-04NFS,SUNRPC,NLM: fix unused variable warnings when CONFIG_SYSCTL is disabledChuck Lever
Fix some dprintk's so that NLM, NFS client, and RPC client compile cleanly if CONFIG_SYSCTL is disabled. Test plan: Compile kernel with CONFIG_NFS enabled and CONFIG_SYSCTL disabled. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-11-04NFSv4: Fix problem with OPEN_DOWNGRADETrond Myklebust
RFC 3530 states that for OPEN_DOWNGRADE "The share_access and share_deny bits specified must be exactly equal to the union of the share_access and share_deny bits specified for some subset of the OPENs in effect for current openowner on the current file. Setattr is currently violating the NFSv4 rules for OPEN_DOWNGRADE in that it may cause a downgrade from OPEN4_SHARE_ACCESS_BOTH to OPEN4_SHARE_ACCESS_WRITE despite the fact that there exists no open file with O_WRONLY access mode. Fix the problem by replacing nfs4_find_state() with a modified version of nfs_find_open_context(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27NFS: Cleanup initialisation of struct nfs_fattrTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] NFS: Replace nfs_page insertion sort with a radix sortTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] NFS: Make searching and waiting on busy writeback requests more ↵Trond Myklebust
efficient. Basically copies the VFS's method for tracking writebacks and applies it to the struct nfs_page. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] NFS: Write optimization for short files and small O_SYNC writes.Trond Myklebust
Use stable writes if we can see that we are only going to put a single write on the wire. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] NFS: Fix the file size revalidationTrond Myklebust
Instead of looking at whether or not the file is open for writes before we accept to update the length using the server value, we should rather be looking at whether or not we are currently caching any writes. Failure to do so means in particular that we're not updating the file length correctly after obtaining a POSIX or BSD lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-04-16Linux-2.6.12-rc2Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!