aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2006-01-06SUNRPC: new interface to force an RPC rebindChuck Lever
We'd like to hide fields in rpc_xprt and rpc_clnt from upper layer protocols. Start by creating an API to force RPC rebind, replacing logic that simply sets cl_port to zero. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked. Probably need to rig a server where certain services aren't running, or that returns an error for some typical operation. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06SUNRPC: switchable buffer allocationChuck Lever
Add RPC client transport switch support for replacing buffer management on a per-transport basis. In the current IPv4 socket transport implementation, RPC buffers are allocated as needed for each RPC message that is sent. Some transport implementations may choose to use pre-allocated buffers for encoding, sending, receiving, and unmarshalling RPC messages, however. For transports capable of direct data placement, the buffers can be carved out of a pre-registered area of memory rather than from a slab cache. Test-plan: Millions of fsx operations. Performance characterization with "sio" and "iozone". Use oprofile and other tools to look for significant regression in CPU utilization. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NLM: Further cancel fixesJ. Bruce Fields
If the server receives an NLM cancel call and finds no waiting lock to cancel, then chances are the lock has already been applied, and the client just hadn't yet processed the NLM granted callback before it sent the cancel. The Open Group text, for example, perimts a server to return either success (LCK_GRANTED) or failure (LCK_DENIED) in this case. But returning an error seems more helpful; the client may be able to use it to recognize that a race has occurred and to recover from the race. So, modify the relevant functions to return an error in this case. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06SUNRPC: net/sunrpc/xdr.c: remove xdr_decode_string()Adrian Bunk
This patch removes ths unused function xdr_decode_string(). Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Neil Brown <neilb@suse.de> Acked-by: Charles Lever <Charles.Lever@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Allow user to set the port used by the NFSv4 callback channelTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Ensure DELEGRETURN returns attributesTrond Myklebust
Upon return of a write delegation, the server will almost always bump the change attribute. Ensure that we pick up that change so that we don't invalidate our data cache unnecessarily. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFS: Make stat() return updated mtimes after a write()Trond Myklebust
The SuS states that a call to write() will cause mtime to be updated on the file. In order to satisfy that requirement, we need to flush out any cached writes in nfs_getattr(). Speed things up slightly by not committing the writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFS: support large reads and writes on the wireChuck Lever
Most NFS server implementations allow up to 64KB reads and writes on the wire. The Solaris NFS server allows up to a megabyte, for instance. Now the Linux NFS client supports transfer sizes up to 1MB, too. This will help reduce protocol and context switch overhead on read/write intensive NFS workloads, and support larger atomic read and write operations on servers that support them. Test-plan: Connectathon and iozone on mount point with wsize=rsize>32768 over TCP. Tests with NFS over UDP to verify the maximum RPC payload size cap. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFS: simplify inlined bit ops in nfs_page.hChuck Lever
Minor cleanup: inlined bit ops in nfs_page.h can be simpler. Test plan: Write-intensive workload against a server that requires COMMITs. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: locking XDR cleanupTrond Myklebust
Get rid of some unnecessary intermediate structures Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: Make open_confirm() asynchronous tooTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: stateful NFSv4 RPC call interfaceTrond Myklebust
The NFSv4 model requires us to complete all RPC calls that might establish state on the server whether or not the user wants to interrupt it. We may also need to schedule new work (including new RPC calls) in order to cancel the new state. The asynchronous RPC model will allow us to ensure that RPC calls always complete, but in order to allow for "synchronous" RPC, we want to add the ability to wait for completion. The waits are, of course, interruptible. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06SUNRPC: Further cleanupsTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06RPC: Clean up RPC task structureTrond Myklebust
Shrink the RPC task structure. Instead of storing separate pointers for task->tk_exit and task->tk_release, put them in a structure. Also pass the user data pointer as a parameter instead of passing it via task->tk_calldata. This enables us to nest callbacks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06SUNRPC: Yet more RPC cleanupsTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06identify multipage ->writepages() callsAndrew Morton
NFS needs to be able to distinguish between single-page ->writepage() calls and multipage ->writepages() calls. For the single-page writepage calls NFS can kick off the I/O within the context of ->writepage(). For multipage ->writepages calls, nfs_writepage() will leave the I/O pending and nfs_writepages() will kick off the I/O when it all has been queued up within NFS. Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06Merge branch 'post-2.6.15' of git://brick.kernel.dk/data/git/linux-2.6-blockLinus Torvalds
Manual fixup for merge with Jens' "Suspend support for libata", commit ID 9b847548663ef1039dd49f0eb4463d001e596bc3. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] Suspend support for libataJens Axboe
This patch adds suspend patch to libata, and ata_piix in particular. For most low level drivers, they should just need to add the 4 hooks to work. As I can only test ata_piix, I didn't enable it for more though. Suspend support is the single most important feature on a notebook, and most new notebooks have sata drives. It's quite embarrassing that we _still_ do not support this. Right now, it's perfectly possible to suspend the drive in mid-transfer. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: allow sync-speed to be controlled per-deviceNeilBrown
Also export current (average) speed and status in sysfs. Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: count corrected read errors per driveNeilBrown
Store this total in superblock (As appropriate), and make it available to userspace via sysfs. Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: allow array level to be set textually via sysfsNeilBrown
Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: make a couple of names in md.c staticNeilBrown
.. because they aren't used outside md.c Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: define and use safe_put_page for mdNeilBrown
md sometimes call put_page on NULL pointers (treating it like kfree). This is not safe, so define and use a 'safe_put_page' which checks for NULL. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: remove personality numbering from mdNeilBrown
md supports multiple different RAID level, each being implemented by a 'personality' (which is often in a separate module). These personalities have fairly artificial 'numbers'. The numbers are use to: 1- provide an index into an array where the various personalities are recorded 2- identify the module (via an alias) which implements are particular personality. Neither of these uses really justify the existence of personality numbers. The array can be replaced by a linked list which is searched (array lookup only happens very rarely). Module identification can be done using an alias based on level rather than 'personality' number. The current 'raid5' modules support two level (4 and 5) but only one personality. This slight awkwardness (which was handled in the mapping from level to personality) can be better handled by allowing raid5 to register 2 personalities. With this change in place, the core md module does not need to have an exhaustive list of all possible personalities, so other personalities can be added independently. This patch also moves the check for chunksize being non-zero into the ->run routines for the personalities that need it, rather than having it in core-md. This has a side effect of allowing 'faulty' and 'linear' not to have a chunk-size set. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: tidy up raid5/6 hash table codeNeilBrown
- replace open-coded hash chain with hlist macros - Fix hash-table size at one page - it is already quite generous, so there will never be a need to use multiple pages, so no need for __get_free_pages No functional change. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: raid10 read-error handling - resync and read-onlyNeilBrown
Add in correct read-error handling for resync and read-only situations. When read-only, we don't over-write, so we need to mark the failed drive in the r10_bio so we don't re-try it. During resync, we always read all blocks, so if there is a read error, we simply over-write it with the good block that we found (assuming we found one). Note that the recovery case still isn't handled in an interesting way. There is nothing useful to do for the 2-copies case. If there are 3 or more copies, then we could try reading from one of the non-missing copies, but this is a bit complicated and very rarely would be used, so I'm leaving it for now. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: auto-correct correctable read errors in raid10NeilBrown
Largely just a cross-port from raid1. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: fix up some rdev rcu locking in raid5/6NeilBrown
There is this "FIXME" comment with a typo in it!! that been annoying me for days, so I just had to remove it. conf->disks[i].rdev should only be accessed if - we know we hold a reference or - the mddev->reconfig_sem is down or - we have a rcu_readlock handle_stripe was referencing rdev in three places without any of these. For the first two, get an rcu_readlock. For the last, the same access (md_sync_acct call) is made a little later after the rdev has been claimed under and rcu_readlock, if R5_Syncio is set. So just use that access... However R5_Syncio isn't really needed as the 'syncing' variable contains the same information. So use that instead. Issues, comment, and fix are identical in raid5 and raid6. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: handle errors when read-onlyNeilBrown
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: attempt to auto-correct read errors in raid1NeilBrown
On a read-error we suspend the array, then synchronously read the block from other arrays until we find one where we can read it. Then we try writing the good data back everywhere and make sure it works. If any write or subsequent read fails, only then do we fail the device out of the array. To be able to suspend the array, we need to also keep track of how many requests are queued for handling by raid1d. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: fix raid6 resync check/repair codeNeilBrown
raid6 currently does not check the P/Q syndromes when doing a resync, it just calculates the correct value and writes it. Doing the check can reduce writes (often to 0) for a resync, and it is needed to properly implement the echo check > sync_action operation. This patch implements the appropriate checks and tidies up some related code. It also allows raid6 user-requested resync to bypass the intent bitmap. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: write intent bitmap support for raid10NeilBrown
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: allow dirty raid[456] arrays to be started at bootNeilBrown
See patch to md.txt for more details Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: improve raid10 "IO Barrier" conceptNeilBrown
raid10 needs to put up a barrier to new requests while it does resync or other background recovery. The code for this is currently open-coded, slighty obscure by its use of two waitqueues, and not documented. This patch gathers all the related code into 4 functions, and includes a comment which (hopefully) explains what is happening. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: improve raid1 "IO Barrier" conceptNeilBrown
raid1 needs to put up a barrier to new requests while it does resync or other background recovery. The code for this is currently open-coded, slighty obscure by its use of two waitqueues, and not documented. This patch gathers all the related code into 4 functions, and includes a comment which (hopefully) explains what is happening. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] device-mapper ioctl: add skip lock_fs flagAlasdair G Kergon
Add ioctl DM_SKIP_LOCKFS_FLAG for userspace to request that lock_fs is bypassed when suspending a device. There's no change to the behaviour of existing code that doesn't know about the new flag. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] knfsd: check error status from vfs_getattr and i_op->fsyncDavid Shaw
Both vfs_getattr and i_op->fsync return error statuses which nfsd was largely ignoring. This as noticed when exporting directories using fuse. This patch cleans up most of the offences, which involves moving the call to vfs_getattr out of the xdr encoding routines (where it is too late to report an error) into the main NFS procedure handling routines. There is still a called to vfs_gettattr (related to the ACL code) where the status is ignored, and called to nfsd_sync_dir don't check return status either. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] jbd: split checkpoint listsJan Kara
Split the checkpoint list of the transaction into two lists. In the first list we keep the buffers that need to be submitted for IO. In the second list are kept buffers that were already submitted and we just have to wait for the IO to complete. This should simplify a handling of checkpoint lists a bit and can eventually be also a performance gain. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] include/linux/parport_pc.h: "extern inline" -> "static inline"Adrian Bunk
"extern inline" doesn't make much sense. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] parport: DEBUG_PARPORT build fixMarko Kohtala
Add missing "struct" keyword preventing compilation with DEBUG_PARPORT defined. Also add some "const". Signed-off-by: Marko Kohtala <marko.kohtala@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] parport: phase fixesMarko Kohtala
Did not move the parport interface properly into IEEE1284_PH_REV_IDLE phase at end of data due to comparing bytes with nibbles. Internal phase IEEE1284_PH_HBUSY_DNA became unused, so remove it. Signed-off-by: Marko Kohtala <marko.kohtala@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] fuse: make maximum write data configurableMiklos Szeredi
Make the maximum size of write data configurable by the filesystem. The previous fixed 4096 limit only worked on architectures where the page size is less or equal to this. This change make writing work on other architectures too, and also lets the filesystem receive bigger write requests in direct_io mode. Normal writes which go through the page cache are still limited to a page sized chunk per request. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] fuse: clean up request size limit checkingMiklos Szeredi
Change the way a too large request is handled. Until now in this case the device read returned -EINVAL and the operation returned -EIO. Make it more flexibible by not returning -EINVAL from the read, but restarting it instead. Also remove the fixed limit on setxattr data and let the filesystem provide as large a read buffer as it needs to handle the extended attribute data. The symbolic link length is already checked by VFS to be less than PATH_MAX, so the extra check against FUSE_SYMLINK_MAX is not needed. The check in fuse_create_open() against FUSE_NAME_MAX is not needed, since the dentry has already been looked up, and hence the name already checked. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] fuse: add frsize to statfs replyMiklos Szeredi
Add 'frsize' member to the statfs reply. I'm not sure if sending f_fsid will ever be needed, but just in case leave some space at the end of the structure, so less compatibility mess would be required. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] fuse: bump interface versionMiklos Szeredi
Change interface version to 7.4. Following changes will need backward compatibility support, so store the minor version returned by userspace. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] I2O: BugfixesMarkus Lidel
- Removed some kmalloc's with __GFP_ZERO and replace it with memset() because it didn't work properly. - Fixed returned message frame in i2o_cfg_passthru() which caused raidutils to display wrong error message in case a disk was missing. - Fixed size of printk() in i2o_scsi.c. - Fixed get_device() and put_device() in probing of the I2O controller. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] I2O: Remove wrong I2O device classMarkus Lidel
Removed wrong I2O device class, which was only needed to add sysfs attributes. Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] I2O: changed I2O API to create I2O messages in kernel memoryMarkus Lidel
Changed the I2O API to create I2O messages first in kernel memory and then transfer it at once over the PCI bus instead of sending each quad-word over the PCI bus. Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] s390: cleanup KconfigMartin Schwidefsky
Sanitize some s390 Kconfig options. We have ARCH_S390, ARCH_S390X, ARCH_S390_31, 64BIT, S390_SUPPORT and COMPAT. Replace these 6 options by S390, 64BIT and COMPAT. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] mm: add a new function (needed for swap suspend)Rafael J. Wysocki
This adds the function get_swap_page_of_type() allowing us to specify an index in swap_info[] and select a swap_info_struct structure to be used for allocating a swap page. This function (or another one of similar functionality) will be necessary for implementing the image-writing part of swsusp in the user space.  It can also be used for simplifying the current in-kernel implementation of the image-writing part of swsusp. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>