aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2007-07-17write support for preallocated blocksAmit Arora
This patch adds write support to the uninitialized extents that get created when a preallocation is done using fallocate(). It takes care of splitting the extents into multiple (upto three) extents and merging the new split extents with neighbouring ones, if possible. Signed-off-by: Amit Arora <aarora@in.ibm.com>
2007-07-17fallocate support in ext4Amit Arora
This patch implements ->fallocate() inode operation in ext4. With this patch users of ext4 file systems will be able to use fallocate() system call for persistent preallocation. Current implementation only supports preallocation for regular files (directories not supported as of date) with extent maps. This patch does not support block-mapped files currently. Only FALLOC_ALLOCATE and FALLOC_RESV_SPACE modes are being supported as of now. Signed-off-by: Amit Arora <aarora@in.ibm.com>
2007-07-17sys_fallocate() implementation on i386, x86_64 and powerpcAmit Arora
fallocate() is a new system call being proposed here which will allow applications to preallocate space to any file(s) in a file system. Each file system implementation that wants to use this feature will need to support an inode operation called ->fallocate(). Applications can use this feature to avoid fragmentation to certain level and thus get faster access speed. With preallocation, applications also get a guarantee of space for particular file(s) - even if later the the system becomes full. Currently, glibc provides an interface called posix_fallocate() which can be used for similar cause. Though this has the advantage of working on all file systems, but it is quite slow (since it writes zeroes to each block that has to be preallocated). Without a doubt, file systems can do this more efficiently within the kernel, by implementing the proposed fallocate() system call. It is expected that posix_fallocate() will be modified to call this new system call first and incase the kernel/filesystem does not implement it, it should fall back to the current implementation of writing zeroes to the new blocks. ToDos: 1. Implementation on other architectures (other than i386, x86_64, and ppc). Patches for s390(x) and ia64 are already available from previous posts, but it was decided that they should be added later once fallocate is in the mainline. Hence not including those patches in this take. 2. Changes to glibc, a) to support fallocate() system call b) to make posix_fallocate() and posix_fallocate64() call fallocate() Signed-off-by: Amit Arora <aarora@in.ibm.com>
2007-07-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: fix debug compilation error
2007-07-17Merge branch 'uninit-var' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6 * 'uninit-var' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6: arch/i386/* fs/* ipc/*: mark variables with uninitialized_var() drivers/*: mark variables with uninitialized_var()
2007-07-17arch/i386/* fs/* ipc/*: mark variables with uninitialized_var()Jeff Garzik
Mark variables with uninitialized_var() if such a warning appears, and analysis proves that the var is initialized properly on all paths it is used. Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-17Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid checkSatyam Sharma
Introduce is_owner_or_cap() macro in fs.h, and convert over relevant users to it. This is done because we want to avoid bugs in the future where we check for only effective fsuid of the current task against a file's owning uid, without simultaneously checking for CAP_FOWNER as well, thus violating its semantics. [ XFS uses special macros and structures, and in general looked ... untouchable, so we leave it alone -- but it has been looked over. ] The (current->fsuid != inode->i_uid) check in generic_permission() and exec_permission_lite() is left alone, because those operations are covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone. Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Cc: Al Viro <viro@ftp.linux.org.uk> Acked-by: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (80 commits) KVM: Use CPU_DYING for disabling virtualization KVM: Tune hotplug/suspend IPIs KVM: Keep track of which cpus have virtualization enabled SMP: Allow smp_call_function_single() to current cpu i386: Allow smp_call_function_single() to current cpu x86_64: Allow smp_call_function_single() to current cpu HOTPLUG: Adapt thermal throttle to CPU_DYING HOTPLUG: Adapt cpuset hotplug callback to CPU_DYING HOTPLUG: Add CPU_DYING notifier KVM: Clean up #includes KVM: Remove kvmfs in favor of the anonymous inodes source KVM: SVM: Reliably detect if SVM was disabled by BIOS KVM: VMX: Remove unnecessary code in vmx_tlb_flush() KVM: MMU: Fix Wrong tlb flush order KVM: VMX: Reinitialize the real-mode tss when entering real mode KVM: Avoid useless memory write when possible KVM: Fix x86 emulator writeback KVM: Add support for in-kernel pio handlers KVM: VMX: Fix interrupt checking on lightweight exit KVM: Adds support for in-kernel mmio handlers ...
2007-07-17[CIFS] More whitespace/formatting fixes (noticed by checkpatch)Steve French
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-07-17Couple fixes to fs/ecryptfs/inode.cMika Kukkonen
Following was uncovered by compiling the kernel with '-W' flag: CC [M] fs/ecryptfs/inode.o fs/ecryptfs/inode.c: In function ‘ecryptfs_lookup’: fs/ecryptfs/inode.c:304: warning: comparison of unsigned expression < 0 is always false fs/ecryptfs/inode.c: In function ‘ecryptfs_symlink’: fs/ecryptfs/inode.c:486: warning: comparison of unsigned expression < 0 is always false Function ecryptfs_encode_filename() can return -ENOMEM, so change the variables to plain int, as in the first case the only real use actually expects int, and in latter case there is no use beoynd the error check. Signed-off-by: Mika Kukkonen <mikukkon@iki.fi> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd: enforce per-flavor id squashingJ. Bruce Fields
Allow root squashing to vary per-pseudoflavor, so that you can (for example) allow root access only when sufficiently strong security is in use. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd: allow auth_sys nlm on rpcsec_gss exportsJ. Bruce Fields
Our clients (like other clients, as far as I know) use only auth_sys for nlm, even when using rpcsec_gss for the main nfs operations. Administrators that want to deny non-kerberos-authenticated locking requests will need to turn off NFS protocol versions less than 4.... Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd4: secinfo handling without secinfo= optionJ. Bruce Fields
We could return some sort of error in the case where someone asks for secinfo on an export without the secinfo= option set--that'd be no worse than what we've been doing. But it's not really correct. So, hack up an approximate secinfo response in that case--it may not be complete, but it'll tell the client at least one acceptable security flavor. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd4: implement secinfoAndy Adamson
Implement the secinfo operation. (Thanks to Usha Ketineni wrote an earlier version of this support.) Cc: Usha Ketineni <uketinen@us.ibm.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd: display export secinfo informationJ. Bruce Fields
Add secinfo information to the display in proc/net/sunrpc/nfsd.export/content. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd: factor out code from show_expflagsJ. Bruce Fields
Factor out some code to be shared by secinfo display code. Remove some unnecessary conditional printing of commas where we know the condition is true. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd4: make readonly access depend on pseudoflavorJ. Bruce Fields
Allow readonly access to vary depending on the pseudoflavor, using the flag passed with each pseudoflavor in the export downcall. The rest of the flags are ignored for now, though some day we might also allow id squashing to vary based on the flavor. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd4: return nfserr_wrongsecAndy Adamson
Make the first actual use of the secinfo information by using it to return nfserr_wrongsec when an export is found that doesn't allow the flavor used on this request. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd: factor nfsd_lookup into 2 piecesJ. Bruce Fields
Factor nfsd_lookup into nfsd_lookup_dentry, which finds the right dentry and export, and a second part which composes the filehandle (and which will later check the security flavor on the new export). No change in behavior. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd: use ip-address-based domain in secinfo caseJ. Bruce Fields
With this patch, we fall back on using the gss/pseudoflavor only if we fail to find a matching auth_unix export that has a secinfo list. As long as sec= options aren't used, there's still no change in behavior here (except possibly for some additional auth_unix cache lookups, whose results will be ignored). The sec= option, however, is not actually enforced yet; later patches will add the necessary checks. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd: set rq_client to ip-address-determined-domainJ. Bruce Fields
We want it to be possible for users to restrict exports both by IP address and by pseudoflavor. The pseudoflavor information has previously been passed using special auth_domains stored in the rq_client field. After the preceding patch that stored the pseudoflavor in rq_pflavor, that's now superfluous; so now we use rq_client for the ip information, as auth_null and auth_unix do. However, we keep around the special auth_domain in the rq_gssclient field for backwards compatibility purposes, so we can still do upcalls using the old "gss/pseudoflavor" auth_domain if upcalls using the unix domain to give us an appropriate export. This allows us to continue supporting old mountd. In fact, for this first patch, we always use the "gss/pseudoflavor" auth_domain (and only it) if it is available; thus rq_client is ignored in the auth_gss case, and this patch on its own makes no change in behavior; that will be left to later patches. Note on idmap: I'm almost tempted to just replace the auth_domain in the idmap upcall by a dummy value--no version of idmapd has ever used it, and it's unlikely anyone really wants to perform idmapping differently depending on the where the client is (they may want to perform *credential* mapping differently, but that's a different matter--the idmapper just handles id's used in getattr and setattr). But I'm updating the idmapd code anyway, just out of general backwards-compatibility paranoia. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd: provide export lookup wrappers which take a svc_rqstJ. Bruce Fields
Split the callers of exp_get_by_name(), exp_find(), and exp_parent() into those that are processing requests and those that are doing other stuff (like looking up filehandles for mountd). No change in behavior, just a (fairly pointless, on its own) cleanup. (Note this has the effect of making nfsd_cross_mnt() pass rqstp->rq_client instead of exp->ex_client into exp_find_by_name(). However, the two should have the same value at this point.) Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd: remove superfluous assignment from nfsd_lookupJ. Bruce Fields
The "err" variable will only be used in the final return, which always happens after either the preceding err = fh_compose(...); or after the following err = nfserrno(host_err); So the earlier assignment to err is ignored. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd4: simplify exp_pseudoroot argumentsJ. Bruce Fields
We're passing three arguments to exp_pseudoroot, two of which are just fields of the svc_rqst. Soon we'll want to pass in a third field as well. So let's just give up and pass in the whole struct svc_rqst. Also sneak in some minor style cleanups while we're at it. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd4: parse secinfo information in exports downcallAndy Adamson
We add a list of pseudoflavors to each export downcall, which will be used both as a list of security flavors allowed on that export, and (in the order given) as the list of pseudoflavors to return on secinfo calls. This patch parses the new downcall information and adds it to the export structure, but doesn't use it for anything yet. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd4: build rpcsec_gss whenever nfsd4 is builtJ. Bruce Fields
Select rpcsec_gss support whenever asked for NFSv4 support. The rfc actually requires gss, and gss is also the main reason to migrate to v4. We already do this on the client side. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd: make all exp_finding functions return -errno's on errJ. Bruce Fields
Currently exp_find(), exp_get_by_name(), and friends, return an export on success, and on failure return: errors -EAGAIN (drop this request pending an upcall) or -ETIMEDOUT (an upcall has timed out), or return NULL, which can mean either that there was a memory allocation failure, or that an export was not found, or that a passed-in export lacks an auth_domain. Many callers seem to assume that NULL means that an export was not found, which may lead to bugs in the case of a memory allocation failure. Modify these functions to distinguish between the two NULL cases by returning either -ENOENT or -ENOMEM. They now never return NULL. We get to simplify some code in the process. We return -ENOENT in the case of a missing auth_domain. This case should probably be removed (or converted to a bug) after confirming that it can never happen. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd4: don't delegate files that have had conflictsMeelap Shah
One more incremental delegation policy improvement: don't give out a delegation on a file if conflicting access has previously required that a delegation be revoked on that file. (In practice we'll forget about the conflict when the struct nfs4_file is removed on close, so this is of limited use for now, though it should at least solve a temporary problem with self-conflicts on write opens from the same client.) Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd4: vary maximum delegation limit based on RAM sizeMeelap Shah
Our original NFSv4 delegation policy was to give out a read delegation on any open when it was possible to. Since the lifetime of a delegation isn't limited to that of an open, a client may quite reasonably hang on to a delegation as long as it has the inode cached. This becomes an obvious problem the first time a client's inode cache approaches the size of the server's total memory. Our first quick solution was to add a hard-coded limit. This patch makes a mild incremental improvement by varying that limit according to the server's total memory size, allowing at most 4 delegations per megabyte of RAM. My quick back-of-the-envelope calculation finds that in the worst case (where every delegation is for a different inode), a delegation could take about 1.5K, which would make the worst case usage about 6% of memory. The new limit works out to be about the same as the old on a 1-gig server. [akpm@linux-foundation.org: Don't needlessly bloat vmlinux] [akpm@linux-foundation.org: Make it right for highmem machines] Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd: remove unused header interface.hJ. Bruce Fields
It looks like Al Viro gutted this header file five years ago and it hasn't been touched since. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd4: fix handling of acl errrorsJ. Bruce Fields
nfs4_acl_nfsv4_to_posix() returns an error and returns any posix acls calculated in two caller-provided pointers. It was setting these pointers to -errno in some error cases, resulting in nfsd4_set_nfs4_acl() calling posix_acl_release() with a -errno as an argument. Fix both the caller and the callee, by modifying nfsd4_set_nfs4_acl() to stop relying on the passed-in-pointers being left as NULL in the error case, and by modifying nfs4_acl_nfsv4_to_posix() to stop returning garbage in those pointers. Thanks to Alex Soule for reporting the bug. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Cc: Alexander Soule <soule@umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd4: fix enc_stateid_sz for nfsd callbacksBenny Halevy
enc_stateid_sz should be given in u32 words units, not bytes, so we were overestimating the buffer space needed here. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd4: silence a compiler warning in ACL codeJ. Bruce Fields
Silence a compiler warning in the ACL code, and add a comment making clear the initialization serves no other purpose. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: lockd: nfsd4: use same grace period for lockd and nfsd4Marc Eshel
Both lockd and (in the nfsv4 case) nfsd enforce a "grace period" after reboot, during which clients may reclaim locks from the previous server instance, but may not acquire new locks. Currently the lockd and nfsd enforce grace periods of different lengths. This may cause problems when we reboot a server with both v2/v3 and v4 clients. For example, if the lockd grace period is shorter (as is likely the case), then a v3 client might acquire a new lock that conflicts with a lock already held (but not yet reclaimed) by a v4 client. This patch calculates a lease time that lockd and nfsd can both use. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17nfsd warning fixAndrew Morton
gcc-4.3: fs/nfsd/nfsctl.c: In function 'write_getfs': fs/nfsd/nfsctl.c:248: warning: cast from pointer to integer of different size Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: exportfs: split out reconnecting a dentry from find_exported_dentryChristoph Hellwig
There's a clear subfunctionality of reconnecting a given dentry to the main dentry tree in find_exported_dentry, that can be called both for the dentry we're looking for or it's parent directory. This patch splits the subfunctionality out into a separate helper to make the code more readable and document it's intent. As a nice side-optimization we can avoid getting a superfluous dentry reference count in the case we need to reconnect a directory on it's own. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: exportfs: add find_disconnected_root helperChristoph Hellwig
Break the loop that finds the root of a disconnected subtree into a helper of its own to make reading easier and document the intent. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: exportfs: move acceptable check into find_acceptable_aliasChristoph Hellwig
All callers of find_acceptable_alias check if the current dentry is acceptable before looking for other acceptable aliases using find_acceptable_alias. Move the check into find_acceptable_alias to make the code a little more dense and add a comment to find_acceptable_alias that documents its intent. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: exportfs: untangle ISDIR logic in find_exported_dentryChristoph Hellwig
Rework some logic in find_exported_dentry so that we only have a single S_ISDIR check and logic that makes clear to the reader what we're really doing here. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: exportfs: remove CALL macroChristoph Hellwig
Currently exportfs uses a way to call methods very differently from the rest of the kernel. This patch changes it to the standard conventions for method calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: exportfs: add procedural interface for NFSDChristoph Hellwig
Currently NFSD calls directly into filesystems through the export_operations structure. I plan to change this interface in various ways in later patches, and want to avoid the export of the default operations to NFSD, so this patch adds two simple exportfs_encode_fh/exportfs_decode_fh helpers for NFSD to call instead of poking into exportfs guts. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: exportfs: remove iget abuseChristoph Hellwig
When the exportfs interface was added the expectation was that filesystems provide an operation to convert from a file handle to an inode/dentry, but it kept a backwards compat option that still calls into iget. Calling into iget from non-filesystem code is very bad, because it gives too little information to filesystem, and simply crashes if the filesystem doesn't implement the ->read_inode routine. Fortunately there are only two filesystems left using this fallback: efs and jfs. This patch moves a copy of export_iget to each of those to implement the get_dentry method. While this is a temporary increase of lines of code in the kernel it allows for a much cleaner interface and important code restructuring in later patches. [akpm@linux-foundation.org: add jfs_get_inode_flags() declaration] Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: exportfs: add exportfs.h headerChristoph Hellwig
currently the export_operation structure and helpers related to it are in fs.h. fs.h is already far too large and there are very few places needing the export bits, so split them off into a separate header. [akpm@linux-foundation.org: fix cifs build] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17kallsyms: make KSYM_NAME_LEN include space for trailing '\0'Tejun Heo
KSYM_NAME_LEN is peculiar in that it does not include the space for the trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating buffer. This is nonsense and error-prone. Moreover, when the caller forgets that it's very likely to subtly bite back by corrupting the stack because the last position of the buffer is always cleared to zero. This patch increments KSYM_NAME_LEN by one and updates code accordingly. * off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro is fixed. * Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together, MODULE_NAME_LEN was treated as if it didn't include space for the trailing '\0'. Fix it. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Paulo Marques <pmarques@grupopie.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17Freezer: make kernel threads nonfreezable by defaultRafael J. Wysocki
Currently, the freezer treats all tasks as freezable, except for the kernel threads that explicitly set the PF_NOFREEZE flag for themselves. This approach is problematic, since it requires every kernel thread to either set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't care for the freezing of tasks at all. It seems better to only require the kernel threads that want to or need to be frozen to use some freezer-related code and to remove any freezer-related code from the other (nonfreezable) kernel threads, which is done in this patch. The patch causes all kernel threads to be nonfreezable by default (ie. to have PF_NOFREEZE set by default) and introduces the set_freezable() function that should be called by the freezable kernel threads in order to unset PF_NOFREEZE. It also makes all of the currently freezable kernel threads call set_freezable(), so it shouldn't cause any (intentional) change of behaviour to appear. Additionally, it updates documentation to describe the freezing of tasks more accurately. [akpm@linux-foundation.org: build fixes] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17fs: introduce some page/buffer invariantsNick Piggin
It is a bug to set a page dirty if it is not uptodate unless it has buffers. If the page has buffers, then the page may be dirty (some buffers dirty) but not uptodate (some buffers not uptodate). The exception to this rule is if the set_page_dirty caller is racing with truncate or invalidate. A buffer can not be set dirty if it is not uptodate. If either of these situations occurs, it indicates there could be some data loss problem. Some of these warnings could be a harmless one where the page or buffer is set uptodate immediately after it is dirtied, however we should fix those up, and enforce this ordering. Bring the order of operations for truncate into line with those of invalidate. This will prevent a page from being able to go !uptodate while we're holding the tree_lock, which is probably a good thing anyway. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17mm: clean up and kernelify shrinker registrationRusty Russell
I can never remember what the function to register to receive VM pressure is called. I have to trace down from __alloc_pages() to find it. It's called "set_shrinker()", and it needs Your Help. 1) Don't hide struct shrinker. It contains no magic. 2) Don't allocate "struct shrinker". It's not helpful. 3) Call them "register_shrinker" and "unregister_shrinker". 4) Call the function "shrink" not "shrinker". 5) Reduce the 17 lines of waffly comments to 13, but document it properly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: David Chinner <dgc@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17Lumpy Reclaim V4Andy Whitcroft
When we are out of memory of a suitable size we enter reclaim. The current reclaim algorithm targets pages in LRU order, which is great for fairness at order-0 but highly unsuitable if you desire pages at higher orders. To get pages of higher order we must shoot down a very high proportion of memory; >95% in a lot of cases. This patch set adds a lumpy reclaim algorithm to the allocator. It targets groups of pages at the specified order anchored at the end of the active and inactive lists. This encourages groups of pages at the requested orders to move from active to inactive, and active to free lists. This behaviour is only triggered out of direct reclaim when higher order pages have been requested. This patch set is particularly effective when utilised with an anti-fragmentation scheme which groups pages of similar reclaimability together. This patch set is based on Peter Zijlstra's lumpy reclaim V2 patch which forms the foundation. Credit to Mel Gorman for sanitity checking. Mel said: The patches have an application with hugepage pool resizing. When lumpy-reclaim is used used with ZONE_MOVABLE, the hugepages pool can be resized with greater reliability. Testing on a desktop machine with 2GB of RAM showed that growing the hugepage pool with ZONE_MOVABLE on it's own was very slow as the success rate was quite low. Without lumpy-reclaim, each attempt to grow the pool by 100 pages would yield 1 or 2 hugepages. With lumpy-reclaim, getting 40 to 70 hugepages on each attempt was typical. [akpm@osdl.org: ia64 pfn_to_nid fixes and loop cleanup] [bunk@stusta.de: static declarations for internal functions] [a.p.zijlstra@chello.nl: initial lumpy V2 implementation] Signed-off-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17Add __GFP_MOVABLE for callers to flag allocations from high memory that may ↵Mel Gorman
be migrated It is often known at allocation time whether a page may be migrated or not. This patch adds a flag called __GFP_MOVABLE and a new mask called GFP_HIGH_MOVABLE. Allocations using the __GFP_MOVABLE can be either migrated using the page migration mechanism or reclaimed by syncing with backing storage and discarding. An API function very similar to alloc_zeroed_user_highpage() is added for __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable(). The flags used by alloc_zeroed_user_highpage() are not changed because it would change the semantics of an existing API. After this patch is applied there are no in-kernel users of alloc_zeroed_user_highpage() so it probably should be marked deprecated if this patch is merged. Note that this patch includes a minor cleanup to the use of __GFP_ZERO in shmem.c to keep all flag modifications to inode->mapping in the shmem_dir_alloc() helper function. This clean-up suggestion is courtesy of Hugh Dickens. Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the concept. Credit to Hugh Dickens for catching issues with shmem swap vector and ramfs allocations. [akpm@linux-foundation.org: build fix] [hugh@veritas.com: __GFP_ZERO cleanup] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-169p: fix debug compilation errorEric Van Hensbergen
s/9p/v9fs.c: In function 'v9fs_parse_options': fs/9p/v9fs.c:134: error: 'p9_debug_level' undeclared (first use in this function) Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Eric Van Hensbergen <ericvh@gmail.com>