aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2007-05-08[XFS] Fix to prevent the notorious 'NULL files' problem after a crash.Lachlan McIlroy
The problem that has been addressed is that of synchronising updates of the file size with writes that extend a file. Without the fix the update of a file's size, as a result of a write beyond eof, is independent of when the cached data is flushed to disk. Often the file size update would be written to the filesystem log before the data is flushed to disk. When a system crashes between these two events and the filesystem log is replayed on mount the file's size will be set but since the contents never made it to disk the file is full of holes. If some of the cached data was flushed to disk then it may just be a section of the file at the end that has holes. There are existing fixes to help alleviate this problem, particularly in the case where a file has been truncated, that force cached data to be flushed to disk when the file is closed. If the system crashes while the file(s) are still open then this flushing will never occur. The fix that we have implemented is to introduce a second file size, called the in-memory file size, that represents the current file size as viewed by the user. The existing file size, called the on-disk file size, is the one that get's written to the filesystem log and we only update it when it is safe to do so. When we write to a file beyond eof we only update the in- memory file size in the write operation. Later when the I/O operation, that flushes the cached data to disk completes, an I/O completion routine will update the on-disk file size. The on-disk file size will be updated to the maximum offset of the I/O or to the value of the in-memory file size if the I/O includes eof. SGI-PV: 958522 SGI-Modid: xfs-linux-melb:xfs-kern:28322a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08[XFS] Fix race condition in xfs_write().Lachlan McIlroy
This change addresses a race in xfs_write() where, for direct I/O, the flags need_i_mutex and need_flush are setup before the iolock is acquired. The logic used to setup the flags may change between setting the flags and acquiring the iolock resulting in these flags having incorrect values. For example, if a file is not currently cached then need_i_mutex is set to zero and then if the file is cached before the iolock is acquired we will fail to do the flushinval before the direct write. The flush (and also the call to xfs_zero_eof()) need to be done with the iolock held exclusive so we need to acquire the iolock before checking for cached data (or if the write begins after eof) to prevent this state from changing. For direct I/O I've chosen to always acquire the iolock in shared mode initially and if there is a need to promote it then drop it and reacquire it. There's also some other tidy-ups including removing the O_APPEND offset adjustment since that work is done in generic_write_checks() (and we don't use offset as an input parameter anywhere). SGI-PV: 962170 SGI-Modid: xfs-linux-melb:xfs-kern:28319a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08[XFS] Fix uquota and oquota enforcement problems.Kouta Ooizumi
When uquota and oquota (gquota/pquota) are enabled for accounting both are enforced if ether has enforcement active. Conditions: - Both XFS_UQUOTA_ACCT and XFS_GQUOTA_ACCT are enabled. - Either XFS_UQUOTA_ENFD or XFS_OQUOTA_ENFD is enabled. - The usage without enforce is reached at the soft limit. Problems: 1. "repquota" shows all grace time even if no enforcement. 2. we cannot make a file over a hard limits even if no enforcement. SGI-PV: 962291 SGI-Modid: xfs-linux-melb:xfs-kern:28272a Signed-off-by: Kouta Ooizumi <k-ooizumi@tnes.nec.co.jp> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08[XFS] propogate return codes from flush routinesLachlan McIlroy
This patch handles error return values in fs_flush_pages and fs_flushinval_pages. It changes the prototype of fs_flushinval_pages so we can propogate the errors and handle them at higher layers. I also modified xfs_itruncate_start so that it could propogate the error further. SGI-PV: 961990 SGI-Modid: xfs-linux-melb:xfs-kern:28231a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08[XFS] Fix quotaon syscall failures for group enforcement requests.Donald Douwsma
xfs_qm_scall_quotaon was incorrectly failing requests to enable group quota enforcement. Fixes logic error in OQUOTA handling. SGI-PV: 961964 SGI-Modid: xfs-linux-melb:xfs-kern:28227a Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08[XFS] Invalidate quotacheck when mounting without a quota type.Donald Douwsma
When quotas are mounted or remounted without a particular quota type the quota accounting for that type becomes invalid. Previously we were ignoring this leading to accounting errors. SGI-PV: 961964 SGI-Modid: xfs-linux-melb:xfs-kern:28225a Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Utako Kusaka <utako@tnes.nec.co.jp> Signed-off-by: Vlad Apostolov <vapo@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08[XFS] reducing the number of random number functions.Joe Perches
Patch provided by Joe Perches SGI-PV: 961696 SGI-Modid: xfs-linux-melb:xfs-kern:28209a Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08[XFS] remove more misc. unused argsEric Sandeen
Patch provided by Eric Sandeen. SGI-PV: 961695 SGI-Modid: xfs-linux-melb:xfs-kern:28205a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08[XFS] the "aendp" arg to xfs_dir2_data_freescan is always NULL, remove it.Eric Sandeen
Patch provided by Eric Sandeen. SGI-PV: 961694 SGI-Modid: xfs-linux-melb:xfs-kern:28204a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-08[XFS] The last argument "lsn" of xfs_trans_commit() is always called withEric Sandeen
NULL. Patch provided by Eric Sandeen. SGI-PV: 961693 SGI-Modid: xfs-linux-melb:xfs-kern:28199a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
2007-05-07Merge branch 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux: gfs2: nfs lock support for gfs2 lockd: add code to handle deferred lock requests lockd: always preallocate block in nlmsvc_lock() lockd: handle test_lock deferrals lockd: pass cookie in nlmsvc_testlock lockd: handle fl_grant callbacks lockd: save lock state on deferral locks: add fl_grant callback for asynchronous lock return nfsd4: Convert NFSv4 to new lock interface locks: add lock cancel command locks: allow {vfs,posix}_lock_file to return conflicting lock locks: factor out generic/filesystem switch from setlock code locks: factor out generic/filesystem switch from test_lock locks: give posix_test_lock same interface as ->lock locks: make ->lock release private data before returning in GETLK case locks: create posix-to-flock helper functions locks: trivial removal of unnecessary parentheses
2007-05-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmwLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (34 commits) [GFS2] Uncomment sprintf_symbol calling code [DLM] lowcomms style [GFS2] printk warning fixes [GFS2] Patch to fix mmap of stuffed files [GFS2] use lib/parser for parsing mount options [DLM] Lowcomms nodeid range & initialisation fixes [DLM] Fix dlm_lowcoms_stop hang [DLM] fix mode munging [GFS2] lockdump improvements [GFS2] Patch to detect corrupt number of dir entries in leaf and/or inode blocks [GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump) [DLM] fs/dlm/ast.c should #include "ast.h" [DLM] Consolidate transport protocols [DLM] Remove redundant assignment [GFS2] Fix bz 234168 (ignoring rgrp flags) [DLM] change lkid format [DLM] interface for purge (2/2) [DLM] add orphan purging code (1/2) [DLM] split create_message function [GFS2] Set drop_count to 0 (off) by default ...
2007-05-07blackfin architectureBryan Wu
This adds support for the Analog Devices Blackfin processor architecture, and currently supports the BF533, BF532, BF531, BF537, BF536, BF534, and BF561 (Dual Core) devices, with a variety of development platforms including those avaliable from Analog Devices (BF533-EZKit, BF533-STAMP, BF537-STAMP, BF561-EZKIT), and Bluetechnix! Tinyboards. The Blackfin architecture was jointly developed by Intel and Analog Devices Inc. (ADI) as the Micro Signal Architecture (MSA) core and introduced it in December of 2000. Since then ADI has put this core into its Blackfin processor family of devices. The Blackfin core has the advantages of a clean, orthogonal,RISC-like microprocessor instruction set. It combines a dual-MAC (Multiply/Accumulate), state-of-the-art signal processing engine and single-instruction, multiple-data (SIMD) multimedia capabilities into a single instruction-set architecture. The Blackfin architecture, including the instruction set, is described by the ADSP-BF53x/BF56x Blackfin Processor Programming Reference http://blackfin.uclinux.org/gf/download/frsrelease/29/2549/Blackfin_PRM.pdf The Blackfin processor is already supported by major releases of gcc, and there are binary and source rpms/tarballs for many architectures at: http://blackfin.uclinux.org/gf/project/toolchain/frs There is complete documentation, including "getting started" guides available at: http://docs.blackfin.uclinux.org/ which provides links to the sources and patches you will need in order to set up a cross-compiling environment for bfin-linux-uclibc This patch, as well as the other patches (toolchain, distribution, uClibc) are actively supported by Analog Devices Inc, at: http://blackfin.uclinux.org/ We have tested this on LTP, and our test plan (including pass/fails) can be found at: http://docs.blackfin.uclinux.org/doku.php?id=testing_the_linux_kernel [m.kozlowski@tuxland.pl: balance parenthesis in blackfin header files] Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Aubrey Li <aubrey.li@analog.com> Signed-off-by: Jie Zhang <jie.zhang@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07hugetlbfs: add NULL check in hugetlb_zero_setup()Akinobu Mita
If hugetlbfs module_init() fails, hugetlbfs_vfsmount is not initialized and shmget() with SHM_HUGETLB flag will cause NULL pointer dereference. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07slab allocators: Remove SLAB_DEBUG_INITIAL flagChristoph Lameter
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB. I think its purpose was to have a callback after an object has been freed to verify that the state is the constructor state again? The callback is performed before each freeing of an object. I would think that it is much easier to check the object state manually before the free. That also places the check near the code object manipulation of the object. Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was compiled with SLAB debugging on. If there would be code in a constructor handling SLAB_DEBUG_INITIAL then it would have to be conditional on SLAB_DEBUG otherwise it would just be dead code. But there is no such code in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real use of, difficult to understand and there are easier ways to accomplish the same effect (i.e. add debug code before kfree). There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be clear in fs inode caches. Remove the pointless checks (they would even be pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors. This is the last slab flag that SLUB did not support. Remove the check for unimplemented flags from SLUB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07get_unmapped_area handles MAP_FIXED in hugetlbfsBenjamin Herrenschmidt
Generic hugetlb_get_unmapped_area() now handles MAP_FIXED by just calling prepare_hugepage_range() Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07KMEM_CACHE(): simplify slab cache creationChristoph Lameter
This patch provides a new macro KMEM_CACHE(<struct>, <flags>) to simplify slab creation. KMEM_CACHE creates a slab with the name of the struct, with the size of the struct and with the alignment of the struct. Additional slab flags may be specified if necessary. Example struct test_slab { int a,b,c; struct list_head; } __cacheline_aligned_in_smp; test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC) will create a new slab named "test_slab" of the size sizeof(struct test_slab) and aligned to the alignment of test slab. If it fails then we panic. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07mm: optimize acorn partition truncatePeter Zijlstra
invalidate_bdev() is superfluous when truncate_inode_pages() is also called. do call invalidate_bh_lrus() though, to avoid stale pointers. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07mm: optimize kill_bdev()Peter Zijlstra
Remove duplicate work in kill_bdev(). It currently invalidates and then truncates the bdev's mapping. invalidate_mapping_pages() will opportunistically remove pages from the mapping. And truncate_inode_pages() will forcefully remove all pages. The only thing truncate doesn't do is flush the bh lrus. So do that explicitly. This avoids (very unlikely) but possible invalid lookup results if the same bdev is quickly re-issued. It also will prevent extreme kernel latencies which are observed when blockdevs which have a large amount of pagecache are unmounted, by avoiding invalidate_mapping_pages() on that path. invalidate_mapping_pages() has no cond_resched (it can be called under spinlock), whereas truncate_inode_pages() has one. [akpm@linux-foundation.org: restore nrpages==0 optimisation] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07mm: remove destroy_dirty_buffers from invalidate_bdev()Peter Zijlstra
Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't been used in 6 years (so akpm says). find * -name \*.[ch] | xargs grep -l invalidate_bdev | while read file; do quilt add $file; sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file; done Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07Make page->private usable in compound pagesChristoph Lameter
If we add a new flag so that we can distinguish between the first page and the tail pages then we can avoid to use page->private in the first page. page->private == page for the first page, so there is no real information in there. Freeing up page->private makes the use of compound pages more transparent. They become more usable like real pages. Right now we have to be careful f.e. if we are going beyond PAGE_SIZE allocations in the slab on i386 because we can then no longer use the private field. This is one of the issues that cause us not to support debugging for page size slabs in SLAB. Having page->private available for SLUB would allow more meta information in the page struct. I can probably avoid the 16 bit ints that I have in there right now. Also if page->private is available then a compound page may be equipped with buffer heads. This may free up the way for filesystems to support larger blocks than page size. We add PageTail as an alias of PageReclaim. Compound pages cannot currently be reclaimed. Because of the alias one needs to check PageCompound first. The RFC for the this approach was discussed at http://marc.info/?t=117574302800001&r=1&w=2 [nacc@us.ibm.com: fix hugetlbfs] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07smaps: add clear_refs file to clear referenceDavid Rientjes
Adds /proc/pid/clear_refs. When any non-zero number is written to this file, pte_mkold() and ClearPageReferenced() is called for each pte and its corresponding page, respectively, in that task's VMAs. This file is only writable by the user who owns the task. It is now possible to measure _approximately_ how much memory a task is using by clearing the reference bits with echo 1 > /proc/pid/clear_refs and checking the reference count for each VMA from the /proc/pid/smaps output at a measured time interval. For example, to observe the approximate change in memory footprint for a task, write a script that clears the references (echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and extracts the size in kB. Add the sizes for each VMA together for the total referenced footprint. Moments later, repeat the process and observe the difference. For example, using an efficient Mozilla: accumulated time referenced memory ---------------- ----------------- 0 s 408 kB 1 s 408 kB 2 s 556 kB 3 s 1028 kB 4 s 872 kB 5 s 1956 kB 6 s 416 kB 7 s 1560 kB 8 s 2336 kB 9 s 1044 kB 10 s 416 kB This is a valuable tool to get an approximate measurement of the memory footprint for a task. Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> [akpm@linux-foundation.org: build fixes] [mpm@selenic.com: rename for_each_pmd] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07smaps: add pages referenced count to smapsDavid Rientjes
Adds an additional unsigned long field to struct mem_size_stats called 'referenced'. For each pte walked in the smaps code, this field is incremented by PAGE_SIZE if it has pte-reference bits. An additional line was added to the /proc/pid/smaps output for each VMA to indicate how many pages within it are currently marked as referenced or accessed. Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07smaps: extract pmd walker from smaps codeDavid Rientjes
Extracts the pmd walker from smaps-specific code in fs/proc/task_mmu.c. The new struct pmd_walker includes the struct vm_area_struct of the memory to walk over. Iteration begins at the vma->vm_start and completes at vma->vm_end. A pointer to another data structure may be stored in the private field such as struct mem_size_stats, which acts as the smaps accumulator. For each pmd in the VMA, the action function is called with a pointer to its struct vm_area_struct, a pointer to the pmd_t, its start and end addresses, and the private field. The interface for walking pmd's in a VMA for fs/proc/task_mmu.c is now: void for_each_pmd(struct vm_area_struct *vma, void (*action)(struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, void *private), void *private); Since the pmd walker is now extracted from the smaps code, smaps_one_pmd() is invoked for each pmd in the VMA. Its behavior and efficiency is identical to the existing implementation. Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07mm/slab.c: proper prototypesAdrian Bunk
Add proper prototypes in include/linux/slab.h. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07fs: buffer don't PageUptodate without page lockedNick Piggin
__block_write_full_page is calling SetPageUptodate without the page locked. This is unusual, but not incorrect, as PG_writeback is still set. However the next patch will require that SetPageUptodate always be called with the page locked. Simply don't bother setting the page uptodate in this case (it is unusual that the write path does such a thing anyway). Instead just leave it to the read side to bring the page uptodate when it notices that all buffers are uptodate. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07mm: make read_cache_page synchronousNick Piggin
Ensure pages are uptodate after returning from read_cache_page, which allows us to cut out most of the filesystem-internal PageUptodate calls. I didn't have a great look down the call chains, but this appears to fixes 7 possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in block2mtd. All depending on whether the filler is async and/or can return with a !uptodate page. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07proper prototype for hugetlb_get_unmapped_area()Adrian Bunk
Add a proper prototype for hugetlb_get_unmapped_area() in include/linux/hugetlb.h. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-06gfs2: nfs lock support for gfs2Marc Eshel
Add NFS lock support to GFS2. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-06lockd: add code to handle deferred lock requestsMarc Eshel
Rewrite nlmsvc_lock() to use the asynchronous interface. As with testlock, we answer nlm requests in nlmsvc_lock by first looking up the block and then using the results we find in the block if B_QUEUED is set, and calling vfs_lock_file() otherwise. If this a new lock request and we get -EINPROGRESS return on a non-blocking request then we defer the request. Also modify nlmsvc_unlock() to call the filesystem method if appropriate. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06lockd: always preallocate block in nlmsvc_lock()Marc Eshel
Normally we could skip ever having to allocate a block in the case where the client asks for a non-blocking lock, or asks for a blocking lock that succeeds immediately. However we're going to want to always look up a block first in order to check whether we're revisiting a deferred lock call, and to be prepared to handle the case where the filesystem returns -EINPROGRESS--in that case we want to make sure the lock we've given the filesystem is the one embedded in the block that we'll use to track the deferred request. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06lockd: handle test_lock deferralsMarc Eshel
Rewrite nlmsvc_testlock() to use the new asynchronous interface: instead of immediately doing a posix_test_lock(), we first look for a matching block. If the subsequent test_lock returns anything other than -EINPROGRESS, we then remove the block we've found and return the results. If it returns -EINPROGRESS, then we defer the lock request. In the case where the block we find in the first step has B_QUEUED set, we bypass the vfs_test_lock entirely, instead using the block to decide how to respond: with nlm_lck_denied if B_TIMED_OUT is set. with nlm_granted if B_GOT_CALLBACK is set. by dropping if neither B_TIMED_OUT nor B_GOT_CALLBACK is set Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06lockd: pass cookie in nlmsvc_testlockMarc Eshel
Change NLM internal interface to pass more information for test lock; we need this to make sure the cookie information is pushed down to the place where we do request deferral, which is handled for testlock by the following patch. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06lockd: handle fl_grant callbacksMarc Eshel
Add code to handle file system callback when the lock is finally granted. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06lockd: save lock state on deferralMarc Eshel
We need to keep some state for a pending asynchronous lock request, so this patch adds that state to struct nlm_block. This also adds a function which defers the request, by calling rqstp->rq_chandle.defer and storing the resulting deferred request in a nlm_block structure which we insert into lockd's global block list. That new function isn't called yet, so it's dead code until a later patch. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06locks: add fl_grant callback for asynchronous lock returnMarc Eshel
Acquiring a lock on a cluster filesystem may require communication with remote hosts, and to avoid blocking lockd or nfsd threads during such communication, we allow the results to be returned asynchronously. When a ->lock() call needs to block, the file system will return -EINPROGRESS, and then later return the results with a call to the routine in the fl_grant field of the lock_manager_operations struct. This differs from the case when ->lock returns -EAGAIN to a blocking lock request; in that case, the filesystem calls fl_notify when the lock is granted, and the caller retries the original lock. So while fl_notify is merely a hint to the caller that it should retry, fl_grant actually communicates the final result of the lock operation (with the lock already acquired in the succesful case). Therefore fl_grant takes a lock, a status and, for the test lock case, a conflicting lock. We also allow fl_grant to return an error to the filesystem, to handle the case where the fl_grant requests arrives after the lock manager has already given up waiting for it. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06nfsd4: Convert NFSv4 to new lock interfaceMarc Eshel
Convert NFSv4 to the new lock interface. We don't define any callback for now, so we're not taking advantage of the asynchronous feature--that's less critical for the multi-threaded nfsd then it is for the single-threaded lockd. But this does allow a cluster filesystems to export cluster-coherent locking to NFS. Note that it's cluster filesystems that are the issue--of the filesystems that define lock methods (nfs, cifs, etc.), most are not exportable by nfsd. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-05-06locks: add lock cancel commandMarc Eshel
Lock managers need to be able to cancel pending lock requests. In the case where the exported filesystem manages its own locks, it's not sufficient just to call posix_unblock_lock(); we need to let the filesystem know what's happening too. We do this by adding a new fcntl lock command: FL_CANCELLK. Some day this might also be made available to userspace applications that could benefit from an asynchronous locking api. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06locks: allow {vfs,posix}_lock_file to return conflicting lockMarc Eshel
The nfsv4 protocol's lock operation, in the case of a conflict, returns information about the conflicting lock. It's unclear how clients can use this, so for now we're not going so far as to add a filesystem method that can return a conflicting lock, but we may as well return something in the local case when it's easy to. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06locks: factor out generic/filesystem switch from setlock codeMarc Eshel
Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods. This patch does that for all the setlk code. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06locks: factor out generic/filesystem switch from test_lockJ. Bruce Fields
Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods. This patch does that for test_lock. Note that this hasn't been necessary until recently, because the few filesystems that define ->lock() (nfs, cifs...) aren't exportable via NFS. However GFS (and, in the future, other cluster filesystems) need to implement their own locking to get cluster-coherent locking, and also want to be able to export locking to NFS (lockd and NFSv4). So we accomplish this by factoring out code such as this and exporting it for the use of lockd and nfsd. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06locks: give posix_test_lock same interface as ->lockMarc Eshel
posix_test_lock() and ->lock() do the same job but have gratuitously different interfaces. Modify posix_test_lock() so the two agree, simplifying some code in the process. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06locks: make ->lock release private data before returning in GETLK caseJ. Bruce Fields
The file_lock argument to ->lock is used to return the conflicting lock when found. There's no reason for the filesystem to return any private information with this conflicting lock, but nfsv4 is. Fix nfsv4 client, and modify locks.c to stop calling fl_release_private for it in this case. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Cc: "Trond Myklebust" <Trond.Myklebust@netapp.com>"
2007-05-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Fix typo in cifs readme from previous commit [CIFS] Make sec=none force an anonymous mount [CIFS] Change semaphore to mutex for cifs lock_sem [CIFS] Fix oops in reset_cifs_unix_caps on reconnect [CIFS] UID/GID override on CIFS mounts to Samba [CIFS] prefixpath mounts to servers supporting posix paths used wrong slash [CIFS] Update cifs version to 1.49 [CIFS] Replace kmalloc/memset combination with kzalloc [CIFS] Add IPv6 support [CIFS] New CIFS POSIX mkdir performance improvement (part 2) [CIFS] New CIFS POSIX mkdir performance improvement [CIFS] Add write perm for usr to file on windows should remove r/o dos attr [CIFS] Remove unnecessary parm to cifs_reopen_file [CIFS] Switch cifsd to kthread_run from kernel_thread [CIFS] Remove unnecessary checks
2007-05-05[CIFS] Fix typo in cifs readme from previous commitSteve French
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-05-05Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6Linus Torvalds
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits) [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall [PATCH] i386: type may be unused [PATCH] i386: Some additional chipset register values validation. [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split. [PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu [PATCH] i386: white space fixes in i387.h [PATCH] i386: Drop noisy e820 debugging printks [PATCH] x86-64: Fix allnoconfig error in genapic_flat.c [PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems [PATCH] x86-64: Share identical video.S between i386 and x86-64 [PATCH] x86-64: Remove CONFIG_REORDER [PATCH] x86-64: Print type and size correctly for unknown compat ioctls [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0) [PATCH] i386: Little cleanups in smpboot.c [PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible [PATCH] i386: Add X86_FEATURE_RDTSCP [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 [PATCH] i386: Implement alternative_io for i386 ... Fix up trivial conflict in include/linux/highmem.h manually. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-04Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: Force use of GFP_NOFS in ocfs2_write() ocfs2: fix sparse warnings in fs/ocfs2/cluster ocfs2: fix sparse warnings in fs/ocfs2/dlm ocfs2: fix sparse warnings in fs/ocfs2 [PATCH] Copy i_flags to ocfs2 inode flags on write [PATCH] ocfs2: use __set_current_state() ocfs2: Wrap access of directory allocations with ip_alloc_sem. [PATCH] fs/ocfs2/: make 3 functions static ocfs2: Implement compat_ioctl()
2007-05-05[CIFS] Make sec=none force an anonymous mountJeff Layton
We had a customer report that attempting to make CIFS mount with a null username (i.e. doing an anonymous mount) doesn't work. Looking through the code, it looks like CIFS expects a NULL username from userspace in order to trigger an anonymous mount. The mount.cifs code doesn't seem to ever pass a null username to the kernel, however. It looks also like the kernel can take a sec=none option, but it only seems to look at it if the username is already NULL. This seems redundant and effectively makes sec=none useless. The following patch makes sec=none force an anonymous mount. Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-05-04Merge git://git.linux-nfs.org/pub/linux/nfs-2.6Linus Torvalds
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (28 commits) NFS: Fix a compile glitch on 64-bit systems NFS: Clean up nfs_create_request comments spkm3: initialize hash spkm3: remove bad kfree, unnecessary export spkm3: fix spkm3's use of hmac NFS4: invalidate cached acl on setacl NFS: Fix directory caching problem - with test case and patch. NFS: Set meaningful value for fattr->time_start in readdirplus results. NFS: Added support to turn off the NFSv3 READDIRPLUS RPC. SUNRPC: RPC client should retry with different versions of rpcbind SUNRPC: remove old portmapper NFS: switch NFSROOT to use new rpcbind client SUNRPC: switch the RPC server to use the new rpcbind registration API SUNRPC: switch socket-based RPC transports to use rpcbind SUNRPC: introduce rpcbind: replacement for in-kernel portmapper SUNRPC: Eliminate side effects from rpc_malloc SUNRPC: RPC buffer size estimates are too large NLM: Shrink the maximum request size of NLM4 requests NFS: Use pgoff_t in structures and functions that pass page cache offsets NFS: Clean up nfs_sync_mapping_wait() ...
2007-05-04Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (49 commits) [SCTP]: Set assoc_id correctly during INIT collision. [SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv() [SCTP]: Fix the SO_REUSEADDR handling to be similar to TCP. [SCTP]: Verify all destination ports in sctp_connectx. [XFRM] SPD info TLV aggregation [XFRM] SAD info TLV aggregationx [AF_RXRPC]: Sort out MTU handling. [AF_IUCV/IUCV] : Add missing section annotations [AF_IUCV]: Implementation of a skb backlog queue [NETLINK]: Remove bogus BUG_ON [IPV6]: Some cleanups in include/net/ipv6.h [TCP]: zero out rx_opt in tcp_disconnect() [BNX2]: Fix TSO problem with small MSS. [NET]: Rework dev_base via list_head (v3) [TCP] Highspeed: Limited slow-start is nowadays in tcp_slow_start [BNX2]: Update version and reldate. [BNX2]: Print bus information for PCIE devices. [BNX2]: Add 1-shot MSI handler for 5709. [BNX2]: Restructure PHY event handling. [BNX2]: Add indirect spinlock. ...