Age | Commit message (Collapse) | Author |
|
The problem that has been addressed is that of synchronising updates of
the file size with writes that extend a file. Without the fix the update
of a file's size, as a result of a write beyond eof, is independent of
when the cached data is flushed to disk. Often the file size update would
be written to the filesystem log before the data is flushed to disk. When
a system crashes between these two events and the filesystem log is
replayed on mount the file's size will be set but since the contents never
made it to disk the file is full of holes. If some of the cached data was
flushed to disk then it may just be a section of the file at the end that
has holes.
There are existing fixes to help alleviate this problem, particularly in
the case where a file has been truncated, that force cached data to be
flushed to disk when the file is closed. If the system crashes while the
file(s) are still open then this flushing will never occur.
The fix that we have implemented is to introduce a second file size,
called the in-memory file size, that represents the current file size as
viewed by the user. The existing file size, called the on-disk file size,
is the one that get's written to the filesystem log and we only update it
when it is safe to do so. When we write to a file beyond eof we only
update the in- memory file size in the write operation. Later when the I/O
operation, that flushes the cached data to disk completes, an I/O
completion routine will update the on-disk file size. The on-disk file
size will be updated to the maximum offset of the I/O or to the value of
the in-memory file size if the I/O includes eof.
SGI-PV: 958522
SGI-Modid: xfs-linux-melb:xfs-kern:28322a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
|
This change addresses a race in xfs_write() where, for direct I/O, the
flags need_i_mutex and need_flush are setup before the iolock is acquired.
The logic used to setup the flags may change between setting the flags and
acquiring the iolock resulting in these flags having incorrect values. For
example, if a file is not currently cached then need_i_mutex is set to
zero and then if the file is cached before the iolock is acquired we will
fail to do the flushinval before the direct write.
The flush (and also the call to xfs_zero_eof()) need to be done with the
iolock held exclusive so we need to acquire the iolock before checking for
cached data (or if the write begins after eof) to prevent this state from
changing. For direct I/O I've chosen to always acquire the iolock in
shared mode initially and if there is a need to promote it then drop it
and reacquire it.
There's also some other tidy-ups including removing the O_APPEND offset
adjustment since that work is done in generic_write_checks() (and we don't
use offset as an input parameter anywhere).
SGI-PV: 962170
SGI-Modid: xfs-linux-melb:xfs-kern:28319a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
|
When uquota and oquota (gquota/pquota) are enabled for accounting both are
enforced if ether has enforcement active.
Conditions:
- Both XFS_UQUOTA_ACCT and XFS_GQUOTA_ACCT are enabled.
- Either XFS_UQUOTA_ENFD or XFS_OQUOTA_ENFD is enabled.
- The usage without enforce is reached at the soft limit.
Problems:
1. "repquota" shows all grace time even if no enforcement.
2. we cannot make a file over a hard limits even if no enforcement.
SGI-PV: 962291
SGI-Modid: xfs-linux-melb:xfs-kern:28272a
Signed-off-by: Kouta Ooizumi <k-ooizumi@tnes.nec.co.jp>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
|
This patch handles error return values in fs_flush_pages and
fs_flushinval_pages. It changes the prototype of fs_flushinval_pages so we
can propogate the errors and handle them at higher layers. I also modified
xfs_itruncate_start so that it could propogate the error further.
SGI-PV: 961990
SGI-Modid: xfs-linux-melb:xfs-kern:28231a
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Stewart Smith <stewart@flamingspork.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
|
xfs_qm_scall_quotaon was incorrectly failing requests to enable group
quota enforcement. Fixes logic error in OQUOTA handling.
SGI-PV: 961964
SGI-Modid: xfs-linux-melb:xfs-kern:28227a
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
|
When quotas are mounted or remounted without a particular quota type the
quota accounting for that type becomes invalid. Previously we were
ignoring this leading to accounting errors.
SGI-PV: 961964
SGI-Modid: xfs-linux-melb:xfs-kern:28225a
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Utako Kusaka <utako@tnes.nec.co.jp>
Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
|
Patch provided by Joe Perches
SGI-PV: 961696
SGI-Modid: xfs-linux-melb:xfs-kern:28209a
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
|
Patch provided by Eric Sandeen.
SGI-PV: 961695
SGI-Modid: xfs-linux-melb:xfs-kern:28205a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
|
Patch provided by Eric Sandeen.
SGI-PV: 961694
SGI-Modid: xfs-linux-melb:xfs-kern:28204a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
|
NULL.
Patch provided by Eric Sandeen.
SGI-PV: 961693
SGI-Modid: xfs-linux-melb:xfs-kern:28199a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
|
|
* 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux:
gfs2: nfs lock support for gfs2
lockd: add code to handle deferred lock requests
lockd: always preallocate block in nlmsvc_lock()
lockd: handle test_lock deferrals
lockd: pass cookie in nlmsvc_testlock
lockd: handle fl_grant callbacks
lockd: save lock state on deferral
locks: add fl_grant callback for asynchronous lock return
nfsd4: Convert NFSv4 to new lock interface
locks: add lock cancel command
locks: allow {vfs,posix}_lock_file to return conflicting lock
locks: factor out generic/filesystem switch from setlock code
locks: factor out generic/filesystem switch from test_lock
locks: give posix_test_lock same interface as ->lock
locks: make ->lock release private data before returning in GETLK case
locks: create posix-to-flock helper functions
locks: trivial removal of unnecessary parentheses
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (34 commits)
[GFS2] Uncomment sprintf_symbol calling code
[DLM] lowcomms style
[GFS2] printk warning fixes
[GFS2] Patch to fix mmap of stuffed files
[GFS2] use lib/parser for parsing mount options
[DLM] Lowcomms nodeid range & initialisation fixes
[DLM] Fix dlm_lowcoms_stop hang
[DLM] fix mode munging
[GFS2] lockdump improvements
[GFS2] Patch to detect corrupt number of dir entries in leaf and/or inode blocks
[GFS2] bz 236008: Kernel gpf doing cat /debugfs/gfs2/xxx (lock dump)
[DLM] fs/dlm/ast.c should #include "ast.h"
[DLM] Consolidate transport protocols
[DLM] Remove redundant assignment
[GFS2] Fix bz 234168 (ignoring rgrp flags)
[DLM] change lkid format
[DLM] interface for purge (2/2)
[DLM] add orphan purging code (1/2)
[DLM] split create_message function
[GFS2] Set drop_count to 0 (off) by default
...
|
|
This adds support for the Analog Devices Blackfin processor architecture, and
currently supports the BF533, BF532, BF531, BF537, BF536, BF534, and BF561
(Dual Core) devices, with a variety of development platforms including those
avaliable from Analog Devices (BF533-EZKit, BF533-STAMP, BF537-STAMP,
BF561-EZKIT), and Bluetechnix! Tinyboards.
The Blackfin architecture was jointly developed by Intel and Analog Devices
Inc. (ADI) as the Micro Signal Architecture (MSA) core and introduced it in
December of 2000. Since then ADI has put this core into its Blackfin
processor family of devices. The Blackfin core has the advantages of a clean,
orthogonal,RISC-like microprocessor instruction set. It combines a dual-MAC
(Multiply/Accumulate), state-of-the-art signal processing engine and
single-instruction, multiple-data (SIMD) multimedia capabilities into a single
instruction-set architecture.
The Blackfin architecture, including the instruction set, is described by the
ADSP-BF53x/BF56x Blackfin Processor Programming Reference
http://blackfin.uclinux.org/gf/download/frsrelease/29/2549/Blackfin_PRM.pdf
The Blackfin processor is already supported by major releases of gcc, and
there are binary and source rpms/tarballs for many architectures at:
http://blackfin.uclinux.org/gf/project/toolchain/frs There is complete
documentation, including "getting started" guides available at:
http://docs.blackfin.uclinux.org/ which provides links to the sources and
patches you will need in order to set up a cross-compiling environment for
bfin-linux-uclibc
This patch, as well as the other patches (toolchain, distribution,
uClibc) are actively supported by Analog Devices Inc, at:
http://blackfin.uclinux.org/
We have tested this on LTP, and our test plan (including pass/fails) can
be found at:
http://docs.blackfin.uclinux.org/doku.php?id=testing_the_linux_kernel
[m.kozlowski@tuxland.pl: balance parenthesis in blackfin header files]
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Aubrey Li <aubrey.li@analog.com>
Signed-off-by: Jie Zhang <jie.zhang@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If hugetlbfs module_init() fails, hugetlbfs_vfsmount is not initialized and
shmget() with SHM_HUGETLB flag will cause NULL pointer dereference.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.
I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.
I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.
Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).
There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.
This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Generic hugetlb_get_unmapped_area() now handles MAP_FIXED by just calling
prepare_hugepage_range()
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: William Irwin <bill.irwin@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch provides a new macro
KMEM_CACHE(<struct>, <flags>)
to simplify slab creation. KMEM_CACHE creates a slab with the name of the
struct, with the size of the struct and with the alignment of the struct.
Additional slab flags may be specified if necessary.
Example
struct test_slab {
int a,b,c;
struct list_head;
} __cacheline_aligned_in_smp;
test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC)
will create a new slab named "test_slab" of the size sizeof(struct
test_slab) and aligned to the alignment of test slab. If it fails then we
panic.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
invalidate_bdev() is superfluous when truncate_inode_pages() is also
called. do call invalidate_bh_lrus() though, to avoid stale pointers.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove duplicate work in kill_bdev().
It currently invalidates and then truncates the bdev's mapping.
invalidate_mapping_pages() will opportunistically remove pages from the
mapping. And truncate_inode_pages() will forcefully remove all pages.
The only thing truncate doesn't do is flush the bh lrus. So do that
explicitly. This avoids (very unlikely) but possible invalid lookup
results if the same bdev is quickly re-issued.
It also will prevent extreme kernel latencies which are observed when
blockdevs which have a large amount of pagecache are unmounted, by avoiding
invalidate_mapping_pages() on that path. invalidate_mapping_pages() has no
cond_resched (it can be called under spinlock), whereas truncate_inode_pages()
has one.
[akpm@linux-foundation.org: restore nrpages==0 optimisation]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
been used in 6 years (so akpm says).
find * -name \*.[ch] | xargs grep -l invalidate_bdev |
while read file; do
quilt add $file;
sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
done
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If we add a new flag so that we can distinguish between the first page and the
tail pages then we can avoid to use page->private in the first page.
page->private == page for the first page, so there is no real information in
there.
Freeing up page->private makes the use of compound pages more transparent.
They become more usable like real pages. Right now we have to be careful f.e.
if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
can then no longer use the private field. This is one of the issues that
cause us not to support debugging for page size slabs in SLAB.
Having page->private available for SLUB would allow more meta information in
the page struct. I can probably avoid the 16 bit ints that I have in there
right now.
Also if page->private is available then a compound page may be equipped with
buffer heads. This may free up the way for filesystems to support larger
blocks than page size.
We add PageTail as an alias of PageReclaim. Compound pages cannot currently
be reclaimed. Because of the alias one needs to check PageCompound first.
The RFC for the this approach was discussed at
http://marc.info/?t=117574302800001&r=1&w=2
[nacc@us.ibm.com: fix hugetlbfs]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Adds /proc/pid/clear_refs. When any non-zero number is written to this file,
pte_mkold() and ClearPageReferenced() is called for each pte and its
corresponding page, respectively, in that task's VMAs. This file is only
writable by the user who owns the task.
It is now possible to measure _approximately_ how much memory a task is using
by clearing the reference bits with
echo 1 > /proc/pid/clear_refs
and checking the reference count for each VMA from the /proc/pid/smaps output
at a measured time interval. For example, to observe the approximate change
in memory footprint for a task, write a script that clears the references
(echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and
extracts the size in kB. Add the sizes for each VMA together for the total
referenced footprint. Moments later, repeat the process and observe the
difference.
For example, using an efficient Mozilla:
accumulated time referenced memory
---------------- -----------------
0 s 408 kB
1 s 408 kB
2 s 556 kB
3 s 1028 kB
4 s 872 kB
5 s 1956 kB
6 s 416 kB
7 s 1560 kB
8 s 2336 kB
9 s 1044 kB
10 s 416 kB
This is a valuable tool to get an approximate measurement of the memory
footprint for a task.
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
[akpm@linux-foundation.org: build fixes]
[mpm@selenic.com: rename for_each_pmd]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Adds an additional unsigned long field to struct mem_size_stats called
'referenced'. For each pte walked in the smaps code, this field is
incremented by PAGE_SIZE if it has pte-reference bits.
An additional line was added to the /proc/pid/smaps output for each VMA to
indicate how many pages within it are currently marked as referenced or
accessed.
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Extracts the pmd walker from smaps-specific code in fs/proc/task_mmu.c.
The new struct pmd_walker includes the struct vm_area_struct of the memory to
walk over. Iteration begins at the vma->vm_start and completes at
vma->vm_end. A pointer to another data structure may be stored in the private
field such as struct mem_size_stats, which acts as the smaps accumulator. For
each pmd in the VMA, the action function is called with a pointer to its
struct vm_area_struct, a pointer to the pmd_t, its start and end addresses,
and the private field.
The interface for walking pmd's in a VMA for fs/proc/task_mmu.c is now:
void for_each_pmd(struct vm_area_struct *vma,
void (*action)(struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr,
unsigned long end,
void *private),
void *private);
Since the pmd walker is now extracted from the smaps code, smaps_one_pmd() is
invoked for each pmd in the VMA. Its behavior and efficiency is identical to
the existing implementation.
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add proper prototypes in include/linux/slab.h.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
__block_write_full_page is calling SetPageUptodate without the page locked.
This is unusual, but not incorrect, as PG_writeback is still set.
However the next patch will require that SetPageUptodate always be called with
the page locked. Simply don't bother setting the page uptodate in this case
(it is unusual that the write path does such a thing anyway). Instead just
leave it to the read side to bring the page uptodate when it notices that all
buffers are uptodate.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Ensure pages are uptodate after returning from read_cache_page, which allows
us to cut out most of the filesystem-internal PageUptodate calls.
I didn't have a great look down the call chains, but this appears to fixes 7
possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in
ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in
block2mtd. All depending on whether the filler is async and/or can return
with a !uptodate page.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add a proper prototype for hugetlb_get_unmapped_area() in
include/linux/hugetlb.h.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add NFS lock support to GFS2.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
Rewrite nlmsvc_lock() to use the asynchronous interface.
As with testlock, we answer nlm requests in nlmsvc_lock by first looking up
the block and then using the results we find in the block if B_QUEUED is
set, and calling vfs_lock_file() otherwise.
If this a new lock request and we get -EINPROGRESS return on a non-blocking
request then we defer the request.
Also modify nlmsvc_unlock() to call the filesystem method if appropriate.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Normally we could skip ever having to allocate a block in the case where
the client asks for a non-blocking lock, or asks for a blocking lock that
succeeds immediately.
However we're going to want to always look up a block first in order to
check whether we're revisiting a deferred lock call, and to be prepared to
handle the case where the filesystem returns -EINPROGRESS--in that case we
want to make sure the lock we've given the filesystem is the one embedded
in the block that we'll use to track the deferred request.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Rewrite nlmsvc_testlock() to use the new asynchronous interface: instead of
immediately doing a posix_test_lock(), we first look for a matching block.
If the subsequent test_lock returns anything other than -EINPROGRESS, we
then remove the block we've found and return the results.
If it returns -EINPROGRESS, then we defer the lock request.
In the case where the block we find in the first step has B_QUEUED set,
we bypass the vfs_test_lock entirely, instead using the block to decide how
to respond:
with nlm_lck_denied if B_TIMED_OUT is set.
with nlm_granted if B_GOT_CALLBACK is set.
by dropping if neither B_TIMED_OUT nor B_GOT_CALLBACK is set
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Change NLM internal interface to pass more information for test lock; we
need this to make sure the cookie information is pushed down to the place
where we do request deferral, which is handled for testlock by the
following patch.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Add code to handle file system callback when the lock is finally granted.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
We need to keep some state for a pending asynchronous lock request, so this
patch adds that state to struct nlm_block.
This also adds a function which defers the request, by calling
rqstp->rq_chandle.defer and storing the resulting deferred request in a
nlm_block structure which we insert into lockd's global block list. That
new function isn't called yet, so it's dead code until a later patch.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Acquiring a lock on a cluster filesystem may require communication with
remote hosts, and to avoid blocking lockd or nfsd threads during such
communication, we allow the results to be returned asynchronously.
When a ->lock() call needs to block, the file system will return
-EINPROGRESS, and then later return the results with a call to the
routine in the fl_grant field of the lock_manager_operations struct.
This differs from the case when ->lock returns -EAGAIN to a blocking
lock request; in that case, the filesystem calls fl_notify when the lock
is granted, and the caller retries the original lock. So while
fl_notify is merely a hint to the caller that it should retry, fl_grant
actually communicates the final result of the lock operation (with the
lock already acquired in the succesful case).
Therefore fl_grant takes a lock, a status and, for the test lock case, a
conflicting lock. We also allow fl_grant to return an error to the
filesystem, to handle the case where the fl_grant requests arrives after
the lock manager has already given up waiting for it.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Convert NFSv4 to the new lock interface. We don't define any callback for now,
so we're not taking advantage of the asynchronous feature--that's less critical
for the multi-threaded nfsd then it is for the single-threaded lockd. But this
does allow a cluster filesystems to export cluster-coherent locking to NFS.
Note that it's cluster filesystems that are the issue--of the filesystems that
define lock methods (nfs, cifs, etc.), most are not exportable by nfsd.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Lock managers need to be able to cancel pending lock requests. In the case
where the exported filesystem manages its own locks, it's not sufficient just
to call posix_unblock_lock(); we need to let the filesystem know what's
happening too.
We do this by adding a new fcntl lock command: FL_CANCELLK. Some day this
might also be made available to userspace applications that could benefit from
an asynchronous locking api.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
|
|
The nfsv4 protocol's lock operation, in the case of a conflict, returns
information about the conflicting lock.
It's unclear how clients can use this, so for now we're not going so far as to
add a filesystem method that can return a conflicting lock, but we may as well
return something in the local case when it's easy to.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
|
|
Factor out the code that switches between generic and filesystem-specific lock
methods; eventually we want to call this from lock managers (lockd and nfsd)
too; currently they only call the generic methods.
This patch does that for all the setlk code.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
|
|
Factor out the code that switches between generic and filesystem-specific lock
methods; eventually we want to call this from lock managers (lockd and nfsd)
too; currently they only call the generic methods.
This patch does that for test_lock.
Note that this hasn't been necessary until recently, because the few
filesystems that define ->lock() (nfs, cifs...) aren't exportable via NFS.
However GFS (and, in the future, other cluster filesystems) need to implement
their own locking to get cluster-coherent locking, and also want to be able to
export locking to NFS (lockd and NFSv4).
So we accomplish this by factoring out code such as this and exporting it for
the use of lockd and nfsd.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
|
|
posix_test_lock() and ->lock() do the same job but have gratuitously
different interfaces. Modify posix_test_lock() so the two agree,
simplifying some code in the process.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
|
|
The file_lock argument to ->lock is used to return the conflicting lock
when found. There's no reason for the filesystem to return any private
information with this conflicting lock, but nfsv4 is.
Fix nfsv4 client, and modify locks.c to stop calling fl_release_private
for it in this case.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: "Trond Myklebust" <Trond.Myklebust@netapp.com>"
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] Fix typo in cifs readme from previous commit
[CIFS] Make sec=none force an anonymous mount
[CIFS] Change semaphore to mutex for cifs lock_sem
[CIFS] Fix oops in reset_cifs_unix_caps on reconnect
[CIFS] UID/GID override on CIFS mounts to Samba
[CIFS] prefixpath mounts to servers supporting posix paths used wrong slash
[CIFS] Update cifs version to 1.49
[CIFS] Replace kmalloc/memset combination with kzalloc
[CIFS] Add IPv6 support
[CIFS] New CIFS POSIX mkdir performance improvement (part 2)
[CIFS] New CIFS POSIX mkdir performance improvement
[CIFS] Add write perm for usr to file on windows should remove r/o dos attr
[CIFS] Remove unnecessary parm to cifs_reopen_file
[CIFS] Switch cifsd to kthread_run from kernel_thread
[CIFS] Remove unnecessary checks
|
|
Signed-off-by: Steve French <sfrench@us.ibm.com>
|
|
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits)
[PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall
[PATCH] i386: type may be unused
[PATCH] i386: Some additional chipset register values validation.
[PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split.
[PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff
[PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu
[PATCH] i386: white space fixes in i387.h
[PATCH] i386: Drop noisy e820 debugging printks
[PATCH] x86-64: Fix allnoconfig error in genapic_flat.c
[PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems
[PATCH] x86-64: Share identical video.S between i386 and x86-64
[PATCH] x86-64: Remove CONFIG_REORDER
[PATCH] x86-64: Print type and size correctly for unknown compat ioctls
[PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0)
[PATCH] i386: Little cleanups in smpboot.c
[PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning
[PATCH] x86: Use RDTSCP for synchronous get_cycles if possible
[PATCH] i386: Add X86_FEATURE_RDTSCP
[PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386
[PATCH] i386: Implement alternative_io for i386
...
Fix up trivial conflict in include/linux/highmem.h manually.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
ocfs2: Force use of GFP_NOFS in ocfs2_write()
ocfs2: fix sparse warnings in fs/ocfs2/cluster
ocfs2: fix sparse warnings in fs/ocfs2/dlm
ocfs2: fix sparse warnings in fs/ocfs2
[PATCH] Copy i_flags to ocfs2 inode flags on write
[PATCH] ocfs2: use __set_current_state()
ocfs2: Wrap access of directory allocations with ip_alloc_sem.
[PATCH] fs/ocfs2/: make 3 functions static
ocfs2: Implement compat_ioctl()
|
|
We had a customer report that attempting to make CIFS mount with a null
username (i.e. doing an anonymous mount) doesn't work. Looking through the
code, it looks like CIFS expects a NULL username from userspace in order
to trigger an anonymous mount. The mount.cifs code doesn't seem to ever
pass a null username to the kernel, however.
It looks also like the kernel can take a sec=none option, but it only seems
to look at it if the username is already NULL. This seems redundant and
effectively makes sec=none useless.
The following patch makes sec=none force an anonymous mount.
Signed-off-by: Steve French <sfrench@us.ibm.com>
|
|
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (28 commits)
NFS: Fix a compile glitch on 64-bit systems
NFS: Clean up nfs_create_request comments
spkm3: initialize hash
spkm3: remove bad kfree, unnecessary export
spkm3: fix spkm3's use of hmac
NFS4: invalidate cached acl on setacl
NFS: Fix directory caching problem - with test case and patch.
NFS: Set meaningful value for fattr->time_start in readdirplus results.
NFS: Added support to turn off the NFSv3 READDIRPLUS RPC.
SUNRPC: RPC client should retry with different versions of rpcbind
SUNRPC: remove old portmapper
NFS: switch NFSROOT to use new rpcbind client
SUNRPC: switch the RPC server to use the new rpcbind registration API
SUNRPC: switch socket-based RPC transports to use rpcbind
SUNRPC: introduce rpcbind: replacement for in-kernel portmapper
SUNRPC: Eliminate side effects from rpc_malloc
SUNRPC: RPC buffer size estimates are too large
NLM: Shrink the maximum request size of NLM4 requests
NFS: Use pgoff_t in structures and functions that pass page cache offsets
NFS: Clean up nfs_sync_mapping_wait()
...
|
|
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (49 commits)
[SCTP]: Set assoc_id correctly during INIT collision.
[SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv()
[SCTP]: Fix the SO_REUSEADDR handling to be similar to TCP.
[SCTP]: Verify all destination ports in sctp_connectx.
[XFRM] SPD info TLV aggregation
[XFRM] SAD info TLV aggregationx
[AF_RXRPC]: Sort out MTU handling.
[AF_IUCV/IUCV] : Add missing section annotations
[AF_IUCV]: Implementation of a skb backlog queue
[NETLINK]: Remove bogus BUG_ON
[IPV6]: Some cleanups in include/net/ipv6.h
[TCP]: zero out rx_opt in tcp_disconnect()
[BNX2]: Fix TSO problem with small MSS.
[NET]: Rework dev_base via list_head (v3)
[TCP] Highspeed: Limited slow-start is nowadays in tcp_slow_start
[BNX2]: Update version and reldate.
[BNX2]: Print bus information for PCIE devices.
[BNX2]: Add 1-shot MSI handler for 5709.
[BNX2]: Restructure PHY event handling.
[BNX2]: Add indirect spinlock.
...
|