path: root/fs
Age    Commit message    Author
2007-01-26[PATCH] knfsd: fix an NFSD bug with full sized, non-page-aligned readsNeilBrown
NFSd assumes that the largest number of pages that will be needed for a request+response is 2+N, where N pages is the size of the largest permitted read/write request. The '2' are 1 for the non-data part of the request, and 1 for the non-data part of the reply. However, when a read request is not page-aligned, and we choose to use ->sendfile to send it directly from the page cache, we may need N+1 pages to hold the whole reply. This can overflow an array and cause an Oops. This patch increases the size of the array for holding pages by one and makes sure that entry is NULL when it is not in use. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
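The N+1 case can be checked with a little arithmetic. The following is a minimal sketch, not the knfsd code: the function name and the 4096-byte page size are assumptions made for illustration.

```c
#include <stddef.h>

/* Assumed page size for this sketch; the real value is per-architecture. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Number of page-cache pages touched by a read of `len` bytes that starts
 * at file offset `offset`. Hypothetical helper, not a kernel function.
 */
size_t pages_spanned(size_t offset, size_t len)
{
    size_t first, last;

    if (len == 0)
        return 0;
    first = offset / SKETCH_PAGE_SIZE;
    last = (offset + len - 1) / SKETCH_PAGE_SIZE;
    return last - first + 1;
}
```

With N = 8, an aligned full-sized read touches 8 pages, while the same read starting 100 bytes into a page touches 9, which is the extra array slot the patch adds.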
2007-01-26[PATCH] knfsd: fix setting of ACL server versionsNeilBrown
Due to silly typos, if the nfs versions are explicitly set, no NFSACL versions get enabled. Also improve an error message that would have made this bug a little easier to find. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] Fix NULL ->nsproxy dereference in /proc/*/mountsAlexey Dobriyan
/proc/*/mountstats was fixed, all right, but...
To reproduce:
    while true; do find /proc -type f 2>/dev/null | xargs cat 1>/dev/null 2>/dev/null; done
BUG: unable to handle kernel NULL pointer dereference at virtual address 0000000c
printing eip: c01754df
*pde = 00000000
Oops: 0000 [#28]
Modules linked in: af_packet ohci_hcd e1000 ehci_hcd uhci_hcd usbcore xfs
CPU: 0
EIP: 0060:[<c01754df>] Not tainted VLI
EFLAGS: 00010286 (2.6.20-rc5 #1)
EIP is at mounts_open+0x1c/0xac
eax: 00000000 ebx: d5898ac0 ecx: d1d27b18 edx: d1d27a50
esi: e6083e10 edi: d3c87f38 ebp: d5898ac0 esp: d3c87ef0
ds: 007b es: 007b ss: 0068
Process cat (pid: 18071, ti=d3c86000 task=f7d5f070 task.ti=d3c86000)
Stack: d5898ac0 e6083e10 d3c87f38 c01754c3 c0147c91 c18c52c0 d343f314 d5898ac0
       00008000 d3c87f38 ffffff9c c0147e09 d5898ac0 00000000 00000000 c0147e4b
       00000000 d3c87f38 d343f314 c18c52c0 c015e53e 00001000 08051000 00000101
Call Trace:
 [<c01754c3>] mounts_open+0x0/0xac
 [<c0147c91>] __dentry_open+0xa1/0x18c
 [<c0147e09>] nameidata_to_filp+0x31/0x3a
 [<c0147e4b>] do_filp_open+0x39/0x40
 [<c015e53e>] seq_read+0x128/0x2aa
 [<c0147e8c>] do_sys_open+0x3a/0x6d
 [<c0147efa>] sys_open+0x1c/0x20
 [<c0102b76>] sysenter_past_esp+0x5f/0x85
 [<c02a0033>] unix_stream_recvmsg+0x3bf/0x4bf
=======================
Code: 5d c3 89 d8 e8 06 e0 f9 ff eb bd 0f 0b eb fe 55 57 56 53 89 d5 8b 40 f0 31 d2 e8 02 c1 fa ff 89 c2 85 c0 74 5c 8b 80 48 04 00 00 <8b> 58 0c 85 db 74 02 ff 03 ff 4a 08 0f 94 c0 84 c0 75 74 85 db
EIP: [<c01754df>] mounts_open+0x1c/0xac SS:ESP 0068:d3c87ef0
A race with do_exit()'s call to exit_namespaces().
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] i386 vDSO: use VM_ALWAYSDUMPRoland McGrath
This patch fixes core dumps to include the vDSO vma, which is left out now. It removes the special-case core writing macros, which were not doing the right thing for the vDSO vma anyway. Instead, it uses VM_ALWAYSDUMP in the vma; there is no need for the fixmap page to be installed. It handles the CONFIG_COMPAT_VDSO case by making elf_core_dump use the fake vma from get_gate_vma after real vmas in the same way the /proc/PID/maps code does. This changes core dumps so they no longer include the non-PT_LOAD phdrs from the vDSO. I made the change to add them in the first place, but it turned out that nothing ever wanted them there since the advent of NT_AUXV. It's cleaner to leave them out, and just let the phdrs inside the vDSO image speak for themselves. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] Add VM_ALWAYSDUMPRoland McGrath
This patch adds the VM_ALWAYSDUMP flag for vm_flags in vm_area_struct. This provides a clean explicit way to have a vma always included in core dumps, as is needed for vDSO's. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26Write back inode data pages even when the inode itself is lockedLinus Torvalds
In __writeback_single_inode(), when we find a locked inode and we're not doing a data-integrity sync, we used to just skip writing entirely, since we didn't want to wait for the inode to unlock. However, there's really no reason to skip writing the data pages, which are likely to be the bulk of the dirty state anyway (and the main reason why writeback was started for the non-data-integrity case, of course!) Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Andrew Morton <akpm@osdl.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26Resurrect 'try_to_free_buffers()' VM hackeryLinus Torvalds
It's not pretty, but it appears that ext3 with data=journal will clean pages without ever actually telling the VM that they are clean. This, in turn, will cause the VM (and balance_dirty_pages() in particular) to never realize that the pages got cleaned, and to wait forever for an event that already happened. Technically, this seems to be a problem with ext3 itself, but it used to be hidden by 'try_to_free_buffers()' noticing this situation on its own, and just working around the filesystem problem. This commit re-instates that hack, in order to avoid a regression for the 2.6.20 release. This fixes bugzilla 7844: http://bugzilla.kernel.org/show_bug.cgi?id=7844 Peter Zijlstra points out that we should probably retain the debugging code that this removes from cancel_dirty_page(), and I agree, but for the imminent release we might as well just silence the warning too (since it's not a new bug: anything that triggers that warning has been around forever). Acked-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-24[PATCH] NFS: Fix races in nfs_revalidate_mapping()Trond Myklebust
Prevent the call to invalidate_inode_pages2() from racing with file writes by taking the inode->i_mutex across the page cache flush and invalidate. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Fix oops when Windows server sent bad domain name null terminator [CIFS] cifs sprintf fix [CIFS] Remove 2 unneeded kzalloc casts [CIFS] Update CIFS version number
2007-01-23[PATCH] resierfs: avoid tail packing if an inode was ever mmappedVladimir Saveliev
This patch fixes a long-standing confusion in reiserfs. On the file release operation, reiserfs used to try to pack the file data stored in the last incomplete page of some files into metadata blocks. After packing, the page was cleared with clear_page_dirty. This did not take into account that the page may be mmapped into another process's address space. cancel_dirty_page, the recent replacement for clear_page_dirty, exposed the confusion with its sanity check that the page must not be mapped. The patch fixes the confusion by making reiserfs avoid tail packing if an inode was ever mmapped. reiserfs_mmap and reiserfs_file_release are serialized with a mutex in the reiserfs-specific inode. reiserfs_mmap locks the mutex and sets a bit in the reiserfs-specific inode flags. reiserfs_file_release checks the bit with the mutex held. If the bit is set, tail packing is avoided. This eliminates the possibility that an mmapped page gets cancel_dirty_page-ed. Signed-off-by: Vladimir Saveliev <vs@namesys.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <mason@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-23[PATCH] fix blk_direct_IO bio preparationChen, Kenneth W
For large DIO that needs multiple bios, one full page worth of data was lost at the boundary of a bio's maximum sector or segment limits. After a bio is full and has been submitted, the outer while (nbytes) { ... } loop allocates a new bio and just marches on to index into the next page. It forgets about the page that bio_add_page() rejected when the previous bio was full. Fix it by putting the rejected page back into pvec so we pick it up again for the next bio. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
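The lost-page behaviour can be modelled with a toy loop. This is only a sketch of the control flow described above, not the blk_direct_IO code: pack_pages(), its parameters, and the fixed per-bio page capacity are hypothetical.

```c
#include <stddef.h>

/*
 * Toy model: pages are taken from a page vector one at a time and added to
 * the current bio, which holds at most `bio_cap` pages. Returns how many
 * pages ended up in some bio. All names here are hypothetical.
 */
size_t pack_pages(size_t npages, size_t bio_cap, int requeue_rejected)
{
    size_t submitted = 0;  /* pages that made it into a bio */
    size_t in_bio = 0;     /* pages in the bio being built */
    size_t next = 0;       /* next index in the page vector */

    if (bio_cap == 0)
        return 0;

    while (next < npages) {
        next++;                      /* take the next page from the vector */
        if (in_bio == bio_cap) {     /* models bio_add_page() rejecting it */
            in_bio = 0;              /* full bio is submitted, a new one starts */
            if (requeue_rejected)
                next--;              /* fixed: put the rejected page back */
            continue;                /* buggy: the rejected page is skipped */
        }
        in_bio++;
        submitted++;
    }
    return submitted;
}
```

In this model, 10 pages with 4 pages per bio lose one page at each bio boundary unless the rejected page is put back.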
2007-01-23[PATCH] blockdev direct_io: fix signedness bugAndrew Morton
size_t is unsigned. IO errors aren't getting through. Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
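A minimal sketch of this class of bug, assuming a helper that reports failure as a negative errno value. fake_io() and both checker functions are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */

/* Hypothetical I/O helper: returns bytes transferred, or a negative errno. */
ssize_t fake_io(int fail)
{
    return fail ? -5 : 512;
}

/* Buggy pattern: the result lands in a size_t, so -5 becomes a huge count. */
int looks_successful_unsigned(int fail)
{
    size_t ret = (size_t)fake_io(fail);

    return ret > 0;
}

/* Fixed pattern: keep the result signed so a negative errno stays negative. */
int looks_successful_signed(int fail)
{
    ssize_t ret = fake_io(fail);

    return ret > 0;
}
```

With the unsigned variable a failed call still looks like a successful transfer, which matches the symptom of errors not getting through.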
2007-01-22Merge git://git.infradead.org/mtd-2.6Linus Torvalds
* git://git.infradead.org/mtd-2.6: (84 commits) [JFFS2] debug.h: include <linux/sched.h> for current->pid [MTD] OneNAND: Handle DDP chip boundary during read-while-load [MTD] OneNAND: return ecc error code only when 2-bit ecc occurs [MTD] OneNAND: Implement read-while-load [MTD] OneNAND: fix onenand_wait bug in read ecc error [MTD] OneNAND: release CPU in cycles [MTD] OneNAND: add subpage write support [MTD] OneNAND: fix onenand_wait bug [JFFS2] use the ref_offset macro [JFFS2] Reschedule in loops [JFFS2] Fix error-path leak in summary scan [JFFS2] add cond_resched() when garbage collecting deletion dirent [MTD] Nuke IVR leftovers [MTD] OneNAND: fix oob handling in recent oob patch [MTD] Fix ssfdc blksize typo [JFFS2] replace kmalloc+memset with kzalloc [MTD] Fix SSFDC build for variable blocksize. [MTD] ESB2ROM uses PCI [MTD] of_device-based physmap driver [MTD] Support combined RedBoot FIS directory and configuration area ...
2007-01-22Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: Add backup superblock info to ocfs2_fs.h ocfs2: cleanup ocfs2_iget() errors ocfs2: Directory c/mtime update fixes ocfs2: Don't print errors when following symlinks
2007-01-22[CIFS] Fix oops when Windows server sent bad domain name null terminatorSteve French
Fixes RedHat bug 211672. Windows sends one byte (instead of two) of null to terminate the final Unicode string (domain name) in the session setup response in some cases - this caused cifs to misalign some informational strings (making it hard to convert from UCS16 to UTF8). Thanks to Shaggy for his help and Akemi Yagi for debugging/testing. Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-01-21ocfs2: Add backup superblock info to ocfs2_fs.hMark Fasheh
This synchronizes us with recent ocfs2-tools changes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21ocfs2: cleanup ocfs2_iget() errorsMark Fasheh
Get rid of some error prints in the ocfs2_iget() path from ocfs2_get_dentry(). NFSD can easily cause us to read stale inodes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21ocfs2: Directory c/mtime update fixesMark Fasheh
ocfs2 wasn't updating c/mtime on directories during dirent creation/deletion. Fix ocfs2_unlink(), ocfs2_rename() and __ocfs2_add_entry() by adding the proper code to update the struct inode and push the change out to disk. This helps rename/unlink on nfs exported file systems in particular, as those clients compare directory time values to avoid fully re-reading a directory which hasn't changed. ocfs2_rename() loses some superfluous error handling as a result of this patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21ocfs2: Don't print errors when following symlinksMark Fasheh
We shouldn't print errors returned from vfs_follow_link(). This was causing spurious errors to show up in the logs. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-01-21[CIFS] cifs sprintf fixSteve French
Cc: <alert7@xfocus.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-01-21[CIFS] Remove 2 unneeded kzalloc castsSteve French
Signed-off-by: Ahmed Darwish <darwish.07@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-01-18NTFS: Forgot to bump version number in makefile to 2.1.28...Anton Altaparmakov
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2007-01-18NTFS: 2.1.28 - Fix deadlock reported by Sergey Vlasov due to ntfs_put_inode().Anton Altaparmakov
- Fix deadlock in fs/ntfs/inode.c::ntfs_put_inode(). Thanks to Sergey Vlasov for the report and detailed analysis of the deadlock. The fix involved getting rid of ntfs_put_inode() altogether and hence NTFS no longer has a ->put_inode super operation. Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2007-01-18Merge branch 'master' of ↵David Woodhouse
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
2007-01-13[JFFS2] debug.h: include <linux/sched.h> for current->pidDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2007-01-11[PATCH] Revert bd_mount_mutex back to a semaphoreDavid Chinner
Revert bd_mount_mutex back to a semaphore so that xfs_freeze -f /mnt/newtest; xfs_freeze -u /mnt/newtest works safely and doesn't produce lockdep warnings. (XFS unlocks the semaphore from a different task, by design. The mutex code warns about this) Signed-off-by: Dave Chinner <dgc@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-11[PATCH] NFS: Fix race in nfs_release_page()Trond Myklebust
invalidate_inode_pages2() may find the dirty bit has been set on a page owing to the fact that the page may still be mapped after it was locked. Only after the call to unmap_mapping_range() are we sure that the page can no longer be dirtied. In order to fix this, NFS has hooked the releasepage() method and tries to write the page out between the call to unmap_mapping_range() and the call to remove_mapping(). This, however, leads to deadlocks in the page reclaim code, where the page may be locked without holding a reference to the inode or dentry. The fix is to add a new address_space_operation, launder_page(), which will attempt to write out a dirty page without releasing the page lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Also, the bare SetPageDirty() can skew all sorts of accounting, leading to other nasties. [akpm@osdl.org: cleanup] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-10[PATCH] fix linux banner format stringRoman Zippel
Revert previous attempts at messing with the linux banner string and simply use a separate format string for proc. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: Olaf Hering <olaf@aepfle.de> Acked-by: Jean Delvare <khali@linux-fr.org> Cc: Andrey Borzenkov <arvidjaar@mail.ru> Cc: Andrew Morton <akpm@osdl.org> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-10[JFFS2] use the ref_offset macroKyungmin Park
Don't use ref->flash_offset directly in debugging code, use the ref_offset macro instead. Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>
2007-01-10[JFFS2] Reschedule in loopsArtem Bityutskiy
Make JFFS2 nicer and teach it to call cond_resched() in loops which may be quite large. Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>
2007-01-06Revert "[PATCH] binfmt_elf: randomize PIE binaries (2nd try)"Linus Torvalds
This reverts commit 59287c0913cc9a6c75712a775f6c1c1ef418ef3b. Hugh Dickins reports that it causes random failures on x86 with SuSE 10.2, and points out "Isn't that randomization, anywhere from 0x10000 to ELF_ET_DYN_BASE, sure to place the ET_DYN from time to time just where the comment says it's trying to avoid? I assume that somehow results in the error reported." (where the comment in question is the existing comment in the source code about mmap/brk clashes). Suggested-by: Hugh Dickins <hugh@veritas.com> Acked-by: Marcus Meissner <meissner@suse.de> Cc: Andrew Morton <akpm@osdl.org> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] fix garbage instead of zeroes in UFSEvgeniy Dushistov
This looks like the problem that Al Viro pointed out some time ago: ufs's get_block callback allocates 16k of disk at a time, and links that entire 16k into the file's metadata. But because get_block is called for only a single buffer_head (a 2k buffer_head in this case?) we are only able to tell the VFS that this 2k is buffer_new(). So when ufs_getfrag_block() is later called to map some more data in the file, and when that data resides within the remaining 14k of this fragment, ufs_getfrag_block() will incorrectly return a !buffer_new() buffer_head. I don't see a _right_ way to zero the whole block: if we use the inode page cache, some pages may lie outside the inode limits (inode size) and will be lost; if we use the blockdev page cache, it is possible to zero real data if the inode page cache is used later. The simplest way, as far as I can see, is to use the block device page cache, but not only mark it dirty but also sync it during "nullification". I used my simple test collection, which checks that create, open, write, read and close work on ufs, and I see that this patch makes the ufs code 18% slower than before. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05[PATCH] fix memory corruption from misinterpreted bad_inode_ops return valuesEric Sandeen
CVE-2006-5753 is for a case where an inode can be marked bad, switching the ops to bad_inode_ops, which are all connected as:
    static int return_EIO(void)
    {
            return -EIO;
    }
    #define EIO_ERROR ((void *) (return_EIO))
    static struct inode_operations bad_inode_ops =
    {
            .create = bad_inode_create
    ...etc...
The problem here is that the void cast causes return types to not be promoted, and for ops such as listxattr which expect more than 32 bits of return value, the 32-bit -EIO is interpreted as a large positive 64-bit number, i.e. 0x00000000fffffffb instead of 0xfffffffffffffffb. This goes particularly badly when the return value is taken as a number of bytes to copy into, say, a user's buffer...
I originally had coded up the fix by creating a return_EIO_<TYPE> macro for each return type, like this:
    static int return_EIO_int(void)
    {
            return -EIO;
    }
    #define EIO_ERROR_INT ((void *) (return_EIO_int))
    static struct inode_operations bad_inode_ops =
    {
            .create = EIO_ERROR_INT,
    ...etc...
but Al felt that it was probably better to create an EIO-returner for each actual op signature. Since so few ops share a signature, I just went ahead & created an EIO function for each individual file & inode op that returns a value.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
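The arithmetic behind the misinterpretation can be reproduced without calling through a mismatched function pointer (which would be undefined behaviour in portable C). This is a sketch assuming EIO is 5, as on Linux; the two helper names are hypothetical.

```c
#include <stdint.h>

/*
 * What a 64-bit caller sees if only the low 32 bits of the return value are
 * meaningful and the upper half is read as zero.
 */
int64_t seen_zero_extended(int32_t ret)
{
    return (int64_t)(uint64_t)(uint32_t)ret;
}

/* What the caller sees when the callee really returns a 64-bit type. */
int64_t seen_sign_extended(int32_t ret)
{
    return (int64_t)ret;
}
```

For -5 the first helper yields 4294967291, a plausible-looking byte count, while the second keeps the value negative so the usual error check still works.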
2007-01-05[PATCH] adfs: fix filename handlingJames Bursa
Fix filenames on adfs discs being terminated at the first character greater than 128 (adfs filenames are Latin 1). I saw this problem when using a loopback adfs image on a 2.6.17-rc5 x86_64 machine, and the patch fixed it there. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
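One common way a name scan stops at the first byte of 128 or above is a comparison done on a signed char. The log does not show the code, so the following is only a sketch under that assumption; both functions and the '!' terminator test are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical scanner: a name ends at the first byte below '!'.
 * With signed bytes, Latin 1 characters such as 0xE9 are negative and
 * therefore compare below '!', ending the name early.
 */
size_t name_len_signed(const signed char *name, size_t max)
{
    size_t n = 0;

    while (n < max && name[n] >= '!')
        n++;
    return n;
}

/* Same scan on unsigned bytes: 0xE9 is 233 and is kept. */
size_t name_len_unsigned(const unsigned char *name, size_t max)
{
    size_t n = 0;

    while (n < max && name[n] >= '!')
        n++;
    return n;
}
```

For the five bytes "caf\xe9 " the unsigned scan returns 4 and the signed scan returns 3.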
2007-01-02[JFFS2] Fix error-path leak in summary scanAmit Choudhary
Signed-off-by: Amit Choudhary <amit2030@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-12-30Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: export heartbeat thread pid via configfs ocfs2: always unmap in ocfs2_data_convert_worker() ocfs2: ignore NULL vfsmnt in ocfs2_should_update_atime() ocfs2: Allow direct I/O read past end of file ocfs2: don't print error in ocfs2_permission()
2006-12-30[PATCH] ramfs breaks without CONFIG_BLOCKDimitri Gorokhovik
ramfs doesn't provide the .set_page_dirty a_op, and when the BLOCK layer is not configured in, 'set_page_dirty' makes a call via a NULL pointer. Signed-off-by: Dimitri Gorokhovik <dimitri.gorokhovik@free.fr> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30[PATCH] Fix lock inversion aio_kick_handler()Zach Brown
lockdep found an AB BC CA lock inversion in retry-based AIO:
1) The task struct's alloc_lock (A) is acquired in process context with interrupts enabled. An interrupt might arrive and call wake_up() which grabs the wait queue's q->lock (B).
2) When performing retry-based AIO the AIO core registers aio_wake_function() as the wake function for iocb->ki_wait. It is called with the wait queue's q->lock (B) held and then tries to add the iocb to the run list after acquiring the ctx_lock (C).
3) aio_kick_handler() holds the ctx_lock (C) while acquiring the alloc_lock (A) via lock_task() and unuse_mm().
Lockdep emits a warning saying that we're trying to connect the irq-safe q->lock to the irq-unsafe alloc_lock via ctx_lock. This fixes the inversion by calling unuse_mm() in the AIO kick handling path after we've released the ctx_lock. As Ben LaHaise pointed out __put_ioctx could set ctx->mm to NULL, so we must only access ctx->mm while we have the lock.
Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Suparna Bhattacharya <suparna@in.ibm.com> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
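The three dependencies form a cycle in the lock-order graph, and removing the C to A edge breaks it. The sketch below checks that with a small depth-first search; it is an illustration of the idea only, not how lockdep is implemented, and every name in it is hypothetical.

```c
#include <stdbool.h>

/* Lock indices for this sketch: A = alloc_lock, B = q->lock, C = ctx_lock. */
enum { LOCK_A, LOCK_B, LOCK_C, NLOCKS };

/* order[x][y] is true when some path acquires y while holding x. */
static bool reaches(bool order[NLOCKS][NLOCKS], int from, int to,
                    bool seen[NLOCKS])
{
    int next;

    if (order[from][to])
        return true;
    seen[from] = true;
    for (next = 0; next < NLOCKS; next++)
        if (order[from][next] && !seen[next] &&
            reaches(order, next, to, seen))
            return true;
    return false;
}

/* True when the graph from the commit message contains a cycle. */
bool aio_graph_has_inversion(bool alloc_lock_taken_under_ctx_lock)
{
    bool order[NLOCKS][NLOCKS] = { { false } };
    int l;

    order[LOCK_A][LOCK_B] = true;  /* irq arrives under alloc_lock, takes q->lock */
    order[LOCK_B][LOCK_C] = true;  /* aio_wake_function() takes ctx_lock */
    order[LOCK_C][LOCK_A] = alloc_lock_taken_under_ctx_lock;

    for (l = 0; l < NLOCKS; l++) {
        bool seen[NLOCKS] = { false };

        if (reaches(order, l, l, seen))
            return true;
    }
    return false;
}
```

Passing true models the old aio_kick_handler() behaviour; passing false models calling unuse_mm() only after ctx_lock is released.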
2006-12-28ocfs2: export heartbeat thread pid via configfsZhen Wei
The patch allows the ocfs2 heartbeat thread to prioritize I/O, which may help cut down on spurious fencing. Most of this will be in the tools - we can have a pid configfs attribute and let userspace (ocfs2_hb_ctl) call the ioprio_set syscall after starting heartbeat, but only the cfq scheduler supports I/O priorities now. Signed-off-by: Zhen Wei <zwei@novell.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28ocfs2: always unmap in ocfs2_data_convert_worker()Mark Fasheh
Mmap-heavy clustered workloads were sometimes finding stale data on mmap reads. The solution is to call unmap_mapping_range() on any down convert of a data lock. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28ocfs2: ignore NULL vfsmnt in ocfs2_should_update_atime()Mark Fasheh
This can come from NFSD. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28ocfs2: Allow direct I/O read past end of fileMark Fasheh
ocfs2_direct_IO_get_blocks() was incorrectly returning -EIO for a direct I/O read whose start block was past the end of the file allocation tree. Fix things so that we return a hole instead. do_direct_IO() will then notice that the range start is past eof and return a short read. While there, remove the unused vbo_max variable. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28ocfs2: don't print error in ocfs2_permission()Mark Fasheh
Errors from generic_permission() can happen in valid cases and shouldn't be reported. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-23Fix up CIFS for "test_clear_page_dirty()" removalLinus Torvalds
This also adds the required page "writeback" flag handling, which cifs hasn't been doing and which the page dirty flag changes made obvious. Acked-by: Steve French <smfltc@us.ibm.com> Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-23[CIFS] Update CIFS version numberSteve French
Signed-off-by: Steve French <sfrench@us.ibm.com>
2006-12-23Fix reiserfs after "test_clear_page_dirty()" removalLinus Torvalds
Thanks to Len Brown for testing this fix, since while they have in the past, none of my machines run reiserfs at the moment. Cc: Vladimir V. Saveliev <vs@namesys.com> Acked-by: Len Brown <lenb@kernel.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22[PATCH] jbd: wait for already submitted t_sync_datalist buffer to completeHisashi Hifumi
In the current jbd code, if a buffer on the BJ_SyncData list is dirty and not locked, the buffer is refiled to the BJ_Locked list, submitted for IO, and waited on until the IO completes. But the fsstress test showed a case where a buffer that had already been submitted for IO just before the buffer_dirty(bh) check was not waited on. The following patch solves this problem: if a buffer was submitted for IO before the buffer_dirty(bh) check and is still being written to disk, it is refiled to the BJ_Locked list. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Jan Kara <jack@ucw.cz> Cc: "Stephen C. Tweedie" <sct@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22[PATCH] fdtable: Provide free_fdtable() wrapperVadim Lobanov
Christoph Hellwig has expressed concerns that the recent fdtable changes expose the details of the RCU methodology used to release no-longer-used fdtable structures to the rest of the kernel. The trivial patch below addresses these concerns by introducing the appropriate free_fdtable() calls, which simply wrap the release RCU usage. Since free_fdtable() is a one-liner, it makes sense to promote it to an inline helper. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22[PATCH] Make JFFS depend on CONFIG_BROKENJosh Boyer
Mark JFFS as broken and provide a warning to users that it is deprecated and scheduled for removal in 2.6.21 Signed-off-by: Josh Boyer <jwboyer@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22[PATCH] fsstack: Remove inode copyMichael Halcrow
Trevor found a file size problem in eCryptfs in recent kernels, and he tracked it down to an fsstack change. This was the eCryptfs copy_attr_all:
> -void ecryptfs_copy_attr_all(struct inode *dest, const struct inode *src)
> -{
> -	dest->i_mode = src->i_mode;
> -	dest->i_nlink = src->i_nlink;
> -	dest->i_uid = src->i_uid;
> -	dest->i_gid = src->i_gid;
> -	dest->i_rdev = src->i_rdev;
> -	dest->i_atime = src->i_atime;
> -	dest->i_mtime = src->i_mtime;
> -	dest->i_ctime = src->i_ctime;
> -	dest->i_blkbits = src->i_blkbits;
> -	dest->i_flags = src->i_flags;
> -}
This is the fsstack copy_attr_all:
> +void fsstack_copy_attr_all(struct inode *dest, const struct inode *src,
> +				int (*get_nlinks)(struct inode *))
> +{
> +	if (!get_nlinks)
> +		dest->i_nlink = src->i_nlink;
> +	else
> +		dest->i_nlink = (*get_nlinks)(dest);
> +
> +	dest->i_mode = src->i_mode;
> +	dest->i_uid = src->i_uid;
> +	dest->i_gid = src->i_gid;
> +	dest->i_rdev = src->i_rdev;
> +	dest->i_atime = src->i_atime;
> +	dest->i_mtime = src->i_mtime;
> +	dest->i_ctime = src->i_ctime;
> +	dest->i_blkbits = src->i_blkbits;
> +	dest->i_flags = src->i_flags;
> +
> +	fsstack_copy_inode_size(dest, src);
> +}
The addition of copy_inode_size breaks eCryptfs, since eCryptfs needs to interpolate the file sizes (eCryptfs has extra space in the lower file for the header). The setting of the upper inode size occurs elsewhere in eCryptfs, and the new copy_attr_all now undoes what eCryptfs was doing right beforehand. I see three ways of going forward from here:
(1) Something like this patch needs to go in (assuming it jives with Unionfs),
(2) we need to make a change to the fsstack API for more fine-grained control over copying attributes (e.g., by also including a callback function for calculating the right file size, which will require some more work on both eCryptfs and Unionfs), or
(3) the fsstack patch on eCryptfs (commit 0cc72dc7f050188d8d7344b1dd688cbc68d3cd30 made on Fri Dec 8 02:36:31 2006 -0800) needs to be yanked in 2.6.20.
I think the simplest solution, from eCryptfs' perspective, is to just remove the inode size copy. Remove inode size copy in general fsstack attr copy code. Stacked filesystems may need to interpolate the inode size, since the file size in the lower file may be different than the file size in the stacked layer.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Acked-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>