aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2009-11-30dlm: always use GFP_NOFSDavid Teigland
Replace all GFP_KERNEL and ls_allocation with GFP_NOFS. ls_allocation would be GFP_KERNEL for userland lockspaces and GFP_NOFS for file system lockspaces. It was discovered that any lockspaces on the system can affect all others by triggering memory reclaim in the file system which could in turn call back into the dlm to acquire locks, deadlocking dlm threads that were shared by all lockspaces, like dlm_recv. Signed-off-by: David Teigland <teigland@redhat.com>
2009-11-19Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Trivial cleanup of jbd compatibility layer removal ocfs2: Refresh documentation ocfs2: return f_fsid info in ocfs2_statfs() ocfs2: duplicate inline data properly during reflink. ocfs2: Move ocfs2_complete_reflink to the right place. ocfs2: Return -EINVAL when a device is not ocfs2.
2009-11-19Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Address buffer overrun in rpc_uaddr2sockaddr() NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT
2009-11-17fcntl: rename F_OWNER_GID to F_OWNER_PGRPPeter Zijlstra
This is for consistency with various ioctl() operations that include the suffix "PGRP" in their names, and also for consistency with PRIO_PGRP, used with setpriority() and getpriority(). Also, using PGRP instead of GID avoids confusion with the common abbreviation of "group ID". I'm fine with anything that makes it more consistent, and if PGRP is what is the predominant abbreviation then I see no need to further confuse matters by adding a third one. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-17procfs: fix /proc/<pid>/stat stack pointer for kernel threadsStefani Seibold
Fix a small issue for the stack pointer in /proc/<pid>/stat. In case of a kernel thread the value of the printed stack pointer should be 0. Signed-off-by: Stefani Seibold <stefani@seibold.net> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-17Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: copy li_lsn before dropping AIL lock XFS bug in log recover with quota (bugzilla id 855)
2009-11-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: clear server inode number flag while autodisabling
2009-11-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: deleted inconsistent comment in nilfs_load_inode_block() nilfs2: deleted struct nilfs_dat_group_desc nilfs2: fix lock order reversal in chcp operation
2009-11-17xfs: copy li_lsn before dropping AIL lockNathaniel W. Turner
Access to log items on the AIL is generally protected by m_ail_lock; this is particularly needed when we're getting or setting the 64-bit li_lsn on a 32-bit platform. This patch fixes a couple places where we were accessing the log item after dropping the AIL lock on 32-bit machines. This can result in a partially-zeroed log->l_tail_lsn if xfs_trans_ail_delete is racing with xfs_trans_ail_update, and in at least some cases, this can leave the l_tail_lsn with a zero cycle number, which means xlog_space_left will think the log is full (unless CONFIG_XFS_DEBUG is set, in which case we'll trip an ASSERT), leading to processes stuck forever in xlog_grant_log_space. Thanks to Adrian VanderSpek for first spotting the race potential and to Dave Chinner for debug assistance. Signed-off-by: Nathaniel W. Turner <nate@houseofnate.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-11-17XFS bug in log recover with quota (bugzilla id 855)Jan Rekorajski
Hi, I was hit by a bug in linux 2.6.31 when XFS is not able to recover the log after a crash if fs was mounted with quotas. Gory details in XFS bugzilla: http://oss.sgi.com/bugzilla/show_bug.cgi?id=855. It looks like wrong struct is used in buffer length check, and the following patch should fix the problem. xfs_dqblk_t has a size of 104+32 bytes, while xfs_disk_dquot_t is 104 bytes long, and this is exactly what I see in system logs - "XFS: dquot too small (104) in xlog_recover_do_dquot_trans." Signed-off-by: Jan Rekorajski <baggins@sith.mimuw.edu.pl> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-11-16cifs: clear server inode number flag while autodisablingSuresh Jayaraman
Fix the commit ec06aedd44 that intended to turn off querying for server inode numbers when server doesn't consistently support inode numbers. Presumably the commit didn't actually clear the CIFS_MOUNT_SERVER_INUM flag, perhaps a typo. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Acked-by: Jeff Layton <jlayton@redhat.com> Cc: Stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-15nilfs2: deleted inconsistent comment in nilfs_load_inode_block()Jiro SEKIBA
The comment says, "Caller of this function MUST lock s_inode_lock", however just above the comment, it locks s_inode_lock in the function. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-14Fix memory corruption caused by nfsd readdir+Petr Vandrovec
Commit 8177e6d6dfb9cd03d9bdeb647c32161f8f58f686 ("nfsd: clean up readdirplus encoding") introduced single character typo in nfs3 readdir+ implementation. Unfortunately that typo has quite bad side effects: random memory corruption, followed (on my box) with immediate spontaneous box reboot. Using 'p1' instead of 'p' fixes my Linux box rebooting whenever VMware ESXi box tries to list contents of my home directory. Signed-off-by: Petr Vandrovec <petr@vandrovec.name> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-13ocfs2: Trivial cleanup of jbd compatibility layer removalSunil Mushran
Mainline commit 53ef99cad9878f02f27bb30bc304fc42af8bdd6e removed the JBD compatibility layer from OCFS2. This patch removes the last remaining remnants of that. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-11-13nilfs2: fix lock order reversal in chcp operationRyusuke Konishi
Will fix the following lock order reversal lockdep detected: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-rc6 #7 ------------------------------------------------------- chcp/30157 is trying to acquire lock: (&nilfs->ns_mount_mutex){+.+.+.}, at: [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2] but task is already holding lock: (&nilfs->ns_segctor_sem){++++.+}, at: [<fed7ca32>] nilfs_transaction_begin+0xba/0x110 [nilfs2] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&nilfs->ns_segctor_sem){++++.+}: [<c105799c>] __lock_acquire+0x109c/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c14151e2>] down_read+0x31/0x45 [<fed6d77b>] nilfs_attach_checkpoint+0x8f/0x16b [nilfs2] [<fed6e393>] nilfs_get_sb+0x3e7/0x653 [nilfs2] [<c10c0ccb>] vfs_kern_mount+0x8b/0x124 [<c10c0db2>] do_kern_mount+0x37/0xc3 [<c10d7517>] do_mount+0x64d/0x69d [<c10d75cd>] sys_mount+0x66/0x95 [<c1002a14>] sysenter_do_call+0x12/0x32 -> #1 (&type->s_umount_key#31/1){+.+.+.}: [<c105799c>] __lock_acquire+0x109c/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c104c0f3>] down_write_nested+0x34/0x52 [<c10c08fe>] sget+0x22e/0x389 [<fed6e133>] nilfs_get_sb+0x187/0x653 [nilfs2] [<c10c0ccb>] vfs_kern_mount+0x8b/0x124 [<c10c0db2>] do_kern_mount+0x37/0xc3 [<c10d7517>] do_mount+0x64d/0x69d [<c10d75cd>] sys_mount+0x66/0x95 [<c1002a14>] sysenter_do_call+0x12/0x32 -> #0 (&nilfs->ns_mount_mutex){+.+.+.}: [<c1057727>] __lock_acquire+0xe27/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c1414d63>] mutex_lock_nested+0x41/0x23e [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2] [<fed801b2>] nilfs_ioctl+0x11a/0x7da [nilfs2] [<c10cca12>] vfs_ioctl+0x27/0x6e [<c10ccf93>] do_vfs_ioctl+0x491/0x4db [<c10cd022>] sys_ioctl+0x45/0x5f [<c1002a14>] sysenter_do_call+0x12/0x32 Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-12__generic_block_fiemap(): fix for files bigger than 4GBMike Hommey
Because of an integer overflow on start_blk, various kind of wrong results would be returned by the generic_block_fiemap() handler, such as no extents when there is a 4GB+ hole at the beginning of the file, or wrong fe_logical when an extent starts after the first 4GB. Signed-off-by: Mike Hommey <mh@glandium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Eric Sandeen <sandeen@sgi.com> Cc: Josef Bacik <jbacik@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-12exec: setup_arg_pages() fails to return errorsAnton Blanchard
In setup_arg_pages we work hard to assign a value to ret, but on exit we always return 0. Also remove a now duplicated exit path and branch to out_unlock instead. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-12fs: add missing compat_ptr handling for FS_IOC_RESVSP ioctlHeiko Carstens
For FS_IOC_RESVSP and FS_IOC_RESVSP64 compat_sys_ioctl() uses its arg argument as a pointer to userspace. However it is missing a a call to compat_ptr() which will do a proper pointer conversion. This was introduced with 3e63cbb1 "fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ankit Jain <me@ankitjain.org> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Arnd Bergmann <arndbergmann@googlemail.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: <stable@kernel.org> [2.6.31.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-12pidns: fix a leak in /proc dentries and inodes with pid namespaces.Sukadev Bhattiprolu
Daniel Lezcano reported a leak in 'struct pid' and 'struct pid_namespace' that is discussed in: http://lkml.org/lkml/2009/10/2/159. To summarize the thread, when container-init is terminated, it sets the PF_EXITING flag, zaps other processes in the container and waits to reap them. As a part of reaping, the container-init should flush any /proc dentries associated with the processes. But because the container-init is itself exiting and the following PF_EXITING check, the dentries are not flushed, resulting in leak in /proc inodes and dentries. This fix reverts the commit 7766755a2f249e7e0 ("Fix /proc dcache deadlock in do_exit") which introduced the check for PF_EXITING. At the time of the commit, shrink_dcache_parent() flushed dentries from other filesystems also and could have caused a deadlock which the commit fixed. But as pointed out by Eric Biederman, after commit 0feae5c47aabdde59, shrink_dcache_parent() no longer affects other filesystems. So reverting the commit is now safe. As pointed out by Jan Kara, the leak is not as critical since the unclaimed space will be reclaimed under memory pressure or by: echo 3 > /proc/sys/vm/drop_caches But since this check is no longer required, its best to remove it. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Reported-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Jan Kara <jack@ucw.cz> Cc: Andrea Arcangeli <andrea@cpushare.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-12fs/jbd: Export log_start_commit to fix ext3 build.Stefan Schmidt
This fixes: ERROR: "log_start_commit" [fs/ext3/ext3.ko] undefined! Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2009-11-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix panic when trying to destroy a newly allocated Btrfs: allow more metadata chunk preallocation Btrfs: fallback on uncompressed io if compressed io fails Btrfs: find ideal block group for caching Btrfs: avoid null deref in unpin_extent_cache() Btrfs: skip btrfs_release_path in btrfs_update_root and btrfs_del_root Btrfs: fix some metadata enospc issues Btrfs: fix how we set max_size for free space clusters Btrfs: cleanup transaction starting and fix journal_info usage Btrfs: fix data allocation hint start
2009-11-11Btrfs: fix panic when trying to destroy a newly allocatedJosef Bacik
There is a problem where iget5_locked will look for an inode, not find it, and then subsequently try to allocate it. Another CPU will have raced in and allocated the inode instead, so when iget5_locked gets the inode spin lock again and does a search, it finds the new inode. So it goes ahead and calls destroy_inode on the inode it just allocated. The problem is we don't set BTRFS_I(inode)->root until the new inode is completely initialized. This patch makes us set root to NULL when alloc'ing a new inode, so when we get to btrfs_destroy_inode and we see that root is NULL we can just free up the memory and continue on. This fixes the panic http://www.kerneloops.org/submitresult.php?number=812690 Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: JBD/JBD2: free j_wbuf if journal init fails. ext3: Wait for proper transaction commit on fsync ext3: retry failed direct IO allocations
2009-11-11Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: partial revert to fix double brelse WARNING() ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O ext4: code clean up for dio fallocate handling ext4: skip conversion of uninit extents after direct IO if there isn't any ext4: fix ext4_ext_direct_IO()'s return value after converting uninit extents ext4: discard preallocation when restarting a transaction during truncate
2009-11-11Btrfs: allow more metadata chunk preallocationChris Mason
On an FS where all of the space has not been allocated into chunks yet, the enospc can return enospc just because the existing metadata chunks are full. We get around this by allowing more metadata chunks to be allocated up to a certain limit, and finding the right limit is a little fuzzy. The problem is the reservations for delalloc would preallocate way too much of the FS as metadata. We need to start saying no and just force some IO to happen. But we also need to let a reasonable amount of the FS become metadata. This bumps the hard limit up, later releases will have a better system. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Btrfs: fallback on uncompressed io if compressed io failsJosef Bacik
Currently compressed IO does not deal with not having its entire extent able to be allocated. So if we have enough free space to allocate for the extent, but its not contiguous, it will fail spectacularly. This patch fixes this by falling back on uncompressed IO which lets us spread the delalloc extent across multiple extents. I tested this by making us randomly think the reservation had failed to make it fallback on the uncompressed io way and it seemed to work fine. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Btrfs: find ideal block group for cachingJosef Bacik
This patch changes a few things. Hopefully the comments are helpfull, but I'll try and be as verbose here. Problem: My fedora box was taking 1 minute and 21 seconds to boot with btrfs as root. Part of this problem was we pick the first block group we can find and start caching it, even if it may not have enough free space. The other problem is we only search for cached block groups the first time around, which we won't find any cached block groups because this is a newly mounted fs, so we end up caching several block groups during bootup, which with alot of fragmentation takes around 30-45 seconds to complete, which bogs down the system. So Solution: 1) Don't cache block groups willy-nilly at first. Instead try and figure out which block group has the most free, and therefore will take the least amount of time to cache. 2) Don't be so picky about cached block groups. The other problem is once we've filled up a cluster, if the block group isn't finished caching the next time we try and do the allocation we'll completely ignore the cluster and start searching from the beginning of the space, which makes us cache more block groups, which slows us down even more. So instead of skipping block groups that are not finished caching when we have a hint, only skip the block group if it hasn't started caching yet. There is one other tweak in here. Before if we allocated a chunk and still couldn't find new space, we'd end up switching the space info to force another chunk allocation. This could make us end up with way too many chunks, so keep track of this particular case. With this patch and my previous cluster fixes my fedora box now boots in 43 seconds, and according to the bootchart is not held up by our block group caching at all. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Btrfs: avoid null deref in unpin_extent_cache()Dan Carpenter
I re-orderred the checks to avoid dereferencing "em" if it was null. Found by smatch static checker. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Btrfs: skip btrfs_release_path in btrfs_update_root and btrfs_del_rootLi Dongyang
We don't need to call btrfs_release_path because btrfs_free_path will do that for us. Signed-off-by: Li Dongyang <Jerry87905@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Btrfs: fix some metadata enospc issuesJosef Bacik
We weren't reserving metadata space for rename, rmdir and unlink, which could cause problems. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Btrfs: fix how we set max_size for free space clustersJosef Bacik
This patch fixes a problem where max_size can be set to 0 even though we filled the cluster properly. We set max_size to 0 if we restart the cluster window, but if the new start entry is big enough to be our new cluster then we could return with a max_size set to 0, which will mean the next time we try to allocate from this cluster it will fail. So set max_extent to the entry's size. Tested this on my box and now we actually allocate from the cluster after we fill it. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Btrfs: cleanup transaction starting and fix journal_info usageJosef Bacik
We use journal_info to tell if we're in a nested transaction to make sure we don't commit the transaction within a nested transaction. We use another method to see if there are any outstanding ioctl trans handles, so if we're starting one do not set current->journal_info, since it will screw with other filesystems. This patch also cleans up the starting stuff so there aren't any magic numbers. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11Btrfs: fix data allocation hint startJosef Bacik
Sometimes our start allocation hint when we cow a file can be either EXTENT_HOLE or some other such place holder, which is not optimal. So if we find that our em->block_start is one of these special values, check to see where the first block of the inode is stored, and use that as a hint. If that block is also a special value, just fallback on a hint of 0 and let the allocator figure out a good place to put the data. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-11-11JBD/JBD2: free j_wbuf if journal init fails.Tao Ma
If journal init fails, we need to free j_wbuf. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-11-11ext3: Wait for proper transaction commit on fsyncJan Kara
We cannot rely on buffer dirty bits during fsync because pdflush can come before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2009-11-11ext3: retry failed direct IO allocationsEric Sandeen
On a 256M 4k block filesystem, doing this in a loop: dd if=/dev/zero of=test oflag=direct bs=1M count=64 rm -f test eventually leads to spurious ENOSPC: dd: writing `test': No space left on device As with other block allocation callers, it looks like we need to potentially retry the allocations on the initial ENOSPC. A similar patch went into ext4 (commit fbbf69456619de5d251cb9f1df609069178c62d5) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-11-11NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENTTrond Myklebust
Changeset a65318bf3afc93ce49227e849d213799b072c5fd (NFSv4: Simplify some cache consistency post-op GETATTRs) incorrectly changed the getattr bitmap for readdir(). This causes the readdir() function to fail to return a fileid/inode number, which again exposed a bug in the NFS readdir code that causes spurious ENOENT errors to appear in applications (see http://bugzilla.kernel.org/show_bug.cgi?id=14541). The immediate band aid is to revert the incorrect bitmap change, but more long term, we should change the NFS readdir code to cope with the fact that NFSv4 servers are not required to support fileids/inode numbers. Reported-by: Daniel J Blueman <daniel.blueman@gmail.com> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-11-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix missing cleanup of gc cache on error cases nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocks
2009-11-08ext4: partial revert to fix double brelse WARNING()Theodore Ts'o
This is a partial revert of commit 6487a9d (only the changes made to fs/ext4/namei.c), since it is causing the following brelse() double-free warning when running fsstress on a file system with 1k blocksize and we run into a block allocation failure while converting a single-block directory to a multi-block hash-tree indexed directory. WARNING: at fs/buffer.c:1197 __brelse+0x2e/0x33() Hardware name: VFS: brelse: Trying to free free buffer Modules linked in: Pid: 2226, comm: jbd2/sdd-8 Not tainted 2.6.32-rc6-00577-g0003f55 #101 Call Trace: [<c01587fb>] warn_slowpath_common+0x65/0x95 [<c0158869>] warn_slowpath_fmt+0x29/0x2c [<c021168e>] __brelse+0x2e/0x33 [<c0288a9f>] jbd2_journal_refile_buffer+0x67/0x6c [<c028a9ed>] jbd2_journal_commit_transaction+0x319/0x14d8 [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60 [<c0175bcc>] ? sched_clock_cpu+0x12a/0x13e [<c017f6b4>] ? trace_hardirqs_off+0xb/0xd [<c0175c1f>] ? cpu_clock+0x3f/0x5b [<c017f6ec>] ? lock_release_holdtime+0x36/0x137 [<c0664ad0>] ? _spin_unlock_irqrestore+0x44/0x51 [<c0180af3>] ? trace_hardirqs_on_caller+0x103/0x124 [<c0180b1f>] ? trace_hardirqs_on+0xb/0xd [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60 [<c0290d1c>] kjournald2+0x11a/0x310 [<c017118e>] ? autoremove_wake_function+0x0/0x38 [<c0290c02>] ? kjournald2+0x0/0x310 [<c0170ee6>] kthread+0x66/0x6b [<c0170e80>] ? kthread+0x0/0x6b [<c01251b3>] kernel_thread_helper+0x7/0x10 ---[ end trace 5579351b86af61e3 ]--- Commit 6487a9d was an attempt some buffer head leaks in an ENOSPC error path, but in some cases it actually results in an excess ENOSPC, as shown above. Fixing this means cleaning up who is responsible for releasing the buffer heads from the callee to the caller of add_dirent_to_buf(). Since that's a relatively complex change, and we're late in the rcX development cycle, I'm reverting this now, and holding back a more complete fix until after 2.6.32 ships. We've lived with this buffer_head leak on ENOSPC in ext3 and ext4 for a very long time; a few more months won't kill us. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Curt Wohlgemuth <curtw@google.com>
2009-11-08nilfs2: fix missing cleanup of gc cache on error casesRyusuke Konishi
This fixes an -rc1 regression brought by the commit: 1cf58fa840472ec7df6bf2312885949ebb308853 ("nilfs2: shorten freeze period due to GC in write operation v3"). Although the patch moved out a function call of nilfs_ioctl_move_blocks() to nilfs_ioctl_clean_segments() from nilfs_ioctl_prepare_clean_segments(), it didn't move corresponding cleanup job needed for the error case. This will move the missing cleanup job to the destination function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: Jiro SEKIBA <jir@unicus.jp>
2009-11-08nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocksRyusuke Konishi
This fixes a kernel oops reported by Markus Trippelsdorf in the email titled "[NILFS users] kernel Oops while running nilfs_cleanerd". The oops was caused by a bug of error path in nilfs_ioctl_move_blocks() function, which was inlined in nilfs_ioctl_clean_segments(). nilfs_ioctl_move_blocks checks duplication of blocks which will be moved in garbage collection. But, the check should have be done within nilfs_ioctl_move_inode_block() to prevent list corruption among buffers storing the target blocks. To fix the kernel oops, this moves forward the duplication check before the list insertion. I also tested this for stable trees [2.6.30, 2.6.31]. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org>
2009-11-06cifs: don't use CIFSGetSrvInodeNumber in is_path_accessibleJeff Layton
Because it's lighter weight, CIFS tries to use CIFSGetSrvInodeNumber to verify the accessibility of the root inode and then falls back to doing a full QPathInfo if that fails with -EOPNOTSUPP. I have at least a report of a server that returns NT_STATUS_INTERNAL_ERROR rather than something that translates to EOPNOTSUPP. Rather than trying to be clever with that call, just have is_path_accessible do a normal QPathInfo. That call is widely supported and it shouldn't increase the overhead significantly. Cc: Stable <stable@kernel.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-06cifs: clean up handling when server doesn't consistently support inode numbersJeff Layton
It's possible that a server will return a valid FileID when we query the FILE_INTERNAL_INFO for the root inode, but then zeroed out inode numbers when we do a FindFile with an infolevel of SMB_FIND_FILE_ID_FULL_DIR_INFO. In this situation turn off querying for server inode numbers, generate a warning for the user and just generate an inode number using iunique. Once we generate any inode number with iunique we can no longer use any server inode numbers or we risk collisions, so ensure that we don't do that in cifs_get_inode_info either. Cc: Stable <stable@kernel.org> Reported-by: Timothy Normand Miller <theosib@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-06ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/OMingming
To prepare for a direct I/O write, we need to split the unwritten extents before submitting the I/O. When no extents needed to be split, ext4_split_unwritten_extents() was incorrectly returning 0 instead of the size of uninitialized extents. This bug caused the wrong return value sent back to VFS code when it gets called from async IO path, leading to an unnecessary fall back to buffered IO. This bug also hid the fact that the check to see whether or not a split would be necessary was incorrect; we can only skip splitting the extent if the write completely covers the uninitialized extent. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: invalidate target of rename fuse: fix kunmap in fuse_ioctl_copy_user fuse: prevent fuse_put_request on invalid pointer
2009-11-05Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/kvm: Remove problematic BUILD_BUG_ON statement powerpc/pci: Fix regression in powerpc MSI-X powerpc: Avoid giving out RTC dates below EPOCH powerpc/mm: Remove debug context clamping from nohash code powerpc: Cleanup Kconfig selection of hugetlbfs support
2009-11-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: sysfs: Don't leak secdata when a sysfs_dirent is freed.
2009-11-05Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, fs: Fix x86 procfs stack information for threads on 64-bit x86: Add reboot quirk for 3 series Mac mini x86: Fix printk message typo in mtrr cleanup code dma-debug: Fix compile warning with PAE enabled x86/amd-iommu: Un__init function required on shutdown x86/amd-iommu: Workaround for erratum 63
2009-11-05sysfs: Don't leak secdata when a sysfs_dirent is freed.Eric W. Biederman
While refreshing my sysfs patches I noticed a leak in the secdata implementation. We don't free the secdata when we free the sysfs dirent. This is a bug in 2.6.32-rc5 that we really should close. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-11-04x86, fs: Fix x86 procfs stack information for threads on 64-bitStefani Seibold
This patch fixes two issues in the procfs stack information on x86-64 linux. The 32 bit loader compat_do_execve did not store stack start. (this was figured out by Alexey Dobriyan). The stack information on a x64_64 kernel always shows 0 kbyte stack usage, because of a missing implementation of the KSTK_ESP macro which always returned -1. The new implementation now returns the right value. Signed-off-by: Stefani Seibold <stefani@seibold.net> Cc: Americo Wang <xiyou.wangcong@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1257240160.4889.24.camel@wall-e> Signed-off-by: Ingo Molnar <mingo@elte.hu>