aboutsummaryrefslogtreecommitdiff
path: root/fs/ext4
AgeCommit message (Collapse)Author
2010-01-01ext4: Calculate metadata requirements more accuratelyTheodore Ts'o
In the past, ext4_calc_metadata_amount(), and its sub-functions ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount() badly over-estimated the number of metadata blocks that might be required for delayed allocation blocks. This didn't matter as much when functions which managed the reserved metadata blocks were more aggressive about dropping reserved metadata blocks as delayed allocation blocks were written, but unfortunately they were too aggressive. This was fixed in commit 0637c6f, but as a result the over-estimation by ext4_calc_metadata_amount() would lead to reserving 2-3 times the number of pending delayed allocation blocks as potentially required metadata blocks. So if there are 1 megabytes of blocks which have been not yet been allocation, up to 3 megabytes of space would get reserved out of the user's quota and from the file system free space pool until all of the inode's data blocks have been allocated. This commit addresses this problem by much more accurately estimating the number of metadata blocks that will be required. It will still somewhat over-estimate the number of blocks needed, since it must make a worst case estimate not knowing which physical blocks will be needed, but it is much more accurate than before. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-01-01ext4: Fix accounting of reserved metadata blocksTheodore Ts'o
Commit 0637c6f had a typo which caused the reserved metadata blocks to not be released correctly. Fix this. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-30ext4: Patch up how we claim metadata blocks for quota purposesTheodore Ts'o
As reported in Kernel Bugzilla #14936, commit d21cd8f triggered a BUG in the function ext4_da_update_reserve_space() found in fs/ext4/inode.c. The root cause of this BUG() was caused by the fact that ext4_calc_metadata_amount() can severely over-estimate how many metadata blocks will be needed, especially when using direct block-mapped files. In addition, it can also badly *under* estimate how much space is needed, since ext4_calc_metadata_amount() assumes that the blocks are contiguous, and this is not always true. If the application is writing blocks to a sparse file, the number of metadata blocks necessary can be severly underestimated by the functions ext4_da_reserve_space(), ext4_da_update_reserve_space() and ext4_da_release_space(). This was the cause of the dq_claim_space reports found on kerneloops.org. Unfortunately, doing this right means that we need to massively over-estimate the amount of free space needed. So in some cases we may need to force the inode to be written to disk asynchronously in to avoid spurious quota failures. http://bugzilla.kernel.org/show_bug.cgi?id=14936 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-29ext4: Ensure zeroout blocks have no dirty metadataAneesh Kumar K.V
This fixes a bug (found by Curt Wohlgemuth) in which new blocks returned from an extent created with ext4_ext_zeroout() can have dirty metadata still associated with them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-25ext4: return correct wbc.nr_to_write in ext4_da_writepagesRichard Kennedy
When ext4_da_writepages increases the nr_to_write in writeback_control then it must always re-base the return value. Originally there was a (misguided) attempt prevent wbc.nr_to_write from going negative. In fact, it's necessary to allow nr_to_write to be negative so that wb_writeback() can correctly calculate how many pages were actually written. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23ext4: flush delalloc blocks when space is lowEric Sandeen
Creating many small files in rapid succession on a small filesystem can lead to spurious ENOSPC; on a 104MB filesystem: for i in `seq 1 22500`; do echo -n > $SCRATCH_MNT/$i echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i done leads to ENOSPC even though after a sync, 40% of the fs is free again. This is because we reserve worst-case metadata for delalloc writes, and when data is allocated that worst-case reservation is not usually needed. When freespace is low, kicking off an async writeback will start converting that worst-case space usage into something more realistic, almost always freeing up space to continue. This resolves the testcase for me, and survives all 4 generic ENOSPC tests in xfstests. We'll still need a hard synchronous sync to squeeze out the last bit, but this fixes things up to a large degree. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23ext4: Eliminate potential double free on error pathJulia Lawall
b_entry_name and buffer are initially NULL, are initialized within a loop to the result of calling kmalloc, and are freed at the bottom of this loop. The loop contains gotos to cleanup, which also frees b_entry_name and buffer. Some of these gotos are before the reinitializations of b_entry_name and buffer. To maintain the invariant that b_entry_name and buffer are NULL at the top of the loop, and thus acceptable arguments to kfree, these variables are now set to NULL after the kfrees. This seems to be the simplest solution. A more complicated solution would be to introduce more labels in the error handling code at the end of the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ identifier E; expression E1; iterator I; statement S; @@ *kfree(E); ... when != E = E1 when != I(E,...) S when != &E *kfree(E); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23ext4: fix unsigned long long printk warning in super.cAndrew Morton
sparc64 allmodconfig: fs/ext4/super.c: In function `lifetime_write_kbytes_show': fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4) fs/ext4/super.c:2174: warning: long long unsigned int format, long unsigned int arg (arg 4) Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23ext4, jbd2: Add barriers for file systems with exernal journalsTheodore Ts'o
This is a bit complicated because we are trying to optimize when we send barriers to the fs data disk. We could just throw in an extra barrier to the data disk whenever we send a barrier to the journal disk, but that's not always strictly necessary. We only need to send a barrier during a commit when there are data blocks which are must be written out due to an inode written in ordered mode, or if fsync() depends on the commit to force data blocks to disk. Finally, before we drop transactions from the beginning of the journal during a checkpoint operation, we need to guarantee that any blocks that were flushed out to the data disk are firmly on the rust platter before we drop the transaction from the journal. Thanks to Oleg Drokin for pointing out this flaw in ext3/ext4. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-14ext4: replace BUG() with return -EIO in ext4_ext_get_blocksSurbhi Palande
This patch fixes the Kernel BZ #14286. When the address of an extent corresponding to a valid block is corrupted, a -EIO should be reported instead of a BUG(). This situation should not normally not occur except in the case of a corrupted filesystem. If however it does, then the system should not panic directly but depending on the mount time options appropriate action should be taken. If the mount options so permit, the I/O should be gracefully aborted by returning a -EIO. http://bugzilla.kernel.org/show_bug.cgi?id=14286 Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-21ext4: add module aliases for ext2 and ext3Theodore Ts'o
Add module aliases for ext2 and ext3 when CONFIG_EXT4_USE_FOR_EXT23 is set. This makes the existing user-space stuff like mkinitrd working as is. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-21ext4: Don't ask about supporting ext2/3 in ext4 if ext4 is not configuredDavid Howells
Don't offer to build ext2/3 support into ext4 if ext4 itself is not configured on. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-14ext4: remove unused #include <linux/version.h>Huang Weiyi
Remove unused #include <linux/version.h>('s) in fs/ext4/block_validity.c fs/ext4/mballoc.h Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23ext4: fix sleep inside spinlock issue with quota and dealloc (#14739)Dmitry Monakhov
Unlock i_block_reservation_lock before vfs_dq_reserve_block(). This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=14739 CC: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23ext4: Fix potential quota deadlockDmitry Monakhov
We have to delay vfs_dq_claim_space() until allocation context destruction. Currently we have following call-trace: ext4_mb_new_blocks() /* task is already holding ac->alloc_semp */ ->ext4_mb_mark_diskspace_used ->vfs_dq_claim_space() /* acquire dqptr_sem here. Possible deadlock */ ->ext4_mb_release_context() /* drop ac->alloc_semp here */ Let's move quota claiming to ext4_da_update_reserve_space() ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-rc7 #18 ------------------------------------------------------- write-truncate-/3465 is trying to acquire lock: (&s->s_dquot.dqptr_sem){++++..}, at: [<c025e73b>] dquot_claim_space+0x3b/0x1b0 but task is already holding lock: (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&meta_group_info[i]->alloc_sem){++++..}: [<c017d04b>] __lock_acquire+0xd7b/0x1260 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c0527191>] down_read+0x51/0x90 [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370 [<c02d0c1c>] ext4_mb_free_blocks+0x46c/0x870 [<c029c9d3>] ext4_free_blocks+0x73/0x130 [<c02c8cfc>] ext4_ext_truncate+0x76c/0x8d0 [<c02a8087>] ext4_truncate+0x187/0x5e0 [<c01e0f7b>] vmtruncate+0x6b/0x70 [<c022ec02>] inode_setattr+0x62/0x190 [<c02a2d7a>] ext4_setattr+0x25a/0x370 [<c022ee81>] notify_change+0x151/0x340 [<c021349d>] do_truncate+0x6d/0xa0 [<c0221034>] may_open+0x1d4/0x200 [<c022412b>] do_filp_open+0x1eb/0x910 [<c021244d>] do_sys_open+0x6d/0x140 [<c021258e>] sys_open+0x2e/0x40 [<c0103100>] sysenter_do_call+0x12/0x32 -> #2 (&ei->i_data_sem){++++..}: [<c017d04b>] __lock_acquire+0xd7b/0x1260 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c0527191>] down_read+0x51/0x90 [<c02a5787>] ext4_get_blocks+0x47/0x450 [<c02a74c1>] ext4_getblk+0x61/0x1d0 [<c02a7a7f>] ext4_bread+0x1f/0xa0 [<c02bcddc>] ext4_quota_write+0x12c/0x310 [<c0262d23>] qtree_write_dquot+0x93/0x120 [<c0261708>] v2_write_dquot+0x28/0x30 [<c025d3fb>] dquot_commit+0xab/0xf0 [<c02be977>] ext4_write_dquot+0x77/0x90 [<c02be9bf>] ext4_mark_dquot_dirty+0x2f/0x50 [<c025e321>] dquot_alloc_inode+0x101/0x180 [<c029fec2>] ext4_new_inode+0x602/0xf00 [<c02ad789>] ext4_create+0x89/0x150 [<c0221ff2>] vfs_create+0xa2/0xc0 [<c02246e7>] do_filp_open+0x7a7/0x910 [<c021244d>] do_sys_open+0x6d/0x140 [<c021258e>] sys_open+0x2e/0x40 [<c0103100>] sysenter_do_call+0x12/0x32 -> #1 (&sb->s_type->i_mutex_key#7/4){+.+...}: [<c017d04b>] __lock_acquire+0xd7b/0x1260 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c0526505>] mutex_lock_nested+0x65/0x2d0 [<c0260c9d>] vfs_load_quota_inode+0x4bd/0x5a0 [<c02610af>] vfs_quota_on_path+0x5f/0x70 [<c02bc812>] ext4_quota_on+0x112/0x190 [<c026345a>] sys_quotactl+0x44a/0x8a0 [<c0103100>] sysenter_do_call+0x12/0x32 -> #0 (&s->s_dquot.dqptr_sem){++++..}: [<c017d361>] __lock_acquire+0x1091/0x1260 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c0527191>] down_read+0x51/0x90 [<c025e73b>] dquot_claim_space+0x3b/0x1b0 [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380 [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530 [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0 [<c02a5966>] ext4_get_blocks+0x226/0x450 [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0 [<c02a6ed6>] ext4_da_writepages+0x506/0x790 [<c01de272>] do_writepages+0x22/0x50 [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80 [<c01d7b9b>] filemap_flush+0x2b/0x30 [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60 [<c029e595>] ext4_release_file+0x75/0xb0 [<c0216b59>] __fput+0xf9/0x210 [<c0216c97>] fput+0x27/0x30 [<c02122dc>] filp_close+0x4c/0x80 [<c014510e>] put_files_struct+0x6e/0xd0 [<c01451b7>] exit_files+0x47/0x60 [<c0146a24>] do_exit+0x144/0x710 [<c0147028>] do_group_exit+0x38/0xa0 [<c0159abc>] get_signal_to_deliver+0x2ac/0x410 [<c0102849>] do_notify_resume+0xb9/0x890 [<c01032d2>] work_notifysig+0x13/0x21 other info that might help us debug this: 3 locks held by write-truncate-/3465: #0: (jbd2_handle){+.+...}, at: [<c02e1f8f>] start_this_handle+0x38f/0x5c0 #1: (&ei->i_data_sem){++++..}, at: [<c02a57f6>] ext4_get_blocks+0xb6/0x450 #2: (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370 stack backtrace: Pid: 3465, comm: write-truncate- Not tainted 2.6.32-rc7 #18 Call Trace: [<c0524cb3>] ? printk+0x1d/0x22 [<c017ac9a>] print_circular_bug+0xca/0xd0 [<c017d361>] __lock_acquire+0x1091/0x1260 [<c016bca2>] ? sched_clock_local+0xd2/0x170 [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0 [<c0527191>] down_read+0x51/0x90 [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0 [<c025e73b>] dquot_claim_space+0x3b/0x1b0 [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380 [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530 [<c02c601d>] ? ext4_ext_find_extent+0x25d/0x280 [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0 [<c016bca2>] ? sched_clock_local+0xd2/0x170 [<c016be60>] ? sched_clock_cpu+0x120/0x160 [<c016beef>] ? cpu_clock+0x4f/0x60 [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0 [<c052712c>] ? down_write+0x8c/0xa0 [<c02a5966>] ext4_get_blocks+0x226/0x450 [<c016be60>] ? sched_clock_cpu+0x120/0x160 [<c016beef>] ? cpu_clock+0x4f/0x60 [<c017908b>] ? trace_hardirqs_off+0xb/0x10 [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0 [<c01d69cc>] ? find_get_pages_tag+0x16c/0x180 [<c01d6860>] ? find_get_pages_tag+0x0/0x180 [<c02a73bd>] ? __mpage_da_writepage+0x16d/0x1a0 [<c01dfc4e>] ? pagevec_lookup_tag+0x2e/0x40 [<c01ddf1b>] ? write_cache_pages+0xdb/0x3d0 [<c02a7250>] ? __mpage_da_writepage+0x0/0x1a0 [<c02a6ed6>] ext4_da_writepages+0x506/0x790 [<c016beef>] ? cpu_clock+0x4f/0x60 [<c016bca2>] ? sched_clock_local+0xd2/0x170 [<c016be60>] ? sched_clock_cpu+0x120/0x160 [<c016be60>] ? sched_clock_cpu+0x120/0x160 [<c02a69d0>] ? ext4_da_writepages+0x0/0x790 [<c01de272>] do_writepages+0x22/0x50 [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80 [<c01d7b9b>] filemap_flush+0x2b/0x30 [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60 [<c029e595>] ext4_release_file+0x75/0xb0 [<c0216b59>] __fput+0xf9/0x210 [<c0216c97>] fput+0x27/0x30 [<c02122dc>] filp_close+0x4c/0x80 [<c014510e>] put_files_struct+0x6e/0xd0 [<c01451b7>] exit_files+0x47/0x60 [<c0146a24>] do_exit+0x144/0x710 [<c017b163>] ? lock_release_holdtime+0x33/0x210 [<c0528137>] ? _spin_unlock_irq+0x27/0x30 [<c0147028>] do_group_exit+0x38/0xa0 [<c017babb>] ? trace_hardirqs_on+0xb/0x10 [<c0159abc>] get_signal_to_deliver+0x2ac/0x410 [<c0102849>] do_notify_resume+0xb9/0x890 [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0 [<c017b163>] ? lock_release_holdtime+0x33/0x210 [<c0165b50>] ? autoremove_wake_function+0x0/0x50 [<c017ba54>] ? trace_hardirqs_on_caller+0x134/0x190 [<c017babb>] ? trace_hardirqs_on+0xb/0x10 [<c0300ba4>] ? security_file_permission+0x14/0x20 [<c0215761>] ? vfs_write+0x131/0x190 [<c0214f50>] ? do_sync_write+0x0/0x120 [<c0103115>] ? sysenter_do_call+0x27/0x32 [<c01032d2>] work_notifysig+0x13/0x21 CC: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-23ext4: Convert to generic reserved quota's space management.Dmitry Monakhov
This patch also fixes write vs chown race condition. Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-17Revert "task_struct: make journal_info conditional"Linus Torvalds
This reverts commit e4c570c4cb7a95dbfafa3d016d2739bf3fdfe319, as requested by Alexey: "I think I gave a good enough arguments to not merge it. To iterate: * patch makes impossible to start using ext3 on EXT3_FS=n kernels without reboot. * this is done only for one pointer on task_struct" None of config options which define task_struct are tristate directly or effectively." Requested-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16sanitize xattr handler prototypesChristoph Hellwig
Add a flags argument to struct xattr_handler and pass it to all xattr handler methods. This allows using the same methods for multiple handlers, e.g. for the ACL methods which perform exactly the same action for the access and default ACLs, just using a different underlying attribute. With a little more groundwork it'll also allow sharing the methods for the regular user/trusted/secure handlers in extN, ocfs2 and jffs2 like it's already done for xfs in this patch. Also change the inode argument to the handlers to a dentry to allow using the handlers mechnism for filesystems that require it later, e.g. cifs. [with GFS2 bits updated by Steven Whitehouse <swhiteho@redhat.com>] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-15tree-wide: convert open calls to remove spaces to skip_spaces() lib functionAndré Goddard Rosa
Makes use of skip_spaces() defined in lib/string.c for removing leading spaces from strings all over the tree. It decreases lib.a code size by 47 bytes and reuses the function tree-wide: text data bss dec hex filename 64688 584 592 65864 10148 (TOTALS-BEFORE) 64641 584 592 65817 10119 (TOTALS-AFTER) Also, while at it, if we see (*str && isspace(*str)), we can be sure to remove the first condition (*str) as the second one (isspace(*str)) also evaluates to 0 whenever *str == 0, making it redundant. In other words, "a char equals zero is never a space". Julia Lawall tried the semantic patch (http://coccinelle.lip6.fr) below, and found occurrences of this pattern on 3 more files: drivers/leds/led-class.c drivers/leds/ledtrig-timer.c drivers/video/output.c @@ expression str; @@ ( // ignore skip_spaces cases while (*str && isspace(*str)) { \(str++;\|++str;\) } | - *str && isspace(*str) ) Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Cc: Julia Lawall <julia@diku.dk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Neil Brown <neilb@suse.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: David Howells <dhowells@redhat.com> Cc: <linux-ext4@vger.kernel.org> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15task_struct: make journal_info conditionalHiroshi Shimamoto
journal_info in task_struct is used in journaling file system only. So introduce CONFIG_FS_JOURNAL_INFO and make it conditional. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c
2009-12-11Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (21 commits) ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks() ext3: Fix data / filesystem corruption when write fails to copy data ext4: Support for 64-bit quota format ext3: Support for vfsv1 quota format quota: Implement quota format with 64-bit space and inode limits quota: Move definition of QFMT_OCFS2 to linux/quota.h ext2: fix comment in ext2_find_entry about return values ext3: Unify log messages in ext3 ext2: clear uptodate flag on super block I/O error ext2: Unify log messages in ext2 ext3: make "norecovery" an alias for "noload" ext3: Don't update the superblock in ext3_statfs() ext3: journal all modifications in ext3_xattr_set_handle ext2: Explicitly assign values to on-disk enum of filetypes quota: Fix WARN_ON in lookup_one_len const: struct quota_format_ops ubifs: remove manual O_SYNC handling afs: remove manual O_SYNC handling kill wait_on_page_writeback_range vfs: Implement proper O_SYNC semantics ...
2009-12-10Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits) ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem) ext4: Do not override ext2 or ext3 if built they are built as modules jbd2: Export jbd2_log_start_commit to fix ext4 build ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT ext4: Wait for proper transaction commit on fsync ext4: fix incorrect block reservation on quota transfer. ext4: quota macros cleanup ext4: ext4_get_reserved_space() must return bytes instead of blocks ext4: remove blocks from inode prealloc list on failure ext4: wait for log to commit when umounting ext4: Avoid data / filesystem corruption when write fails to copy data ext4: Use ext4 file system driver for ext2/ext3 file system mounts ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks() jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer() ext4: remove unused parameter wbc from __ext4_journalled_writepage() ext4: remove encountered_congestion trace ext4: move_extent_per_page() cleanup ext4: initialize moved_len before calling ext4_move_extents() ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT ext4: use ext4_data_block_valid() in ext4_free_blocks() ...
2009-12-10ext4: Support for 64-bit quota formatJan Kara
Add support for new 64-bit quota format. It is enough to add proper mount options handling. The rest is done by the generic code. Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-09ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem)Theodore Ts'o
Fix the following potential circular locking dependency between mm->mmap_sem and ei->i_data_sem: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-04115-gec044c5 #37 ------------------------------------------------------- ureadahead/1855 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff81107224>] might_fault+0x5c/0xac but task is already holding lock: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ei->i_data_sem){++++..}: [<ffffffff81099bfa>] __lock_acquire+0xb67/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81516633>] down_read+0x51/0x84 [<ffffffff811a2414>] ext4_get_blocks+0x50/0x2a5 [<ffffffff811a3453>] ext4_get_block+0xab/0xef [<ffffffff81154f39>] do_mpage_readpage+0x198/0x48d [<ffffffff81155360>] mpage_readpages+0xd0/0x114 [<ffffffff811a104b>] ext4_readpages+0x1d/0x1f [<ffffffff810f8644>] __do_page_cache_readahead+0x12f/0x1bc [<ffffffff810f86f2>] ra_submit+0x21/0x25 [<ffffffff810f0cfd>] filemap_fault+0x19f/0x32c [<ffffffff81107b97>] __do_fault+0x55/0x3a2 [<ffffffff81109db0>] handle_mm_fault+0x327/0x734 [<ffffffff8151aaa9>] do_page_fault+0x292/0x2aa [<ffffffff81518205>] page_fault+0x25/0x30 [<ffffffff812a34d8>] clear_user+0x38/0x3c [<ffffffff81167e16>] padzero+0x20/0x31 [<ffffffff81168b47>] load_elf_binary+0x8bc/0x17ed [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff81166d64>] load_script+0x1b8/0x1cc [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff8113255f>] do_execve+0x1ce/0x2cf [<ffffffff81027494>] sys_execve+0x43/0x5a [<ffffffff8102918a>] stub_execve+0x6a/0xc0 -> #0 (&mm->mmap_sem){++++++}: [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by ureadahead/1855: #0: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 stack backtrace: Pid: 1855, comm: ureadahead Not tainted 2.6.32-04115-gec044c5 #37 Call Trace: [<ffffffff81098c70>] print_circular_bug+0xa8/0xb7 [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff8102f229>] ? sched_clock+0x9/0xd [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81124b44>] ? __kmalloc+0x13b/0x18c [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811bca0b>] ? ext4_ext_fiemap_cb+0x0/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff8129f6d0>] ? __up_read+0x8d/0x95 [<ffffffff81517fb5>] ? retint_swapgs+0x13/0x1b [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-09ext4: Do not override ext2 or ext3 if built they are built as modulesTheodore Ts'o
The CONFIG_EXT4_USE_FOR_EXT23 option must not try to take over the ext2 or ext3 file systems if the those file system drivers are configured to be built as mdoules. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-07Merge branch 'for-next' into for-linusJiri Kosina
Conflicts: kernel/irq/chip.c
2009-12-06ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXTAkira Fujita
This patch fixes three problems in the handling of the EXT4_IOC_MOVE_EXT ioctl: 1. In current EXT4_IOC_MOVE_EXT, there are read access mode checks for original and donor files, but they allow the illegal write access to donor file, since donor file is overwritten by original file data. To fix this problem, change access mode checks of original (r->r/w) and donor (r->w) files. 2. Disallow the use of donor files that have a setuid or setgid bits. 3. Call mnt_want_write() and mnt_drop_write() before and after ext4_move_extents() calling to get write access to a mount. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-08ext4: Wait for proper transaction commit on fsyncJan Kara
We cannot rely on buffer dirty bits during fsync because pdflush can come before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-08ext4: fix incorrect block reservation on quota transfer.Dmitry Monakhov
Inside ->setattr() call both ATTR_UID and ATTR_GID may be valid This means that we may end-up with transferring all quotas. Add we have to reserve QUOTA_DEL_BLOCKS for all quotas, as we do in case of QUOTA_INIT_BLOCKS. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-08ext4: quota macros cleanupDmitry Monakhov
Currently all quota block reservation macros contains hard-coded "2" aka MAXQUOTAS value. This is no good because in some places it is not obvious to understand what does this digit represent. Let's introduce new macro with self descriptive name. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-08ext4: ext4_get_reserved_space() must return bytes instead of blocksDmitry Monakhov
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-08ext4: remove blocks from inode prealloc list on failureCurt Wohlgemuth
This fixes a leak of blocks in an inode prealloc list if device failures cause ext4_mb_mark_diskspace_used() to fail. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-08ext4: wait for log to commit when umountingJosef Bacik
There is a potential race when a transaction is committing right when the file system is being umounting. This could reduce in a race because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super before the commit code calls a callback so the mballoc code can release freed blocks in the transaction, resulting in a panic trying to access the freed s_group_info. The fix is to wait for the transaction to finish committing before we shutdown the multiblock allocator. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-08ext4: Avoid data / filesystem corruption when write fails to copy dataJan Kara
When ext4_write_begin fails after allocating some blocks or generic_perform_write fails to copy data to write, we truncate blocks already instantiated beyond i_size. Although these blocks were never inside i_size, we have to truncate the pagecache of these blocks so that corresponding buffers get unmapped. Otherwise subsequent __block_prepare_write (called because we are retrying the write) will find the buffers mapped, not call ->get_block, and thus the page will be backed by already freed blocks leading to filesystem and data corruption. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-07ext4: Use ext4 file system driver for ext2/ext3 file system mountsTheodore Ts'o
Add a new config option, CONFIG_EXT4_USE_FOR_EXT23 which if enabled, will cause ext4 to be used for either ext2 or ext3 file system mounts when ext2 or ext3 is not enabled in the configuration. This allows minimalist kernel fanatics to drop to file system drivers from their compiled kernel with out losing functionality. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-07ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks()Roel Kluin
Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-04tree-wide: fix assorted typos all over the placeAndré Goddard Rosa
That is "success", "unknown", "through", "performance", "[re|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-04tree-wide: fix typos "offest" -> "offset"Uwe Kleine-König
This patch was generated by git grep -E -i -l 'offest' | xargs -r perl -p -i -e 's/offest/offset/' Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-11-24ext4: remove unused parameter wbc from __ext4_journalled_writepage()Wu Fengguang
CC: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-24ext4: move_extent_per_page() cleanupAkira Fujita
Integrate duplicate lines (acquire/release semaphore and invalidate extent cache in move_extent_per_page()) into mext_replace_branches(), to reduce source and object code size. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-24ext4: initialize moved_len before calling ext4_move_extents()Kazuya Mio
The move_extent.moved_len is used to pass back the number of exchanged blocks count to user space. Currently the caller must clear this field; but we spend more code space checking for this requirement than simply zeroing the field ourselves, so let's just make life easier for everyone all around. Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-24ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXTAkira Fujita
At the beginning of ext4_move_extent(), we call ext4_discard_preallocations() to discard inode PAs of orig and donor inodes. But in the following case, blocks can be double freed, so move ext4_discard_preallocations() to the end of ext4_move_extents(). 1. Discard inode PAs of orig and donor inodes with ext4_discard_preallocations() in ext4_move_extents(). orig : [ DATA1 ] donor: [ DATA2 ] 2. While data blocks are exchanging between orig and donor inodes, new inode PAs is created to orig by other process's block allocation. (Since there are semaphore gaps in ext4_move_extents().) And new inode PAs is used partially (2-1). 2-1 Create new inode PAs to orig inode orig : [ DATA1 | used PA1 | free PA1 ] donor: [ DATA2 ] 3. Donor inode which has old orig inode's blocks is deleted after EXT4_IOC_MOVE_EXT finished (3-1, 3-2). So the block bitmap corresponds to old orig inode's blocks are freed. 3-1 After EXT4_IOC_MOVE_EXT finished orig : [ DATA2 | free PA1 ] donor: [ DATA1 | used PA1 ] 3-2 Delete donor inode orig : [ DATA2 | free PA1 ] donor: [ FREE SPACE(DATA1) | FREE SPACE(used PA1) ] 4. The double-free of blocks is occurred, when close() is called to orig inode. Because ext4_discard_preallocations() for orig inode frees used PA1 and free PA1, though used PA1 is already freed in 3. 4-1 Double-free of blocks is occurred orig : [ DATA2 | FREE SPACE(free PA1) ] donor: [ FREE SPACE(DATA1) | DOUBLE FREE(used PA1) ] Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-22ext4: use ext4_data_block_valid() in ext4_free_blocks()Theodore Ts'o
The block validity framework does a more comprehensive set of checks, and it saves object code space to use the ext4_data_block_valid() than the limited open-coded version that had been in ext4_free_blocks(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-22ext4: add check for wraparound in ext4_data_block_valid()Theodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-23ext4: call ext4_forget() from ext4_free_blocks()Theodore Ts'o
Add the facility for ext4_forget() to be called from ext4_free_blocks(). This simplifies the code in a large number of places, and centralizes most of the work of calling ext4_forget() into a single place. Also fix a bug in the extents migration code; it wasn't calling ext4_forget() when releasing the indirect blocks during the conversion. As a result, if the system cashed during or shortly after the extents migration, and the released indirect blocks get reused as data blocks, the journal replay would corrupt the data blocks. With this new patch, fixing this bug was as simple as adding the EXT4_FREE_BLOCKS_FORGET flags to the call to ext4_free_blocks(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
2009-11-22ext4: fold ext4_free_blocks() and ext4_mb_free_blocks()Theodore Ts'o
ext4_mb_free_blocks() is only called by ext4_free_blocks(), and the latter function doesn't really do much. So merge the two functions together, such that ext4_free_blocks() is now found in fs/ext4/mballoc.c. This saves about 200 bytes of compiled text space. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-22ext4: fold ext4_journal_forget() into ext4_forget()Theodore Ts'o
Convert the last two callers of ext4_journal_forget() to use ext4_forget() instead, and then fold ext4_journal_forget() into ext4_forget(). This reduces are code complexity and shortens our call stack. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-24ext4: fold ext4_journal_revoke() into ext4_forget()Theodore Ts'o
The only caller of ext4_journal_revoke() is ext4_forget(), so we can fold ext4_journal_revoke() into ext4_forget() to simplify the code and shorten the call stack. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-22ext4: move ext4_forget() to ext4_jbd2.cTheodore Ts'o
The ext4_forget() function better belongs in ext4_jbd2.c. This will allow us to do some cleanup of the ext4_journal_revoke() and ext4_journal_forget() functions, as well as giving us better error reporting since we can report the caller of ext4_forget() when things go wrong. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>