aboutsummaryrefslogtreecommitdiff
path: root/fs/ext4/balloc.c
AgeCommit message (Collapse)Author
2008-01-29ext4: Add multi block allocator for ext4Alex Tomas
Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-01-28ext4: fix up EXT4FS_DEBUG buildsEric Sandeen
Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail without these changes. Clean up some format warnings too. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2008-01-28ext4: Convert truncate_mutex to read write semaphore.Aneesh Kumar K.V
We are currently taking the truncate_mutex for every read. This would have performance impact on large CPU configuration. Convert the lock to read write semaphore and take read lock when we are trying to read the file. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2008-01-28ext4: add block bitmap validationAneesh Kumar K.V
When a new block bitmap is read from disk in read_block_bitmap() there are a few bits that should ALWAYS be set. In particular, the blocks given corresponding to block bitmap, inode bitmap and inode tables. Validate the block bitmap against these blocks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2008-01-28ext4: Return after ext4_error in case of failuresAneesh Kumar K.V
This fix some instances where we were continuing after calling ext4_error. ext4_error call panic only if errors=panic mount option is set. So we need to make sure we return correctly after ext4_error call Reported by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2008-01-28ext4: add ext4_group_t, and change all group variables to this type.Avantika Mathur
In many places variables for block group are of type int, which limits the maximum number of block groups to 2^31. Each block group can have up to 2^15 blocks, with a 4K block size, and the max filesystem size is limited to 2^31 * (2^15 * 2^12) = 2^58 -- or 256 PB This patch introduces a new type ext4_group_t, of type unsigned long, to represent block group numbers in ext4. All occurrences of block group variables are converted to type ext4_group_t. Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
2007-11-13Revert "ext2/ext3/ext4: add block bitmap validation"Linus Torvalds
This reverts commit 7c9e69faa28027913ee059c285a5ea8382e24b5d, fixing up conflicts in fs/ext4/balloc.c manually. The cost of doing the bitmap validation on each lookup - even when the bitmap is cached - is absolutely prohibitive. We could, and probably should, do it only when adding the bitmap to the buffer cache. However, right now we are better off just reverting it. Peter Zijlstra measured the cost of this extra validation as a 85% decrease in cached iozone, and while I had a patch that took it down to just 17% by not being _quite_ so stupid in the validation, it was still a big slowdown that could have been avoided by just doing it right. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Andreas Dilger <adilger@clusterfs.com> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17ext4: Convert bg_inode_bitmap and bg_inode_tableAneesh Kumar K.V
Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo and bg_inode_table_lo. This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix one direct partial access Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2007-10-17Ext4: Uninitialized Block GroupsAndreas Dilger
In pass1 of e2fsck, every inode table in the fileystem is scanned and checked, regardless of whether it is in use. This is this the most time consuming part of the filesystem check. The unintialized block group feature can greatly reduce e2fsck time by eliminating checking of uninitialized inodes. With this feature, there is a a high water mark of used inodes for each block group. Block and inode bitmaps can be uninitialized on disk via a flag in the group descriptor to avoid reading or scanning them at e2fsck time. A checksum of each group descriptor is used to ensure that corruption in the group descriptor's bit flags does not cause incorrect operation. The feature is enabled through a mkfs option mke2fs /dev/ -O uninit_groups A patch adding support for uninitialized block groups to e2fsprogs tools has been posted to the linux-ext4 mailing list. The patches have been stress tested with fsstress and fsx. In performance tests testing e2fsck time, we have seen that e2fsck time on ext3 grows linearly with the total number of inodes in the filesytem. In ext4 with the uninitialized block groups feature, the e2fsck time is constant, based solely on the number of used inodes rather than the total inode count. Since typical ext4 filesystems only use 1-10% of their inodes, this feature can greatly reduce e2fsck time for users. With performance improvement of 2-20 times, depending on how full the filesystem is. The attached graph shows the major improvements in e2fsck times in filesystems with a large total inode count, but few inodes in use. In each group descriptor if we have EXT4_BG_INODE_UNINIT set in bg_flags: Inode table is not initialized/used in this group. So we can skip the consistency check during fsck. EXT4_BG_BLOCK_UNINIT set in bg_flags: No block in the group is used. So we can skip the block bitmap verification for this group. We also add two new fields to group descriptor as a part of uninitialized group patch. __le16 bg_itable_unused; /* Unused inodes count */ __le16 bg_checksum; /* crc16(sb_uuid+group+desc) */ bg_itable_unused: If we have EXT4_BG_INODE_UNINIT not set in bg_flags then bg_itable_unused will give the offset within the inode table till the inodes are used. This can be used by fsck to skip list of inodes that are marked unused. bg_checksum: Now that we depend on bg_flags and bg_itable_unused to determine the block and inode usage, we need to make sure group descriptor is not corrupt. We add checksum to group descriptor to detect corruption. If the descriptor is found to be corrupt, we mark all the blocks and inodes in the group used. Signed-off-by: Avantika Mathur <mathur@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2007-10-17ext2/ext3/ext4: add block bitmap validationAneesh Kumar K.V
When a new block bitmap is read from disk in read_block_bitmap() there are a few bits that should ALWAYS be set. In particular, the blocks given by ext4_blk_bitmap, ext4_inode_bitmap and ext4_inode_table. Validate the block bitmap against these blocks. [akpm@linux-foundation.org: cleanups] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Acked-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17lib: percpu_counter_subPeter Zijlstra
Hugh spotted that some code does: percpu_counter_add(&counter, -unsignedlong) which, when the amount argument is of type s32, sort-of works thanks to two's-complement. However when we'd change the type to s64 this breaks on 32bit machines, because the promotion rules zero extend the unsigned number. Provide percpu_counter_sub() to hide the s64 cast. That is: percpu_counter_sub(&counter, foo) is equal to: percpu_counter_add(&counter, -(s64)foo); Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17lib: percpu_counter_addPeter Zijlstra
s/percpu_counter_mod/percpu_counter_add/ Because its a better name, _mod implies modulo. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-18jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUGJose R. Santos
When the JBD code was forked to create the new JBD2 code base, the references to CONFIG_JBD_DEBUG where never changed to CONFIG_JBD2_DEBUG. This patch fixes that. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-07-16mistaken ext4_inode_bitmap for ext4_block_bitmapToshiyuki Okajima
In ext4_new_blocks(), one of two ext4_block_bitmap() calls should be ext4_inode_bitmap() call. It is not harmful in normal processing, but it should be fixed. Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-31EXT4: Fix whitespaceDave Kleikamp
Replace a lot of spaces with tabs Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-02-20[PATCH] ext[234]: update documentationAneesh Kumar K.V
Signed-off-by: "Aneesh Kumar K.V" <aneesh.kumar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-12-07[PATCH] ext4 balloc: fix _with_rsv freezeHugh Dickins
Port fix to the off-by-one in find_next_usable_block's memscan from ext2 to ext4; but it didn't cause a serious problem for ext4 because the additional ext4_test_allocatable check rescued it from the error. [akpm@osdl.org: build fix] Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] ext4 balloc: use io_error labelHugh Dickins
ext4_new_blocks has a nice io_error label for setting -EIO, so goto that in the one place that doesn't already use it. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] ext4 balloc: say rb_entry not list_entryHugh Dickins
The reservations tree is an rb_tree not a list, so it's less confusing to use rb_entry() than list_entry() - though they're both just container_of(). Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] ext4 balloc: fix off-by-one against rsv_endHugh Dickins
rsv_end is the last block within the reservation, so alloc_new_reservation should accept start_block == rsv_end as success. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] ext4 balloc: fix off-by-one against grp_goalHugh Dickins
grp_goal 0 is a genuine goal (unlike -1), so ext4_try_to_allocate_with_rsv should treat it as such. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] ext4 balloc: reset windowsz when fullHugh Dickins
ext4_new_blocks should reset the reservation window size to 0 when squeezing the last blocks out of an almost full filesystem, so the retry doesn't skip any groups with less than half that free, reporting ENOSPC too soon. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] ext4: fix reservation extensionMingming Cao
Hugh Dickins wrote: > Not found anything relevant, but I keep noticing these lines > in ext2_try_to_allocate_with_rsv(), ext3 and ext4 similar: > > } else if (grp_goal > 0 && > (my_rsv->rsv_end - grp_goal + 1) < *count) > try_to_extend_reservation(my_rsv, sb, > *count-my_rsv->rsv_end + grp_goal - 1); > > They're wrong, a no-op in most groups, aren't they? rsv_end is an > absolute block number, whereas grp_goal is group-relative, so the > calculation ought to bring in group_first_block? Or I'm confused. > Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: "linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] ext4 64 bit divide fixAndrew Morton
With CONFIG_LBD=n, sector_div() expands to a plain old divide. But ext4 is _not_ passing in a sector_t as the first argument, so... fs/built-in.o: In function `ext4_get_group_no_and_offset': fs/ext4/balloc.c:39: undefined reference to `__umoddi3' fs/ext4/balloc.c:41: undefined reference to `__udivdi3' fs/built-in.o: In function `find_group_orlov': fs/ext4/ialloc.c:278: undefined reference to `__udivdi3' fs/built-in.o: In function `ext4_fill_super': fs/ext4/super.c:1488: undefined reference to `__udivdi3' fs/ext4/super.c:1488: undefined reference to `__umoddi3' fs/ext4/super.c:1594: undefined reference to `__udivdi3' fs/ext4/super.c:1601: undefined reference to `__umoddi3' Fix that up by calling do_div() directly. Also cast the arg to u64. do_div() is only defined on u64, and ext4_fsblk_t is supposed to be opaque. Note especially the changes to find_group_orlov(). It was attempting to do do_div(int, unsigned long long); which is royally screwed up. Switched it to plain old divide. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] ext4 uninline ext4_get_group_no_and_offset()Andrew Morton
Way too big to inline. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] ext4: move block number hi bitsAlexandre Ratchov
move '_hi' bits of block numbers in the larger part of the block group descriptor structure Signed-off-by: Alexandre Ratchov <alexandre.ratchov@bull.net> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] ext4: allow larger descriptor sizeAlexandre Ratchov
make block group descriptor larger. Signed-off-by: Alexandre Ratchov <alexandre.ratchov@bull.net> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] ext4: blk_type from sector_t to unsigned long longMingming Cao
Change ext4 in-kernel block type (ext4_fsblk_t) from sector_t to unsigned long long. Remove ext4 block type string micro E3FSBLK, replaced with "%llu" [akpm@osdl.org: build fix] Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] ext4: 64bit metadataLaurent Vivier
In-kernel super block changes to support >32 bit free blocks numbers. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Alexandre Ratchov <alexandre.ratchov@bull.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] ext4: switch fsblk to sector_tMingming Cao
Redefine ext3 in-kernel filesystem block type (ext3_fsblk_t) from unsigned long to sector_t, to allow kernel to handle >32 bit ext3 blocks. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] jbd2: enable building of jbd2 and have ext4 use it rather than jbdMingming Cao
Reworked from a patch by Mingming Cao and Randy Dunlap Signed-off-By: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] ext4: rename ext4 symbols to avoid duplication of ext3 symbolsMingming Cao
Mingming Cao originally did this work, and Shaggy reproduced it using some scripts from her. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11[PATCH] ext4: initial copy of files from ext3Dave Kleikamp
Start of the ext4 patch series. See Documentation/filesystems/ext4.txt for details. This is a simple copy of the files in fs/ext3 to fs/ext4 and /usr/incude/linux/ext3* to /usr/include/ex4* Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>