aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2008-10-16sysfs: Make dir and name args to sysfs_notify() constTrent Piepho
Because they can be, and because code like this produces a warning if they're not: struct device_attribute dev_attr; sysfs_notify(&kobj, NULL, dev_attr.attr.name); Signed-off-by: Trent Piepho <tpiepho@freescale.com> CC: Neil Brown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-16sysfs: use ilookup5() instead of ilookup5_nowait()Tejun Heo
As inode creation is protected by sysfs_mutex, ilookup5_nowait() always either fails to find at all or finds one which is fully initialized, so using ilookup5_nowait() or ilookup5() doesn't make any difference. Switch to ilookup5() as it's planned to be removed. This change also makes lookup return value handling a bit simpler. This change was suggested by Al Viro. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Al Viro <viro@hera.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-16sysfs: fix deadlockNick Piggin
On Thu, Sep 11, 2008 at 10:27:10AM +0200, Ingo Molnar wrote: > and it's working fine on most boxes. One testbox found this new locking > scenario: > > PM: Adding info for No Bus:vcsa7 > EDAC DEBUG: MC0: i82860_check() > > ======================================================= > [ INFO: possible circular locking dependency detected ] > 2.6.27-rc6-tip #1 > ------------------------------------------------------- > X/4873 is trying to acquire lock: > (&bb->mutex){--..}, at: [<c020ba20>] mmap+0x40/0xa0 > > but task is already holding lock: > (&mm->mmap_sem){----}, at: [<c0125a1e>] sys_mmap2+0x8e/0xc0 > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #1 (&mm->mmap_sem){----}: > [<c017dc96>] validate_chain+0xa96/0xf50 > [<c017ef2b>] __lock_acquire+0x2cb/0x5b0 > [<c017f299>] lock_acquire+0x89/0xc0 > [<c01aa8fb>] might_fault+0x6b/0x90 > [<c040b618>] copy_to_user+0x38/0x60 > [<c020bcfb>] read+0xfb/0x170 > [<c01c09a5>] vfs_read+0x95/0x110 > [<c01c1443>] sys_pread64+0x63/0x80 > [<c012146f>] sysenter_do_call+0x12/0x43 > [<ffffffff>] 0xffffffff > > -> #0 (&bb->mutex){--..}: > [<c017d8b7>] validate_chain+0x6b7/0xf50 > [<c017ef2b>] __lock_acquire+0x2cb/0x5b0 > [<c017f299>] lock_acquire+0x89/0xc0 > [<c0d6f2ab>] __mutex_lock_common+0xab/0x3c0 > [<c0d6f698>] mutex_lock_nested+0x38/0x50 > [<c020ba20>] mmap+0x40/0xa0 > [<c01b111e>] mmap_region+0x14e/0x450 > [<c01b170f>] do_mmap_pgoff+0x2ef/0x310 > [<c0125a3d>] sys_mmap2+0xad/0xc0 > [<c012146f>] sysenter_do_call+0x12/0x43 > [<ffffffff>] 0xffffffff > > other info that might help us debug this: > > 1 lock held by X/4873: > #0: (&mm->mmap_sem){----}, at: [<c0125a1e>] sys_mmap2+0x8e/0xc0 > > stack backtrace: > Pid: 4873, comm: X Not tainted 2.6.27-rc6-tip #1 > [<c017cd09>] print_circular_bug_tail+0x79/0xc0 > [<c017d8b7>] validate_chain+0x6b7/0xf50 > [<c017a5b5>] ? trace_hardirqs_off_caller+0x15/0xb0 > [<c017ef2b>] __lock_acquire+0x2cb/0x5b0 > [<c017f299>] lock_acquire+0x89/0xc0 > [<c020ba20>] ? mmap+0x40/0xa0 > [<c0d6f2ab>] __mutex_lock_common+0xab/0x3c0 > [<c020ba20>] ? mmap+0x40/0xa0 > [<c0d6f698>] mutex_lock_nested+0x38/0x50 > [<c020ba20>] ? mmap+0x40/0xa0 > [<c020ba20>] mmap+0x40/0xa0 > [<c01b111e>] mmap_region+0x14e/0x450 > [<c01afb88>] ? arch_get_unmapped_area_topdown+0xf8/0x160 > [<c01b170f>] do_mmap_pgoff+0x2ef/0x310 > [<c0125a3d>] sys_mmap2+0xad/0xc0 > [<c012146f>] sysenter_do_call+0x12/0x43 > [<c0120000>] ? __switch_to+0x130/0x220 > ======================= > evbug.c: Event. Dev: input3, Type: 20, Code: 0, Value: 500 > warning: `sudo' uses deprecated v2 capabilities in a way that may be insecure. > > i've attached the config. > > at first sight it looks like a genuine bug in fs/sysfs/bin.c? Yes, it is a real bug by the looks. bin.c takes bb->mutex under mmap_sem when it is mmapped, and then does its copy_*_user under bb->mutex too. Here is a basic fix for the sysfs lor. From: Nick Piggin <npiggin@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-16sysfs: Support sysfs_notify from atomic context with new sysfs_notify_direntNeil Brown
Support sysfs_notify from atomic context with new sysfs_notify_dirent sysfs_notify currently takes sysfs_mutex. This means that it cannot be called in atomic context. sysfs_mutex is sometimes held over a malloc (sysfs_rename_dir) so it can block on low memory. In md I want to be able to notify on a sysfs attribute from atomic context, and I don't want to block on low memory because I could be in the writeout path for freeing memory. So: - export the "sysfs_dirent" structure along with sysfs_get, sysfs_put and sysfs_get_dirent so I can get the sysfs_dirent that I want to notify on and hold it in an md structure. - split sysfs_notify_dirent out of sysfs_notify so the sysfs_dirent can be notified on with no blocking (just a spinlock). Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-16device create: misc: convert device_create_drvdata to device_createGreg Kroah-Hartman
Now that device_create() has been audited, rename things back to the original call to be sane. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-16sysfs: crash debuggingAndrew Morton
Print the name of the last-accessed sysfs file when we oops, to help track down oopses which occur in sysfs store/read handlers. Because these oopses tend to not leave any trace of the offending code in the stack traces. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-15xfs: fix remount rw with unrecognized optionsChristoph Hellwig
When we skip unrecognized options in xfs_fs_remount we should just break out of the switch and not return because otherwise we may skip clearing the xfs-internal read-only flag. This will only show up on some operations like touch because most read-only checks are done by the VFS which thinks this filesystem is r/w. Eventually we should replace the XFS read-only flag with a helper that always checks the VFS flag to make sure they can never get out of sync. Bug reported and fix verified by Marcel Beister on #xfs. Bug fix verified by updated xfstests/189. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Timothy Shimmin <tes@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-14ocfs2: fix build errorMark Fasheh
I merged the latest ocfs2_read_blocks() changes in xattr.c wrong. This makes Ocfs2 compile again. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (56 commits) ocfs2: Make cached block reads the common case. ocfs2: Kill the last naked wait_on_buffer() for cached reads. ocfs2: Move ocfs2_bread() into dir.c ocfs2: Simplify ocfs2_read_block() ocfs2: Require an inode for ocfs2_read_block(s)(). ocfs2: Separate out sync reads from ocfs2_read_blocks() ocfs2: Refactor xattr list and remove ocfs2_xattr_handler(). ocfs2: Calculate EA hash only by its suffix. ocfs2: Move trusted and user attribute support into xattr.c ocfs2: Uninline ocfs2_xattr_name_hash() ocfs2: Don't check for NULL before brelse() ocfs2: use smaller counters in ocfs2_remove_xattr_clusters_from_cache ocfs2: Documentation update for user_xattr / nouser_xattr mount options ocfs2: make la_debug_mutex static ocfs2: Remove pointless !! ocfs2: Add empty bucket support in xattr. ocfs2/xattr.c: Fix a bug when inserting xattr. ocfs2: Add xattr mount option in ocfs2_show_options() ocfs2: Switch over to JBD2. ocfs2: Add the 'inode64' mount option. ...
2008-10-14Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux: (59 commits) svcrdma: Fix IRD/ORD polarity svcrdma: Update svc_rdma_send_error to use DMA LKEY svcrdma: Modify the RPC reply path to use FRMR when available svcrdma: Modify the RPC recv path to use FRMR when available svcrdma: Add support to svc_rdma_send to handle chained WR svcrdma: Modify post recv path to use local dma key svcrdma: Add a service to register a Fast Reg MR with the device svcrdma: Query device for Fast Reg support during connection setup svcrdma: Add FRMR get/put services NLM: Remove unused argument from svc_addsock() function NLM: Remove "proto" argument from lockd_up() NLM: Always start both UDP and TCP listeners lockd: Remove unused fields in the nlm_reboot structure lockd: Add helper to sanity check incoming NOTIFY requests lockd: change nlmclnt_grant() to take a "struct sockaddr *" lockd: Adjust nlmsvc_lookup_host() to accomodate AF_INET6 addresses lockd: Adjust nlmclnt_lookup_host() signature to accomodate non-AF_INET lockd: Support non-AF_INET addresses in nlm_lookup_host() NLM: Convert nlm_lookup_host() to use a single argument svcrdma: Add Fast Reg MR Data Types ...
2008-10-14ocfs2: Make cached block reads the common case.Joel Becker
ocfs2_read_blocks() currently requires the CACHED flag for cached I/O. However, that's the common case. Let's flip it around and provide an IGNORE_CACHE flag for the special users. This has the added benefit of cleaning up the code some (ignore_cache takes on its special meaning earlier in the loop). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14ocfs2: Kill the last naked wait_on_buffer() for cached reads.Joel Becker
ocfs2's cached buffer I/O goes through ocfs2_read_block(s)(). dir.c had a naked wait_on_buffer() to wait for some readahead, but it should use ocfs2_read_block() instead. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14ocfs2: Move ocfs2_bread() into dir.cJoel Becker
dir.c is the only place using ocfs2_bread(), so let's make it static to that file. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14ocfs2: Simplify ocfs2_read_block()Joel Becker
More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED. Only six pass a different flag set. Rather than have every caller care, let's make ocfs2_read_block() take no flags and always do a cached read. The remaining six places can call ocfs2_read_blocks() directly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14ocfs2: Require an inode for ocfs2_read_block(s)().Joel Becker
Now that synchronous readers are using ocfs2_read_blocks_sync(), all callers of ocfs2_read_blocks() are passing an inode. Use it unconditionally. Since it's there, we don't need to pass the ocfs2_super either. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14ocfs2: Separate out sync reads from ocfs2_read_blocks()Joel Becker
The ocfs2_read_blocks() function currently handles sync reads, cached, reads, and sometimes cached reads. We're going to add some functionality to it, so first we should simplify it. The uncached, synchronous reads are much easer to handle as a separate function, so we instroduce ocfs2_read_blocks_sync(). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Refactor xattr list and remove ocfs2_xattr_handler().Tao Ma
According to Christoph Hellwig's advice, we really don't need a ->list to handle one xattr's list. Just a map from index to xattr prefix is enough. And I also refactor the old list method with the reference from fs/xfs/linux-2.6/xfs_xattr.c and the xattr list method in btrfs. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Calculate EA hash only by its suffix.Tao Ma
According to Christoph Hellwig's advice, the hash value of EA is only calculated by its suffix. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Move trusted and user attribute support into xattr.cMark Fasheh
Per Christoph Hellwig's suggestion - don't split these up. It's not like we gained much by having the two tiny files around. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Uninline ocfs2_xattr_name_hash()Mark Fasheh
This is too big to be inlined. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Don't check for NULL before brelse()Mark Fasheh
This is pointless as brelse() already does the check. Signed-off-by: Mark Fasheh
2008-10-13ocfs2: use smaller counters in ocfs2_remove_xattr_clusters_from_cacheMark Fasheh
i and b_len don't really need to be u64's. Xattr extent lengths should be limited by the VFS, and then the size of our on-disk length field. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: make la_debug_mutex staticMark Fasheh
It can also be moved into ocfs2_la_debug_read(). Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Remove pointless !!Mark Fasheh
ocfs2_stack_supports_plocks() doesn't need this to properly return a zero or one value. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add empty bucket support in xattr.Tao Ma
As Mark mentioned, it may be time-consuming when we remove the empty xattr bucket, so this patch try to let empty bucket exist in xattr operation. The modification includes: 1. Remove the functin of bucket and extent record deletion during xattr delete. 2. In xattr set: 1) Don't clean the last entry so that if the bucket is empty, the hash value of the bucket is the hash value of the entry which is deleted last. 2) During insert, if we meet with an empty bucket, just use the 1st entry. 3. In binary search of xattr bucket, use the bucket hash value(which stored in the 1st xattr entry) to find the right place. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2/xattr.c: Fix a bug when inserting xattr.Tao Ma
During the process of xatt insertion, we use binary search to find the right place and "low" is set to it. But when there is one xattr which has the same name hash as the inserted one, low is the wrong value. So set it to the right position. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add xattr mount option in ocfs2_show_options()Sunil Mushran
Patch adds check for [no]user_xattr in ocfs2_show_options() that completes the list of all mount options. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Switch over to JBD2.Joel Becker
ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is limiting our maximum filesystem size. It's a pretty trivial change. Most functions are just renamed. The only functional change is moving to Jan's inode-based ordered data mode. It's better, too. Because JBD2 reads and writes JBD journals, this is compatible with any existing filesystem. It can even interact with JBD-based ocfs2 as long as the journal is formated for JBD. We provide a compatibility option so that paranoid people can still use JBD for the time being. This will go away shortly. [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to ocfs2_truncate_for_delete(). --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add the 'inode64' mount option.Joel Becker
Now that ocfs2 limits inode numbers to 32bits, add a mount option to disable the limit. This parallels XFS. 64bit systems can handle the larger inode numbers. [ Added description of inode64 mount option in ocfs2.txt. --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Limit inode allocation to 32bits.Joel Becker
ocfs2 inode numbers are block numbers. For any filesystem with less than 2^32 blocks, this is not a problem. However, when ocfs2 starts using JDB2, it will be able to support filesystems with more than 2^32 blocks. This would result in inode numbers higher than 2^32. The problem is that stat(2) can't handle those numbers on 32bit machines. The simple solution is to have ocfs2 allocate all inodes below that boundary. The suballoc code is changed to honor an optional block limit. Only the inode suballocator sets that limit - all other allocations stay unlimited. The biggest trick is to grow the inode suballocator beneath that limit. There's no point in allocating block groups that are above the limit, then rejecting their elements later on. We want to prevent the inode allocator from ever having block groups above the limit. This involves a little gyration with the local alloc code. If the local alloc window is above the limit, it signals the caller to try the global bitmap but does not disable the local alloc file (which can be used for other allocations). [ Minor cleanup - removed an ML_NOTICE comment. --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Resolve deadlock in ocfs2_xattr_free_block.Tao Ma
In ocfs2_xattr_free_block, we take a cluster lock on xb_alloc_inode while we have a transaction open. This will deadlock the downconvert thread, so fix it. We can clean up how xattr blocks are removed while here - this patch also moves the mechanism of releasing xattr block (including both value, xattr tree and xattr block) into this function. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: bug-fix for journal extend in xattr.Tao Ma
In ocfs2_extend_trans, when we can't extend the current transaction, it will commit current transaction and restart a new one. So if the previous credits we have allocated aren't used(the block isn't dirtied before our extend), we will not have enough credits for any future operation(it will cause jbd complain and bug out). So check this and re-extend it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree()Joel Becker
The original get/put_extent_tree() functions held a reference on et_root_bh. However, every single caller already has a safe reference, making the get/put cycle irrelevant. We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree(). It no longer gets a reference on et_root_bh. ocfs2_put_extent_tree() is removed. Callers now have a simpler init+use pattern. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Comment struct ocfs2_extent_tree_operations.Joel Becker
struct ocfs2_extent_tree_operations provides methods for the different on-disk btrees in ocfs2. Describing what those methods do is probably a good idea. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Make ocfs2_extent_tree the first-class representation of a tree.Joel Becker
We now have three different kinds of extent trees in ocfs2: inode data (dinode), extended attributes (xattr_tree), and extended attribute values (xattr_value). There is a nice abstraction for them, ocfs2_extent_tree, but it is hidden in alloc.c. All the calling functions have to pick amongst a varied API and pass in type bits and often extraneous pointers. A better way is to make ocfs2_extent_tree a first-class object. Everyone converts their object to an ocfs2_extent_tree() via the ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all tree calls to alloc.c. This simplifies a lot of callers, making for readability. It also provides an easy way to add additional extent tree types, as they only need to be defined in alloc.c with a ocfs2_get_<new>_extent_tree() function. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add an insertion check to ocfs2_extent_tree_operations.Joel Becker
A couple places check an extent_tree for a valid inode. We move that out to add an eo_insert_check() operation. It can be called from ocfs2_insert_extent() and elsewhere. We also have the wrapper calls ocfs2_et_insert_check() and ocfs2_et_sanity_check() ignore NULL ops. That way we don't have to provide useless operations for xattr types. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Create specific get_extent_tree functions.Joel Becker
A caller knows what kind of extent tree they have. There's no reason they have to call ocfs2_get_extent_tree() with a NULL when they could just as easily call a specific function to their type of extent tree. Introduce ocfs2_dinode_get_extent_tree(), ocfs2_xattr_tree_get_extent_tree(), and ocfs2_xattr_value_get_extent_tree(). They only take the necessary arguments, calling into the underlying __ocfs2_get_extent_tree() to do the real work. __ocfs2_get_extent_tree() is the old ocfs2_get_extent_tree(), but without needing any switch-by-type logic. ocfs2_get_extent_tree() is now a wrapper around the specific calls. It exists because a couple alloc.c functions can take et_type. This will go later. Another benefit is that ocfs2_xattr_value_get_extent_tree() can take a struct ocfs2_xattr_value_root* instead of void*. This gives us typechecking where we didn't have it before. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Determine an extent tree's max_leaf_clusters in an et_op.Joel Becker
Provide an optional extent_tree_operation to specify the max_leaf_clusters of an ocfs2_extent_tree. If not provided, the value is 0 (unlimited). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Use struct ocfs2_extent_tree in ocfs2_num_free_extents().Joel Becker
ocfs2_num_free_extents() re-implements the logic of ocfs2_get_extent_tree(). Now that ocfs2_get_extent_tree() does not allocate, let's use it in ocfs2_num_free_extents() to simplify the code. The inode validation code in ocfs2_num_free_extents() is not needed. All callers are passing in pre-validated inodes. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Provide the get_root_el() method to ocfs2_extent_tree_operations.Joel Becker
The root_el of an ocfs2_extent_tree needs to be calculated from et->et_object. Make it an operation on et->et_ops. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Make 'private' into 'object' on ocfs2_extent_tree.Joel Becker
The 'private' pointer was a way to store off xattr values, which don't live at a set place in the bh. But the concept of "the object containing the extent tree" is much more generic. For an inode it's the struct ocfs2_dinode, for an xattr value its the value. Let's save off the 'object' at all times. If NULL is passed to ocfs2_get_extent_tree(), 'object' is set to bh->b_data; Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Make ocfs2_extent_tree get/put instead of alloc.Joel Becker
Rather than allocating a struct ocfs2_extent_tree, just put it on the stack. Fill it with ocfs2_get_extent_tree() and drop it with ocfs2_put_extent_tree(). Now the callers don't have to ENOMEM, yet still safely ref the root_bh. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Prefix the ocfs2_extent_tree structure.Joel Becker
The members of the ocfs2_extent_tree structure gain a prefix of 'et_'. All users are updated. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Prefix the extent tree operations structure.Joel Becker
The ocfs2_extent_tree_operations structure gains a field prefix on its members. The ->eo_sanity_check() operation gains a wrapper function for completeness. All of the extent tree operation wrappers gain a consistent name (ocfs2_et_*()). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: fix printk format warningsMark Fasheh
This patch fixes the following build warnings: fs/ocfs2/xattr.c: In function 'ocfs2_half_xattr_bucket': fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int' fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int' fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int' fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int' fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 7 has type 'long int' fs/ocfs2/xattr.c:3282: warning: format '%d' expects type 'int', but argument 8 has type 'long int' fs/ocfs2/xattr.c: In function 'ocfs2_xattr_set_entry_in_bucket': fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t' fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t' fs/ocfs2/xattr.c:4092: warning: format '%d' expects type 'int', but argument 6 has type 'size_t' Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add incompatible flag for extended attributeTiger Yang
This patch adds the s_incompat flag for extended attribute support. This helps us ensure that older versions of Ocfs2 or ocfs2-tools will not be able to mount a volume with xattr support. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Delete all xattr buckets during inode removalTao Ma
In inode removal, we need to iterate all the buckets, remove any externally-stored EA values and delete the xattr buckets. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Enable xattr set in index btreeTao Ma
Where the previous patches added the ability of list/get xattr in buckets for ocfs2, this patch enables ocfs2 to store large numbers of EAs. The original design doc is written by Mark Fasheh, and it can be found in http://oss.oracle.com/osswiki/OCFS2/DesignDocs/IndexedEATrees. I only had to make small modifications to it. First, because the bucket size is 4K, a new field named xh_free_start is added in ocfs2_xattr_header to indicate the next valid name/value offset in a bucket. It is used when we store new EA name/value. With this field, we can find the place more quickly and what's more, we don't need to sort the name/value every time to let the last entry indicate the next unused space. This makes the insert operation more efficient for blocksizes smaller than 4k. Because of the new xh_free_start, another field named as xh_name_value_len is also added in ocfs2_xattr_header. It records the total length of all the name/values in the bucket. We need this so that we can check it and defragment the bucket if there is not enough contiguous free space. An xattr insertion looks like this: 1. xattr_index_block_find: find the right bucket by the name_hash, say bucketA. 2. check whether there is enough space in bucketA. If yes, insert it directly and modify xh_free_start and xh_name_value_len accordingly. If not, check xh_name_value_len to see whether we can store this by defragment the bucket. If yes, defragment it and go on insertion. 3. If defragement doesn't work, check whether there is new empty bucket in the clusters within this extent record. If yes, init the new bucket and move all the buckets after bucketA one by one to the next bucket. Move half of the entries in bucketA to the next bucket and go on insertion. 4. If there is no new bucket, grow the extent tree. As for xattr deletion, we will delete an xattr bucket when all it's xattrs are removed and move all the buckets after it to the previous one. When all the xattr buckets in an extend record are freed, free this extend records from ocfs2_xattr_tree. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Optionally limit extent size in ocfs2_insert_extent()Tao Ma
In xattr bucket, we want to limit the maximum size of a btree leaf, otherwise we'll lose the benefits of hashing because we'll have to search large leaves. So add a new field in ocfs2_extent_tree which indicates the maximum leaf cluster size we want so that we can prevent ocfs2_insert_extent() from merging the leaf record even if it is contiguous with an adjacent record. Other btree types are not affected by this change. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add xattr lookup code xattr btreesTao Ma
Add code to lookup a given extended attribute in the xattr btree. Lookup follows this general scheme: 1. Use ocfs2_xattr_get_rec to find the xattr extent record 2. Find the xattr bucket within the extent which may contain this xattr 3. Iterate the bucket to find the xattr. In ocfs2_xattr_block_get(), we need to recalcuate the block offset and name offset for the right position of name/value. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>