aboutsummaryrefslogtreecommitdiff
path: root/fs/ocfs2/file.c
AgeCommit message (Collapse)Author
2008-11-10ocfs2: truncate outstanding block after direct io failureDmitri Monakhov
Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Joel Becker <Joel.Becker@oracle.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-10ocfs2: Fix check of return value of ocfs2_start_trans()Jan Kara
On failure, ocfs2_start_trans() returns values like ERR_PTR(-ENOMEM). Thus checks for !handle are wrong. Fix them to use IS_ERR(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-30fs: remove prepare_write/commit_writeNick Piggin
Nothing uses prepare_write or commit_write. Remove them from the tree completely. [akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-14ocfs2: Simplify ocfs2_read_block()Joel Becker
More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED. Only six pass a different flag set. Rather than have every caller care, let's make ocfs2_read_block() take no flags and always do a cached read. The remaining six places can call ocfs2_read_blocks() directly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14ocfs2: Require an inode for ocfs2_read_block(s)().Joel Becker
Now that synchronous readers are using ocfs2_read_blocks_sync(), all callers of ocfs2_read_blocks() are passing an inode. Use it unconditionally. Since it's there, we don't need to pass the ocfs2_super either. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Don't check for NULL before brelse()Mark Fasheh
This is pointless as brelse() already does the check. Signed-off-by: Mark Fasheh
2008-10-13ocfs2: Switch over to JBD2.Joel Becker
ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is limiting our maximum filesystem size. It's a pretty trivial change. Most functions are just renamed. The only functional change is moving to Jan's inode-based ordered data mode. It's better, too. Because JBD2 reads and writes JBD journals, this is compatible with any existing filesystem. It can even interact with JBD-based ocfs2 as long as the journal is formated for JBD. We provide a compatibility option so that paranoid people can still use JBD for the time being. This will go away shortly. [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to ocfs2_truncate_for_delete(). --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree()Joel Becker
The original get/put_extent_tree() functions held a reference on et_root_bh. However, every single caller already has a safe reference, making the get/put cycle irrelevant. We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree(). It no longer gets a reference on et_root_bh. ocfs2_put_extent_tree() is removed. Callers now have a simpler init+use pattern. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Make ocfs2_extent_tree the first-class representation of a tree.Joel Becker
We now have three different kinds of extent trees in ocfs2: inode data (dinode), extended attributes (xattr_tree), and extended attribute values (xattr_value). There is a nice abstraction for them, ocfs2_extent_tree, but it is hidden in alloc.c. All the calling functions have to pick amongst a varied API and pass in type bits and often extraneous pointers. A better way is to make ocfs2_extent_tree a first-class object. Everyone converts their object to an ocfs2_extent_tree() via the ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all tree calls to alloc.c. This simplifies a lot of callers, making for readability. It also provides an easy way to add additional extent tree types, as they only need to be defined in alloc.c with a ocfs2_get_<new>_extent_tree() function. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add extended attribute supportTiger Yang
This patch implements storing extended attributes both in inode or a single external block. We only store EA's in-inode when blocksize > 512 or that inode block has free space for it. When an EA's value is larger than 80 bytes, we will store the value via b-tree outside inode or block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add extent tree operation for xattr value btreesTao Ma
Add some thin wrappers around ocfs2_insert_extent() for each of the 3 different btree types, ocfs2_inode_insert_extent(), ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The last is for the xattr index btree, which will be used in a followup patch. All the old callers in file.c etc will call ocfs2_dinode_insert_extent(), while the other two handle the xattr issue. And the init of extent tree are handled by these functions. When storing xattr value which is too large, we will allocate some clusters for it and here ocfs2_extent_list and ocfs2_extent_rec will also be used. In order to re-use the b-tree operation code, a new parameter named "private" is added into ocfs2_extent_tree and it is used to indicate the root of ocfs2_exent_list. The reason is that we can't deduce the root from the buffer_head now. It may be in an inode, an ocfs2_xattr_block or even worse, in any place in an ocfs2_xattr_bucket. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Make high level btree extend code genericTao Ma
Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation calls ocfs2_do_cluster_allocation() now, but the latter can be used for other btree types as well. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Abstract ocfs2_extent_tree in b-tree operations.Tao Ma
In the old extent tree operation, we take the hypothesis that we are using the ocfs2_extent_list in ocfs2_dinode as the tree root. As xattr will also use ocfs2_extent_list to store large value for a xattr entry, we refactor the tree operation so that xattr can use it directly. The refactoring includes 4 steps: 1. Abstract set/get of last_eb_blk and update_clusters since they may be stored in different location for dinode and xattr. 2. Add a new structure named ocfs2_extent_tree to indicate the extent tree the operation will work on. 3. Remove all the use of fe_bh and di, use root_bh and root_el in extent tree instead. So now all the fe_bh is replaced with et->root_bh, el with root_el accordingly. 4. Make ocfs2_lock_allocators generic. Now it is limited to be only used in file extend allocation. But the whole function is useful when we want to store large EAs. Note: This patch doesn't touch ocfs2_commit_truncate() since it is not used for anything other than truncate inode data btrees. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode.Tao Ma
ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and ocfs2_reserve_new_metadata() are all useful for extent tree operations. But they are all limited to an inode btree because they use a struct ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list (the part of an ocfs2_dinode they actually use) so that the xattr btree code can use these functions. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Modify ocfs2_num_free_extents for future xattr usage.Tao Ma
ocfs2_num_free_extents() is used to find the number of free extent records in an inode btree. Hence, it takes an "ocfs2_dinode" parameter. We want to use this for extended attribute trees in the future, so genericize the interface the take a buffer head. A future patch will allow that buffer_head to contain any structure rooting an ocfs2 btree. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: POSIX file locks supportMark Fasheh
This is actually pretty easy since fs/dlm already handles the bulk of the work. The Ocfs2 userspace cluster stack module already uses fs/dlm as the underlying lock manager, so I only had to add the right calls. Cluster-aware POSIX locks ("plocks") can be turned off by the same means at UNIX locks - mount with 'noflocks', or create a local-only Ocfs2 volume. Internally, the file system uses two sets of file_operations, depending on whether cluster aware plocks is required. This turns out to be easier than implementing local-only versions of ->lock. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-03ocfs2: fiemap supportMark Fasheh
Plug ocfs2 into ->fiemap. Some portions of ocfs2_get_clusters() had to be refactored so that the extent cache can be skipped in favor of going directly to the on-disk records. This makes it easier for us to determine which extent is the last one in the btree. Also, I'm not sure we want to be caching fiemap lookups anyway as they're not directly related to data read/write. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: ocfs2-devel@oss.oracle.com Cc: linux-fsdevel@vger.kernel.org
2008-07-31[PATCH] ocfs2: Release mutex in error handling codeJulia Lawall
The mutex is released on a successful return, so it would seem that it should be released on an error return as well. The semantic patch finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ expression l; @@ mutex_lock(l); ... when != mutex_unlock(l) when any when strict ( if (...) { ... when != mutex_unlock(l) + mutex_unlock(l); return ...; } | mutex_unlock(l); ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-07-26[PATCH] sanitize ->permission() prototypeAl Viro
* kill nameidata * argument; map the 3 bits in ->flags anybody cares about to new MAY_... ones and pass with the mask. * kill redundant gfs2_iop_permission() * sanitize ecryptfs_permission() * fix remaining places where ->permission() instances might barf on new MAY_... found in mask. The obvious next target in that direction is permission(9) folded fix for nfs_permission() breakage from Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-14ocfs2: Silence an error message in ocfs2_file_aio_read()Sunil Mushran
This patch silences an EINVAL error message in ocfs2_file_aio_read() that is always due to a user error. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-30ocfs2: Allow uid/gid/perm changes of symlinksSunil Mushran
This patch adds the ability to change attributes of a symlink. Fixes oss bugzilla#963 http://oss.oracle.com/bugzilla/show_bug.cgi?id=963 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Convert ocfs2 over to unlocked_ioctlAndi Kleen
As far as I can see there is nothing in ocfs2_ioctl that requires the BKL, so use unlocked_ioctl Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-01-25ocfs2: printf fixesJan Kara
Explicitely convert loff_t to long long in printf. Just for sure... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25ocfs2: Use generic_file_llseekJan Kara
We should use generic_file_llseek() and not default_llseek() so that s_maxbytes gets properly checked when seeking. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25[PATCH 2/2] ocfs2: cluster aware flock()Mark Fasheh
Hook up ocfs2_flock(), using the new flock lock type in dlmglue.c. A new mount option, "localflocks" is added so that users can revert to old functionality as need be. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25ocfs2: Rename ocfs2_meta_[un]lockMark Fasheh
Call this the "inode_lock" now, since it covers both data and meta data. This patch makes no functional changes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25ocfs2: Remove data locksMark Fasheh
The meta lock now covers both meta data and data, so this just removes the now-redundant data lock. Combining locks saves us a round of lock mastery per inode and one less lock to ping between nodes during read/write. We don't lose much - since meta locks were always held before a data lock (and at the same level) ordered writeout mode (the default) ensured that flushing for the meta data lock also pushed out data anyways. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27ocfs2: reverse inline-data truncate argsMark Fasheh
ocfs2_truncate() and ocfs2_remove_inode_range() had reversed their "set i_size" arguments to ocfs2_truncate_inline(). Fix things so that truncate sets i_size, and punching a hole ignores it. This exposed a problem where punching a hole in an inline-data file wasn't updating the page cache, so fix that too. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-06ocfs2: Commit journal on sync writesMark Fasheh
We're missing a meta data commit for extending sync writes. In thoery, write could return with the meta data required to read the data uncommitted to disk. Fix that by detecting an allocating write and forcing a journal commit in the sync case. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-16ocfs2: convert to new aopsNick Piggin
Plug ocfs2 into the ->write_begin and ->write_end aops. A bunch of custom code is now gone - the iovec iteration stuff during write and the ocfs2 splice write actor. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-12ocfs2: Write support for inline dataMark Fasheh
This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline inode data. For the most part, the changes to the core write code can be relied on to do the heavy lifting. Any code calling ocfs2_write_begin (including shared writeable mmap) can count on it doing the right thing with respect to growing inline data to an extent tree. Size reducing truncates, including UNRESVP can simply zero that portion of the inode block being removed. Size increasing truncatesm, including RESVP have to be a little bit smarter and grow the inode to an extent tree if necessary. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12ocfs2: move nonsparse hole-filling into ocfs2_write_begin()Mark Fasheh
By doing this, we can remove any higher level logic which has to have knowledge of btree functionality - any callers of ocfs2_write_begin() can now expect it to do anything necessary to prepare the inode for new data. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-09-20ocfs2: Allow smaller allocations during large writesMark Fasheh
The ocfs2 write code loops through a page much like the block code, except that ocfs2 allocation units can be any size, including larger than page size. Typically it's equal to or larger than page size - most kernels run 4k pages, the minimum ocfs2 allocation (cluster) size. Some changes introduced during 2.6.23 changed the way writes to pages are handled, and inadvertantly broke support for > 4k page size. Instead of just writing one cluster at a time, we now handle the whole page in one pass. This means that multiple (small) seperate allocations might happen in the same pass. The allocation code howver typically optimizes by getting the maximum which was reserved. This triggered a BUG_ON in the extend code where it'd ask for a single bit (for one part of a > 4k page) and get back more than it asked for. Fix this by providing a variant of the high level allocation function which allows the caller to specify a maximum. The traditional function remains and just calls the new one with a maximum determined from the initial reservation. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-11ocfs2: Fix calculation of i_blocks during truncateMark Fasheh
We were setting i_blocks too early - before truncating any allocation. Correct things to set i_blocks after the allocation change. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09ocfs2: check ia_size limits in setattrMark Fasheh
We have to manually check the requested truncate size as the check in vmtruncate() comes too late for Ocfs2. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09ocfs2: Fix some casting errors related to file writesMark Fasheh
ocfs2_align_clusters_to_page_index() needs to cast the clusters shift to pgoff_t and ocfs2_file_buffered_write() needs loff_t when calculating destination start for memcpy. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09ocfs2: use s_maxbytes directly in ocfs2_change_file_space()Mark Fasheh
There's no need to recalculate things via ocfs2_max_file_offset() as we've already done that to fill s_maxbytes, so use that instead. We can also un-export ocfs2_max_file_offset() then. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-08-09ocfs2: Restrict inode changes in ocfs2_update_inode_atime()Mark Fasheh
ocfs2_update_inode_atime() calls ocfs2_mark_inode_dirty() to push changes from the struct inode into the ocfs2 disk inode. The problem is, ocfs2_mark_inode_dirty() might change other fields, depending on what happened to the struct inode. Since we don't always have locking to serialize changes to other fields (like i_size, etc), just fix things up to only touch the atime field. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-24ocfs2: bad kunmap_atomic()Jens Axboe
kunmap_atomic() takes the virtual address, not the mapped page as argument. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19ocfs2: ->fallocate() supportMark Fasheh
Plug ocfs2 into the ->fallocate() callback. This just re-uses the existing preallocation code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-17arch/i386/* fs/* ipc/*: mark variables with uninitialized_var()Jeff Garzik
Mark variables with uninitialized_var() if such a warning appears, and analysis proves that the var is initialized properly on all paths it is used. Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-07-10ocfs2: Support xfs style space reservation ioctlsMark Fasheh
We re-use the RESVSP/UNRESVSP ioctls from xfs which allow the user to allocate and deallocate regions to a file without zeroing data or changing i_size. Though renamed, the structure passed in from user is identical to struct xfs_flock64. The three fields that are actually used right now are l_whence, l_start and l_len. This should get ocfs2 immediate compatibility with userspace software using the pre-existing xfs ioctls. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10ocfs2: support for removing file regionsMark Fasheh
Provide an internal interface for the removal of arbitrary file regions. ocfs2_remove_inode_range() takes a byte range within a file and will remove existing extents within that range. Partial clusters will be zeroed so that any read from within the region will return zeros. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10ocfs2: update truncate handling of partial clustersMark Fasheh
The partial cluster zeroing code used during truncate usually assumes that the rightmost byte in the range to be zeroed lies on a cluster boundary. This makes sense for truncate, but punching holes might require zeroing on non-aligned rightmost boundaries. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10ocfs2: Support creation of unwritten extentsMark Fasheh
This can now be trivially supported with re-use of our existing extend code. ocfs2_allocate_unwritten_extents() takes a start offset and a byte length and iterates over the inode, adding extents (marked as unwritten) until len is reached. Existing extents are skipped over. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10ocfs2: support writing of unwritten extentsMark Fasheh
Update the write code to detect when the user is asking to write to an unwritten extent. Like writing to a hole, we must zero the region between the write and the cluster boundaries. Most of the existing cluster zeroing logic can be re-used with some additional checks for the unwritten flag on extent records. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10ocfs2: shared writeable mmapMark Fasheh
Implement cluster consistent shared writeable mappings using the ->page_mkwrite() callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10ocfs2: rework ocfs2_buffered_write_cluster()Mark Fasheh
Use some ideas from the new-aops patch series and turn ocfs2_buffered_write_cluster() into a 2 stage operation with the caller copying data in between. The code now understands multiple cluster writes as a result of having to deal with a full page write for greater than 4k pages. This sets us up to easily call into the write path during ->page_mkwrite(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10ocfs2: take ip_alloc_sem during entire truncateMark Fasheh
Use of the alloc sem during truncate was too narrow - we want to protect the i_size change and page truncation against mmap now. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10pipe: change the ->pin() operation to ->confirm()Jens Axboe
The name 'pin' was badly chosen, it doesn't pin a pipe buffer in the most commonly used sense in the kernel. So change the name to 'confirm', after debating this issue with Hugh Dickins a bit. A good return from ->confirm() means that the buffer is really there, and that the contents are good. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>