aboutsummaryrefslogtreecommitdiff
path: root/fs/ocfs2/inode.h
AgeCommit message (Collapse)Author
2009-04-03ocfs2: fix rare stale inode errors when exporting via nfswengang wang
For nfs exporting, ocfs2_get_dentry() returns the dentry for fh. ocfs2_get_dentry() may read from disk when the inode is not in memory, without any cross cluster lock. this leads to the file system loading a stale inode. This patch fixes above problem. Solution is that in case of inode is not in memory, we get the cluster lock(PR) of alloc inode where the inode in question is allocated from (this causes node on which deletion is done sync the alloc inode) before reading out the inode itsself. then we check the bitmap in the group (the inode in question allcated from) to see if the bit is clear. if it's clear then it's stale. if the bit is set, we then check generation as the existing code does. We have to read out the inode in question from disk first to know its alloc slot and allot bit. And if its not stale we read it out using ocfs2_iget(). The second read should then be from cache. And also we have to add a per superblock nfs_sync_lock to cover the lock for alloc inode and that for inode in question. this is because ocfs2_get_dentry() and ocfs2_delete_inode() lock on them in reverse order. nfs_sync_lock is locked in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(). so that mutliple ocfs2_delete_inode() can run concurrently in normal case. [mfasheh@suse.com: build warning fixes and comment cleanups] Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-04-03ocfs2: Optimize inode allocation by remembering last groupTao Ma
In ocfs2, the inode block search looks for the "emptiest" inode group to allocate from. So if an inode alloc file has many equally (or almost equally) empty groups, new inodes will tend to get spread out amongst them, which in turn can put them all over the disk. This is undesirable because directory operations on conceptually "nearby" inodes force a large number of seeks. So we add ip_last_used_group in core directory inodes which records the last used allocation group. Another field named ip_last_used_slot is also added in case inode stealing happens. When claiming new inode, we passed in directory's inode so that the allocation can use this information. For more details, please see http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Implementation of local and global quota file handlingJan Kara
For each quota type each node has local quota file. In this file it stores changes users have made to disk usage via this node. Once in a while this information is synced to global file (and thus with other nodes) so that limits enforcement at least aproximately works. Global quota files contain all the information about usage and limits. It's mostly handled by the generic VFS code (which implements a trie of structures inside a quota file). We only have to provide functions to convert structures from on-disk format to in-memory one. We also have to provide wrappers for various quota functions starting transactions and acquiring necessary cluster locks before the actual IO is really started. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Wrap inode block reads in a dedicated function.Joel Becker
The ocfs2 code currently reads inodes off disk with a simple ocfs2_read_block() call. Each place that does this has a different set of sanity checks it performs. Some check only the signature. A couple validate the block number (the block read vs di->i_blkno). A couple others check for VALID_FL. Only one place validates i_fs_generation. A couple check nothing. Even when an error is found, they don't all do the same thing. We wrap inode reading into ocfs2_read_inode_block(). This will validate all the above fields, going readonly if they are invalid (they never should be). ocfs2_read_inode_block_full() is provided for the places that want to pass read_block flags. Every caller is passing a struct inode with a valid ip_blkno, so we don't need a separate blkno argument either. We will remove the validation checks from the rest of the code in a later commit, as they are no longer necessary. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14ocfs2: Move ocfs2_bread() into dir.cJoel Becker
dir.c is the only place using ocfs2_bread(), so let's make it static to that file. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Switch over to JBD2.Joel Becker
ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is limiting our maximum filesystem size. It's a pretty trivial change. Most functions are just renamed. The only functional change is moving to Jan's inode-based ordered data mode. It's better, too. Because JBD2 reads and writes JBD journals, this is compatible with any existing filesystem. It can even interact with JBD-based ocfs2 as long as the journal is formated for JBD. We provide a compatibility option so that paranoid people can still use JBD for the time being. This will go away shortly. [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to ocfs2_truncate_for_delete(). --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add extended attribute supportTiger Yang
This patch implements storing extended attributes both in inode or a single external block. We only store EA's in-inode when blocksize > 512 or that inode block has free space for it. When an EA's value is larger than 80 bytes, we will store the value via b-tree outside inode or block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-01-25ocfs2: Silence false lockdep warningsJan Kara
Create separate lockdep lock classes for system file's i_mutexes. They are used to guard allocations and similar things and thus rank differently than i_mutex of a regular file or directory. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25ocfs2: Rename ocfs2_meta_[un]lockMark Fasheh
Call this the "inode_lock" now, since it covers both data and meta data. This patch makes no functional changes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25ocfs2: Remove data locksMark Fasheh
The meta lock now covers both meta data and data, so this just removes the now-redundant data lock. Combining locks saves us a round of lock mastery per inode and one less lock to ping between nodes during read/write. We don't lose much - since meta locks were always held before a data lock (and at the same level) ordered writeout mode (the default) ensured that flushing for the meta data lock also pushed out data anyways. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12ocfs2: Structure updates for inline dataMark Fasheh
Add the disk, network and memory structures needed to support data in inode. Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing inline data. A new inode field, i_dyn_features, is added to facilitate tracking of dynamic inode state. Since it will be used often, we want to mirror it on ocfs2_inode_info, and transfer it via the meta data lvb. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-05-02[PATCH] Copy i_flags to ocfs2 inode flags on writeJan Kara
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into ocfs2-specific ip_attr. Hence, when someone sets these flags via a different interface than ioctl, they are stored correctly. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Cache extent recordsMark Fasheh
The extent map code was ripped out earlier because of an inability to deal with holes. This patch adds back a simpler caching scheme requiring far less code. Our old extent map caching was designed back when meta data block caching in Ocfs2 didn't work very well, resulting in many disk reads. These days our metadata caching is much better, resulting in no un-necessary disk reads. As a result, extent caching doesn't have to be as fancy, nor does it have to cache as many extents. Keeping the last 3 extents seen should be sufficient to give us a small performance boost on some streaming workloads. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Fix up i_blocks calculation to know about holesMark Fasheh
Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: temporarily remove extent map cachingMark Fasheh
The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: remove unused codeTiger Yang
Remove node messaging code that becomes unused with the delete inode vote removal. [Removed even more cruft which I spotted during review --Mark] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Remove delete inode voteTiger Yang
Ocfs2 currently does cluster-wide node messaging to check the open state of an inode during delete. This patch removes that mechanism in favor of an inode cluster lock which is taken at shared read when an inode is first read and dropped in clear_inode(). This allows a deleting node to test the liveness of an inode by attempting to take an exclusive lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07[PATCH] slab: remove kmem_cache_tChristoph Lameter
Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-01ocfs2: Remove struct ocfs2_journal_handle in favor of handle_tMark Fasheh
This is mostly a search and replace as ocfs2_journal_handle is now no more than a container for a handle_t pointer. ocfs2_commit_trans() becomes very straight forward, and we remove some out of date comments / code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01ocfs2: remove unused ocfs2_handle_add_inode()Mark Fasheh
We can also delete the unused infrastructure which was once in place to support this functionality. ocfs2_inode_private loses ip_handle and ip_handle_list. ocfs2_journal_handle loses handle_list. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Remove i_generation from inode lock namesMark Fasheh
OCFS2 puts inode meta data in the "lock value block" provided by the DLM. Typically, i_generation is encoded in the lock name so that a deleted inode on and a new one in the same block don't share the same lvb. Unfortunately, that scheme means that the read in ocfs2_read_locked_inode() is potentially thrown away as soon as the meta data lock is taken - we cannot encode the lock name without first knowing i_generation, which requires a disk read. This patch encodes i_generation in the inode meta data lvb, and removes the value from the inode meta data lock name. This way, the read can be covered by a lock, and at the same time we can distinguish between an up to date and a stale LVB. This will help cold-cache stat(2) performance in particular. Since this patch changes the protocol version, we take the opportunity to do a minor re-organization of two of the LVB fields. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-20ocfs2: add ext2 attributesHerbert Poetzl
Support immutable, and other attributes. Some renaming and other minor fixes done by myself. Signed-off-by: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-28[PATCH] mark address_space_operations constChristoph Hellwig
Same as with already do with the file operations: keep them in .rodata and prevents people from doing runtime patching. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-03[OCFS2] Make ip_io_sem a mutexMark Fasheh
ip_io_sem is now ip_io_mutex. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-01-03[PATCH] OCFS2: The Second Oracle Cluster FilesystemMark Fasheh
The OCFS2 file system module. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>