aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2007-04-27Merge branch 'for-linus' of git://git.infradead.org/ubi-2.6Linus Torvalds
* 'for-linus' of git://git.infradead.org/ubi-2.6: UBI: remove unused variable UBI: add me to MAINTAINERS JFFS2: add UBI support UBI: Unsorted Block Images
2007-04-27Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (27 commits) ocfs2: Cache extent records ocfs2: Remember rw lock level during direct io ocfs2: Fix up i_blocks calculation to know about holes ocfs2: Fix extent lookup to return true size of holes ocfs2: Read from an unwritten extent returns zeros ocfs2: make room for unwritten extents flag ocfs2: Use own splice write actor ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate() [PATCH] Turn do_sync_file_range() into do_sync_mapping_range() ocfs2: zero tail of sparse files on truncate ocfs2: Teach ocfs2_get_block() about holes ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write() ocfs2: teach ocfs2_file_aio_write() about sparse files ocfs2: Turn off shared writeable mmap for local files systems with holes. ocfs2: abstract out allocation locking ocfs2: teach extend/truncate about sparse files ocfs2: temporarily remove extent map caching ocfs2: sparse b-tree support ocfs2: small cleanup of ocfs2_request_delete() ocfs2: remove unused code ...
2007-04-27JFFS2: add UBI supportArtem Bityutskiy
This patch make JFFS2 able to work with UBI volumes via the emulated MTD devices which are directly mapped to these volumes. Signed-off-by: Artem Bityutskiy <dedekind@infradead.org>
2007-04-26[AFS]: Eliminate cmpxchg() usage in vlocation code.David S. Miller
cmpxchg() is not available on every processor so can't be used in generic code. Replace with spinlock protection on the ->state changes, wakeups, and wait loops. Add what appears to be a missing wakeup on transition to AFS_VL_VALID state in afs_vlocation_updater(). Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26[AFS]: Fix u64 printing in debug logging.David S. Miller
Need 'unsigned long long' casts to quiet warnings on 64-bit platforms when using %ll on a u64. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26[AFS]: Add "directory write" support.David Howells
Add support for the create, link, symlink, unlink, mkdir, rmdir and rename VFS operations to the in-kernel AFS filesystem. Also: (1) Fix dentry and inode revalidation. d_revalidate should only look at state of the dentry. Revalidation of the contents of an inode pointed to by a dentry is now separate. (2) Fix afs_lookup() to hash negative dentries as well as positive ones. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26[AFS]: Implement the CB.InitCallBackState3 operation.David Howells
Implement the CB.InitCallBackState3 operation for the fileserver to call. This reduces the amount of network traffic because if this op is aborted, the fileserver will then attempt an CB.InitCallBackState operation. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26[AFS]: Add support for the CB.GetCapabilities operation.David Howells
Add support for the CB.GetCapabilities operation with which the fileserver can ask the client for the following information: (1) The list of network interfaces it has available as IPv4 address + netmask plus the MTUs. (2) The client's UUID. (3) The extended capabilities of the client, for which the only current one is unified error mapping (abort code interpretation). To support this, the patch adds the following routines to AFS: (1) A function to iterate through all the network interfaces using RTNETLINK to extract IPv4 addresses and MTUs. (2) A function to iterate through all the network interfaces using RTNETLINK to pull out the MAC address of the lowest index interface to use in UUID construction. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26[AFS]: Add security support.David Howells
Add security support to the AFS filesystem. Kerberos IV tickets are added as RxRPC keys are added to the session keyring with the klog program. open() and other VFS operations then find this ticket with request_key() and either use it immediately (eg: mkdir, unlink) or attach it to a file descriptor (open). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26[AFS]: Handle multiple mounts of an AFS superblock correctly.David Howells
Handle multiple mounts of an AFS superblock correctly, checking to see whether the superblock is already initialised after calling sget() rather than just unconditionally stamping all over it. Also delete the "silent" parameter to afs_fill_super() as it's not used and can, in any case, be obtained from sb->s_flags. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26[AF_RXRPC]: Delete the old RxRPC code.David Howells
Delete the old RxRPC code as it's now no longer used. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.David Howells
Make the in-kernel AFS filesystem use AF_RXRPC instead of the old RxRPC code. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26[AFS]: Clean up the AFS sourcesDavid Howells
Clean up the AFS sources. Also remove references to AFS keys. RxRPC keys are used instead. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26ocfs2: Cache extent recordsMark Fasheh
The extent map code was ripped out earlier because of an inability to deal with holes. This patch adds back a simpler caching scheme requiring far less code. Our old extent map caching was designed back when meta data block caching in Ocfs2 didn't work very well, resulting in many disk reads. These days our metadata caching is much better, resulting in no un-necessary disk reads. As a result, extent caching doesn't have to be as fancy, nor does it have to cache as many extents. Keeping the last 3 extents seen should be sufficient to give us a small performance boost on some streaming workloads. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Remember rw lock level during direct ioMark Fasheh
Cluster locking might have been redone because a direct write won't complete, so this needs to be reflected in the iocb. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Fix up i_blocks calculation to know about holesMark Fasheh
Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Fix extent lookup to return true size of holesMark Fasheh
Initially, we had wired things to return a size '1' of holes. Cook up a small amount of code to find the next extent and calculate the number of clusters between the virtual offset and the next allocated extent. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Read from an unwritten extent returns zerosMark Fasheh
Return an optional extent flags field from our lookup functions and wire up callers to treat unwritten regions as holes for the purpose of returning zeros to the user. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: make room for unwritten extents flagMark Fasheh
Due to the size of our group bitmaps, we'll never have a leaf node extent record with more than 16 bits worth of clusters. Split e_clusters up so that leaf nodes can get a flags field where we can mark unwritten extents. Interior nodes whose length references all the child nodes beneath it can't split their e_clusters field, so we use a union to preserve sizing there. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Use own splice write actorMark Fasheh
We need to fill holes during a splice write. Provide our own splice write actor which can call ocfs2_file_buffered_write() with a splice-specific callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate()Mark Fasheh
Do this instead of filemap_fdatawrite() - this way we sync only the range between i_size and the cluster boundary. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26[PATCH] Turn do_sync_file_range() into do_sync_mapping_range()Mark Fasheh
do_sync_file_range() accepts a file * from which it takes an address_space to sync. Abstract out the bulk of the function into do_sync_mapping_range() which takes the address_space directly. This way callers who want to sync an address_space directly can take advantage of the functionality provided. do_sync_file_range() is preserved as a small wrapper around do_sync_mapping_range(). Ocfs2 in particular would like to use this to initiate a sync of a specific inode range during truncate, where a file * may not be available. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-04-26ocfs2: zero tail of sparse files on truncateMark Fasheh
Since we don't zero on extend anymore, truncate needs to be fixed up to zero the part of a file between i_size and and end of it's cluster. Otherwise a subsequent extend could expose bad data. This introduced a new helper, which can be used in ocfs2_write(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Teach ocfs2_get_block() about holesMark Fasheh
ocfs2_get_block() didn't understand sparse files, fix that. Also remove some code that isn't really useful anymore. We can fix up ocfs2_direct_IO_get_blocks() at the same time. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write()Mark Fasheh
These are no longer used, and can't handle file systems with sparse file allocation. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: teach ocfs2_file_aio_write() about sparse filesMark Fasheh
Unfortunately, ocfs2 can no longer make use of generic_file_aio_write_nlock() because allocating writes will require zeroing of pages adjacent to the I/O for cluster sizes greater than page size. Implement a custom file write here, which can order page locks for zeroing. This also has the advantage that cluster locks can easily be ordered outside of the page locks. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Turn off shared writeable mmap for local files systems with holes.Mark Fasheh
This will be turned back on once we can do allocation in ->page_mkwrite(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: abstract out allocation lockingMark Fasheh
Right now, file allocation for ocfs2 is done within ocfs2_extend_file(), which is either called from ->setattr() (for an i_size change), or at the top of ocfs2_file_aio_write(). Inodes on file systems with sparse file support will want to do their allocation during the actual write call. In either case the cluster locking decisions are the same. We abstract out that code into a new function, ocfs2_lock_allocators() which will be used by a later patch to enable writing to sparse files. This also provides a nice cleanup of ocfs2_extend_allocation(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: teach extend/truncate about sparse filesMark Fasheh
For ocfs2_truncate_file(), we eliminate the "simple" truncate case which no longer exists since i_size is not tied to i_clusters. In ocfs2_extend_file(), we skip the allocation / page zeroing code for file systems which understand sparse files. The core truncate code is changed to do a bottom up tree traversal. This gets abstracted out into it's own function. To make things more readable, most of the special case handling for in-inode extents from ocfs2_do_truncate() is also removed. Though write support for sparse files comes in a later patch, we at least update ocfs2_prepare_inode_for_write() to skip allocation for sparse files. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: temporarily remove extent map cachingMark Fasheh
The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: sparse b-tree supportMark Fasheh
Introduce tree rotations into the b-tree code. This will allow ocfs2 to support sparse files. Much of the added code is designed to be generic (in the ocfs2 sense) so that it can later be re-used to implement large extended attributes. This patch only adds the rotation code and does minimal updates to callers of the extent api. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: small cleanup of ocfs2_request_delete()Mark Fasheh
There are two checks in there (one for inode newness, one for other mounted nodes) which are unnecessary, so remove them. The DLM will allow the trylock in either case without any messaging overhead. Removing these makes ocfs2_request_delete() a one liner function, so just move the trylock out one level into ocfs2_query_inode_wipe(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: remove unused codeTiger Yang
Remove node messaging code that becomes unused with the delete inode vote removal. [Removed even more cruft which I spotted during review --Mark] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Remove delete inode voteTiger Yang
Ocfs2 currently does cluster-wide node messaging to check the open state of an inode during delete. This patch removes that mechanism in favor of an inode cluster lock which is taken at shared read when an inode is first read and dropped in clear_inode(). This allows a deleting node to test the liveness of an inode by attempting to take an exclusive lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: filter more error printsMark Fasheh
We don't want to print anything at all in ocfs2_lookup() when getting an error from ocfs2_iget() - it could be something as innocuous as a signal being detected in the dlm. ocfs2_permission() should filter on -ENOENT which ocfs2_meta_lock() can return if the inode was deleted on another node. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Replace panic() with emergency_restart() when fencingSunil Mushran
We have noticed panic() hanging leading us to a situation in which the node, while otherwise dead, is still disk heartbeating. This leads to a hung cluster as the other nodes are waiting for this node to stop disk heartbeating. This situation is only resolved by power resetting the box. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Silence compiler warningsSunil Mushran
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Local mounts should skip inode updatesMark Fasheh
We don't want the extent map and uptodate cache destruction in ocfs2_meta_lock_update() on a local mount, so skip that. This fixes several bugs with uptodate being cleared on buffers and extent maps being corrupted. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2_dlm: Call cond_resched_lock() once per hash bucket scanSunil Mushran
In dlm_migrate_all_locks(), we currently call cond_resched_lock() after processing each lockres in a hash bucket. Move it outside the loop so as to call it only after the entire hash bucket has been processed. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2_dlm: fix race in dlm_remaster_locksSrinivas Eeda
There is a possibility that dlm_remaster_locks could overwride node->state with DLM_RECO_NODE_DATA_REQUESTED after dlm_reco_data_done_handler sets the node->state to DLM_RECO_NODE_DATA_DONE. This could lead to recovery getting stuck and requires a cluster reboot. Synchronize with dlm_reco_state_lock spinlock. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-25[NETLINK]: Switch cb_lock spinlock to mutex and allow to override itPatrick McHardy
Switch cb_lock to mutex and allow netlink kernel users to override it with a subsystem specific mutex for consistent locking in dump callbacks. All netlink_dump_start users have been audited not to rely on any side-effects of the previously used spinlock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NETLINK]: Introduce nlmsg_hdr() helperArnaldo Carvalho de Melo
For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the number of direct accesses to skb->data and for consistency with all the other cast skb member helpers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolutionEric Dumazet
Now network timestamps use ktime_t infrastructure, we can add a new ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'. User programs can thus access to nanosecond resolution. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-24reiserfs: fix xattr root locking/refcount bugJeff Mahoney
The listxattr() and getxattr() operations are only protected by a read lock. As a result, if either of these operations run in parallel, a race condition exists where the xattr_root will end up being cached twice, which results in the leaking of a reference and a BUG() on umount. This patch refactors get_xa_root(), __get_xa_root(), and create_xa_root(), into one get_xa_root() function that takes the appropriate locking around the entire critical section. Reported, diagnosed and tested by Andrea Righi <a.righi@cineca.it> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Andrea Righi <a.righi@cineca.it> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Edward Shishkin <edward@namesys.com> Cc: Alex Zarochentsev <zam@namesys.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-24v9fs: don't use primary fid when removing fileLatchesar Ionkov
v9fs_insert uses v9fs_fid_lookup (which also locks the fid) to get the primary fid associated with the dentry and destroys the v9fs_fid struct after removing the file. If another process called v9fs_fid_lookup on the same dentry, it may wait undefinitely for the fid's lock (as the struct is freed). This patch changes v9fs_remove to use a cloned fid, so the primary fid is not locked and freed. Signed-off-by: Latchesar Ionkov <lucho@ionkov.net> Cc: Eric Van Hensbergen <ericvh@hera.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20NFS: Fix race in nfs_set_page_dirtyTrond Myklebust
Protect nfs_set_page_dirty() against races with nfs_inode_add_request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20NFS: Fix the 'desynchronized value of nfs_i.ncommit' errorTrond Myklebust
Redirtying a request that is already marked for commit will screw up the accounting for NR_UNSTABLE_NFS as well as nfs_i.ncommit. Ensure that all requests on the commit queue are labelled with the PG_NEED_COMMIT flag, and avoid moving them onto the dirty list inside nfs_page_mark_flush(). Also inline nfs_mark_request_dirty() into nfs_page_mark_flush() for atomicity reasons. Avoid dropping the spinlock until we're done marking the request in the radix tree and have added it to the ->dirty list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20NFS: Don't clear PG_writeback until after we've processed unstable writesTrond Myklebust
Ensure that we don't release the PG_writeback lock until after the page has either been redirtied, or queued on the nfs_inode 'commit' list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20NFS: clean up the unstable write codeTrond Myklebust
Get rid of the inlined #ifdefs. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-17ufs proper handling of zero link caseEvgeniy Dushistov
This patch should fix or partly fix this bug: http://bugzilla.kernel.org/show_bug.cgi?id=8276 The problem is: - if we see "zero link case" during reading inode operation, we call ufs_error(which remount fs readonly), but not "mark" inode as bad (1) - in readonly case we do not fill some data structures, which are used in read and write case (2) - VFS call ufs_delete_inode if link count is zero (3) so (1)->(3)->(2) cause oops, this patch should fix such scenario Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jim Paris <jim@jtan.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>