aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2006-06-25[PATCH] ufs: unlock_super without lockEvgeniy Dushistov
ufs_free_blocks function looks now in so way: if (err) goto failed; lock_super(); failed: unlock_super(); So if error happen we'll unlock not locked super. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: i_blocks wrong countEvgeniy Dushistov
At now UFS code uses DQUOT_* mechanism, but it also update inode->i_blocks manually, this cause wrong i_blocks value. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: little directory lookup optimizationEvgeniy Dushistov
This patch make little optimization of ufs_find_entry like "ext2" does. Save number of page and reuse it again in the next call. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: easy debugEvgeniy Dushistov
Currently to turn on debug mode "user" has to edit ~10 files, to turn off he has to do it again. This patch introduce such changes: 1)turn on(off) debug messages via ".config" 2)remove unnecessary duplication of code 3)make "UFSD" macros more similar to function 4)fix some compiler warnings Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: Unmark CONFIG_UFS_FS_WRITE as BROKENEvgeniy Dushistov
To find new bugs, I suggest revert this patch: http://lkml.org/lkml/2006/1/31/275 in -mm tree. So others can test "write support" of UFS. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: not usual amounts of fragments per blockEvgeniy Dushistov
The writing to UFS file system with block/fragment!=8 may cause bogus behaviour. The problem in "ufs_bitmap_search" function, which doesn't work correctly in "block/fragment!=8" case. The idea is stolen from BSD code. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: wrong type castEvgeniy Dushistov
There are two ugly macros in ufs code: #define UCPI_UBH ((struct ufs_buffer_head *)ucpi) #define USPI_UBH ((struct ufs_buffer_head *)uspi) when uspi looks like struct { struct ufs_buffer_head ; } and USPI_UBH has some sence, ucpi looks like struct { struct not_ufs_buffer_head; } To prevent bugs in future, this patch convert macros to inline function and fix "ucpi" structure. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: directory and page cache: from blocks to pagesEvgeniy Dushistov
Change function in fs/ufs/dir.c and fs/ufs/namei.c to work with pages instead of straight work with blocks. It fixed such bugs: * for i in `seq 1 1000`; do touch $i; done - crash system * mkdir create directory without "." and ".." entries Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: directory and page cache: install aopsEvgeniy Dushistov
This series of patches finished "bugs fixing" mentioned here http://lkml.org/lkml/2006/1/31/275 . The main bugs: * for i in `seq 1 1000`; do touch $i; done - crash system * mkdir create directory without "." and ".." entries The suggested solution is work with page cache instead of straight work with blocks. Such solution has following advantages * reduce code size and its complexity * some global locks go away * fix bugs The most part of code is stolen from ext2, because of it has similar directory structure. Patches testes with UFS1 and UFS2 file systems. This patch installs i_mapping->a_ops for directory inodes and removes some duplicated code. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: change block number on the flyEvgeniy Dushistov
First of all some necessary notes about UFS by it self: To avoid waste of disk space the tail of file consists not from blocks (which is ordinary big enough, 16K usually), it consists from fragments(which is ordinary 2K). When file is growing its tail occupy 1 fragment, 2 fragments... At some stage decision to allocate whole block is made and all fragments are moved to one block. How this situation was handled before: ufs_prepare_write ->block_prepare_write ->ufs_getfrag_block ->... ->ufs_new_fragments: bh = sb_bread bh->b_blocknr = result + i; mark_buffer_dirty (bh); This is wrong solution, because: - it didn't take into consideration that there is another cache: "inode page cache" - because of sb_getblk uses not b_blocknr, (it uses page->index) to find certain block, this breaks sb_getblk. How this situation is handled now: we go though all "page inode cache", if there are no such page in cache we load it into cache, and change b_blocknr. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: right block allocationEvgeniy Dushistov
* After block allocation, we map it on the same "address" as 8 others blocks * We nullify block several times: once in ufs/block.c and once in block_*write_full_page, and use different "caches" for this. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] ufs: ufs_trunc_indirect: infinite cycleEvgeniy Dushistov
Currently, ufs write support have two sets of problems: work with files and work with directories. This series of patches should solve the first problem. This patch is similar to http://lkml.org/lkml/2006/1/17/61 this patch complements it. The situation the same: in ufs_trunc_(not direct), we read block, check if count of links to it is equal to one, if so we finish cycle, if not continue. Because of "count of links" always >=2 this operation cause infinite cycle and hang up the kernel. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] fs/freevxfs: cleanup of spelling errorsCliff Wickman
Fix of some spelling errors in fs/freevxfs error messages and comments Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] splice: retrieve mapping after locking the pageJens Axboe
Otherwise we could be racing with truncate/mapping removal. Problem found/fixed by Nick Piggin <npiggin@suse.de>, logic rewritten by me. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23[PATCH] Kill PF_SYNCWRITE flagJens Axboe
A process flag to indicate whether we are doing sync io is incredibly ugly. It also causes performance problems when one does a lot of async io and then proceeds to sync it. Part of the io will go out as async, and the other part as sync. This causes a disconnect between the previously submitted io and the synced io. For io schedulers such as CFQ, this will cause us lost merges and suboptimal behaviour in scheduling. Remove PF_SYNCWRITE completely from the fsync/msync paths, and let the O_DIRECT path just directly indicate that the writes are sync by using WRITE_SYNC instead. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23[PATCH] make kernel warn about incorrectly sized partitionsMike Miller
Sometimes partitions claim to be larger than the reported capacity of a disk device. This patch makes the kernel warn about those partitions. We still permit these patitions to be used. Quoting Andries Brouwer <Andries.Brouwer@cwi.nl>: Case 1: The kernel is mistaken about the size of the disk. (There are commands to clip a disk to a certain capacity, there are jumpers to tell a disk that it should report a certain capacity etc. Usually this is because of BIOS bugs. In bad cases the machine will crash in the BIOS and hence fail to boot if the disk reports full capacity.) In such cases actually accessing the blocks of the partition may work fine, or may work fine after running an unclip utility. I wrote "setmax" some years ago precisely for this reason. Case 2: There was a messy partition table (maybe just a rounding error) but the actual filesystem on the partition is contained in the physical disk. Now using the filesystem goes without problem. Case 3: Both partition and filesystem extend beyond the end of the disk. In forensic or debugging situations one often uses a copy of the start of a disk. Now access beyond the end gives an expected I/O error. Signed-off-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Stephen Cameron <steve.cameron@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] JBD: split checkpoint listsJan Kara
Split the checkpoint list of the transaction into two lists. In the first list we keep the buffers that need to be submitted for IO. In the second list are kept buffers that were already submitted and we just have to wait for the IO to complete. This should simplify a handling of checkpoint lists a bit and can eventually be also a performance gain. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: "Stephen C. Tweedie" <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] list: use list_replace_init() instead of list_splice_init()Oleg Nesterov
list_splice_init(list, head) does unneeded job if it is known that list_empty(head) == 1. We can use list_replace_init() instead. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] percpu counter data type changes to suppport more than 2**31 ext3 ↵Mingming Cao
free blocks counter The percpu counter data type are changed in this set of patches to support more users like ext3 who need more than 32 bit to store the free blocks total in the filesystem. - Generic perpcu counters data type changes. The size of the global counter and local counter were explictly specified using s64 and s32. The global counter is changed from long to s64, while the local counter is changed from long to s32, so we could avoid doing 64 bit update in most cases. - Users of the percpu counters are updated to make use of the new percpu_counter_init() routine now taking an additional parameter to allow users to pass the initial value of the global counter. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] binflt_elf: remove more castsJesper Juhl
Remove redundant casts from NEW_AUX_ENT() arguments in fs/binfmt_elf.c Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] binfmt_elf: CodingStyle cleanup and remove some pointless castsJesper Juhl
Do a CodingStyle cleanup of fs/binfmt_elf.c and also remove some pointless casts of kmalloc() return values in the same file. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] ext3_clear_inode(): avoid kfree(NULL)Andrew Morton
Steven Rostedt <rostedt@goodmis.org> points out that `rsv' here is usually NULL, so we should avoid calling kfree(). Also, fix up some nearby whitespace damage. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] jbd: avoid kfree(NULL)Andrew Morton
There are a couple of places where JBD has to check to see whether an unneeded memory allocation was performed. Usually it _was_ needed, so we end up calling kfree(NULL). We can micro-optimise that by checking the pointer before calling kfree(). Thanks to Steven Rostedt <rostedt@goodmis.org> for identifying this. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] jbd: fix BUG in journal_commit_transaction()Jan Kara
Fix possible assertion failure in journal_commit_transaction() on jh->b_next_transaction == NULL (when we are processing BJ_Forget list and buffer is not jbddirty). !jbddirty buffers can be placed on BJ_Forget list for example by journal_forget() or by __dispose_buffer() - generally such buffer means that it has been freed by this transaction. Freed buffers should not be reallocated until the transaction has committed (that's why we have the assertion there) but they *can* be reallocated when the transaction has already been committed to disk and we are just processing the BJ_Forget list (as soon as we remove b_committed_data from the bitmap bh, ext3 will be able to reallocate buffers freed by the committing transaction). So we have to also count with the case that the buffer has been reallocated and b_next_transaction has been already set. And one more subtle point: it can happen that we manage to reallocate the buffer and also mark it jbddirty. Then we also add the freed buffer to the checkpoint list of the committing trasaction. But that should do no harm. Non-jbddirty buffers should be filed to BJ_Reserved and not BJ_Metadata list. It can actually happen that we refile such buffers during the commit phase when we reallocate in the running transaction blocks deleted in committing transaction (and that can happen if the committing transaction already wrote all the data and is just cleaning up BJ_Forget list). Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: "Stephen C. Tweedie" <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] Poll cleanups/microoptimizationsVadim Lobanov
The "count" and "pt" variables are declared and modified by do_poll(), as well as accessed and written indirectly in the do_pollfd() subroutine. This patch pulls all handling of these variables into the do_poll() function, thereby eliminating the odd use of indirection in do_pollfd(). This is done by pulling the "struct pollfd" traversal loop from do_pollfd() into its only caller do_poll(). As an added bonus, the patch saves a few clock cycles, and also adds comments to make the code easier to follow. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] fs/fat/misc.c: unexport fat_sync_bhsAdrian Bunk
Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] fs/locks.c: make posix_locks_deadlock() staticAdrian Bunk
We can now make posix_locks_deadlock() static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] vfs: add lock owner argument to flush operationMiklos Szeredi
Pass the POSIX lock owner ID to the flush operation. This is useful for filesystems which don't want to store any locking state in inode->i_flock but want to handle locking/unlocking POSIX locks internally. FUSE is one such filesystem but I think it possible that some network filesystems would need this also. Also add a flag to indicate that a POSIX locking request was generated by close(), so filesystems using the above feature won't send an extra locking request in this case. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] locks: clean up locks_remove_posix()Miklos Szeredi
locks_remove_posix() can use posix_lock_file() instead of doing the lock removal by hand. posix_lock_file() now does exacly the same. The comment about pids no longer applies, posix_lock_file() takes only the owner into account. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] locks: don't do unnecessary allocationsMiklos Szeredi
posix_lock_file() always allocates new locks in advance, even if it's easy to determine that no allocations will be needed. Optimize these cases: - FL_ACCESS flag is set - Unlocking the whole range Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] locks: don't unnecessarily fail posix lock operationsMiklos Szeredi
posix_lock_file() was too cautious, failing operations on OOM, even if they didn't actually require an allocation. This has the disadvantage, that a failing unlock on process exit could lead to a memory leak. There are two possibilites for this: - filesystem implements .lock() and calls back to posix_lock_file(). On cleanup of files_struct locks_remove_posix() is called which should remove all locks belonging to files_struct. However if filesystem calls posix_lock_file() which fails, then those locks will never be freed. - if a file is closed while a lock is blocked, then after acquiring fcntl_setlk() will undo the lock. But this unlock itself might fail on OOM, again possibly leaking the lock. The solution is to move the checking of the allocations until after it is sure that they will be needed. This will solve the above problem since unlock will always succeed unless it splits an existing region. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] read_mapping_page for address spacePekka Enberg
Add read_mapping_page() which is used for callers that pass mapping->a_ops->readpage as the filler for read_cache_page. This removes some duplication from filesystem code. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] ext2 XIP won't build without MMUAl Viro
Disable Ext2 XIP if the kernel is configured in no-MMU mode as the former won't build. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] frv: binfmt_elf_fdpic __user annotationsAl Viro
Add __user annotations to binfmt_elf_fdpic. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] lsm: add task_setioprio hookJames Morris
Implement an LSM hook for setting a task's IO priority, similar to the hook for setting a tasks's nice value. A previous version of this LSM hook was included in an older version of multiadm by Jan Engelhardt, although I don't recall it being submitted upstream. Also included is the corresponding SELinux hook, which re-uses the setsched permission in the proccess class. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] writeback: fix range handlingOGAWA Hirofumi
When a writeback_control's `start' and `end' fields are used to indicate a one-byte-range starting at file offset zero, the required values of .start=0,.end=0 mean that the ->writepages() implementation has no way of telling that it is being asked to perform a range request. Because we're currently overloading (start == 0 && end == 0) to mean "this is not a write-a-range request". To make all this sane, the patch changes range of writeback_control. So caller does: If it is calling ->writepages() to write pages, it sets range (range_start/end or range_cyclic) always. And if range_cyclic is true, ->writepages() thinks the range is cyclic, otherwise it just uses range_start and range_end. This patch does, - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h -1 is usually ok for range_end (type is long long). But, if someone did, range_end += val; range_end is "val - 1" u64val = range_end >> bits; u64val is "~(0ULL)" or something, they are wrong. So, this adds LLONG_MAX to avoid nasty things, and uses LLONG_MAX for range_end. - All callers of ->writepages() sets range_start/end or range_cyclic. - Fix updates of ->writeback_index. It seems already bit strange. If it starts at 0 and ended by check of nr_to_write, this last index may reduce chance to scan end of file. So, this updates ->writeback_index only if range_cyclic is true or whole-file is scanned. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Nathan Scott <nathans@sgi.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Steven French <sfrench@us.ibm.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] tightening hugetlb strict accountingChen, Kenneth W
Current hugetlb strict accounting for shared mapping always assume mapping starts at zero file offset and reserves pages between zero and size of the file. This assumption often reserves (or lock down) a lot more pages then necessary if application maps at none zero file offset. libhugetlbfs is one example that requires proper reservation on shared mapping starts at none zero offset. This patch extends the reservation and hugetlb strict accounting to support any arbitrary pair of (offset, len), resulting a much more robust and accurate scheme. More importantly, it won't lock down any hugetlb pages outside file mapping. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] for_each_possible_cpu: xfsKAMEZAWA Hiroyuki
for_each_cpu() actually iterates across all possible CPUs. We've had mistakes in the past where people were using for_each_cpu() where they should have been iterating across only online or present CPUs. This is inefficient and possibly buggy. We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the future. This patch replaces for_each_cpu with for_each_possible_cpu. in xfs. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] XFS: Use the dentry passed to statfs() to limit the scope of the resultsDavid Howells
Enable XFS to limit the statfs() results to the project quota covering the dentry used as a base for call. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] VFS: Permit filesystem to perform statfs with a known root dentryDavid Howells
Give the statfs superblock operation a dentry pointer rather than a superblock pointer. This complements the get_sb() patch. That reduced the significance of sb->s_root, allowing NFS to place a fake root there. However, NFS does require a dentry to use as a target for the statfs operation. This permits the root in the vfsmount to be used instead. linux/mount.h has been added where necessary to make allyesconfig build successfully. Interest has also been expressed for use with the FUSE and XFS filesystems. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] VFS: Permit filesystem to override root dentry on mountDavid Howells
Extend the get_sb() filesystem operation to take an extra argument that permits the VFS to pass in the target vfsmount that defines the mountpoint. The filesystem is then required to manually set the superblock and root dentry pointers. For most filesystems, this should be done with simple_set_mnt() which will set the superblock pointer and then set the root dentry to the superblock's s_root (as per the old default behaviour). The get_sb() op now returns an integer as there's now no need to return the superblock pointer. This patch permits a superblock to be implicitly shared amongst several mount points, such as can be done with NFS to avoid potential inode aliasing. In such a case, simple_set_mnt() would not be called, and instead the mnt_root and mnt_sb would be set directly. The patch also makes the following changes: (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount pointer argument and return an integer, so most filesystems have to change very little. (*) If one of the convenience function is not used, then get_sb() should normally call simple_set_mnt() to instantiate the vfsmount. This will always return 0, and so can be tail-called from get_sb(). (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the dcache upon superblock destruction rather than shrink_dcache_anon(). This is required because the superblock may now have multiple trees that aren't actually bound to s_root, but that still need to be cleaned up. The currently called functions assume that the whole tree is rooted at s_root, and that anonymous dentries are not the roots of trees which results in dentries being left unculled. However, with the way NFS superblock sharing are currently set to be implemented, these assumptions are violated: the root of the filesystem is simply a dummy dentry and inode (the real inode for '/' may well be inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries with child trees. [*] Anonymous until discovered from another tree. (*) The documentation has been adjusted, including the additional bit of changing ext2_* into foo_* in the documentation. [akpm@osdl.org: convert ipath_fs, do other stuff] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22[PATCH] prune_one_dentry() tweaksAndrew Morton
- Add description of d_lock handling to comments over prune_one_dentry(). - It has three callsites - uninline it, saving 200 bytes of text. Cc: Jan Blunck <jblunck@suse.de> Cc: Kirill Korotaev <dev@openvz.org> Cc: Olaf Hering <olh@suse.de> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22[PATCH] Fix dcache race during umountNeilBrown
The race is that the shrink_dcache_memory shrinker could get called while a filesystem is being unmounted, and could try to prune a dentry belonging to that filesystem. If it does, then it will call in to iput on the inode while the dentry is no longer able to be found by the umounting process. If iput takes a while, generic_shutdown_super could get all the way though shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without ever waiting on this particular inode. Eventually the superblock gets freed anyway and if the iput tried to touch it (which some filesystems certainly do), it will lose. The promised "Self-destruct in 5 seconds" doesn't lead to a nice day. The race is closed by holding s_umount while calling prune_one_dentry on someone else's dentry. As a down_read_trylock is used, shrink_dcache_memory will no longer try to prune the dentry of a filesystem that is being unmounted, and unmount will not be able to start until any such active prune_one_dentry completes. This requires that prune_dcache *knows* which filesystem (if any) it is doing the prune on behalf of so that it can be careful of other filesystems. shrink_dcache_memory isn't called it on behalf of any filesystem, and so is careful of everything. shrink_dcache_anon is now passed a super_block rather than the s_anon list out of the superblock, so it can get the s_anon list itself, and can pass the superblock down to prune_dcache. If prune_dcache finds a dentry that it cannot free, it leaves it where it is (at the tail of the list) and exits, on the assumption that some other thread will be removing that dentry soon. To try to make sure that some work gets done, a limited number of dnetries which are untouchable are skipped over while choosing the dentry to work on. I believe this race was first found by Kirill Korotaev. Cc: Jan Blunck <jblunck@suse.de> Acked-by: Kirill Korotaev <dev@openvz.org> Cc: Olaf Hering <olh@suse.de> Acked-by: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22[PATCH] remove steal_locks()Miklos Szeredi
This patch removes the steal_locks() function. steal_locks() doesn't work correctly with any filesystem that does it's own lock management, including NFS, CIFS, etc. In addition it has weird semantics on local filesystems in case tasks sharing file-descriptor tables are doing POSIX locking operations in parallel to execve(). The steal_locks() function has an effect on applications doing: clone(CLONE_FILES) /* in child */ lock execve lock POSIX locks acquired before execve (by "child", "parent" or any further task sharing files_struct) will after the execve be owned exclusively by "child". According to Chris Wright some LSB/LTP kind of suite triggers without the stealing behavior, but there's no known real-world application that would also fail. Apps using NPTL are not affected, since all other threads are killed before execve. Apps using LinuxThreads are only affected if they - have multiple threads during exec (LinuxThreads doesn't kill other threads, the app may do it with pthread_kill_other_threads_np()) - rely on POSIX locks being inherited across exec Both conditions are documented, but not their interaction. Apps using clone() natively are affected if they - use clone(CLONE_FILES) - rely on POSIX locks being inherited across exec The above scenarios are unlikely, but possible. If the patch is vetoed, there's a plan B, that involves mostly keeping the weird stealing semantics, but changing the way lock ownership is handled so that network and local filesystems work consistently. That would add more complexity though, so this solution seems to be preferred by most people. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Matthew Wilcox <willy@debian.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22[PATCH] Fix a race condition between ->i_mapping and iput()OGAWA Hirofumi
This race became a cause of oops, and can reproduce by the following. while true; do dd if=/dev/zero of=/dev/.static/dev/hdg1 bs=512 count=1000 & sync done This race condition was between __sync_single_inode() and iput(). cpu0 (fs's inode) cpu1 (bdev's inode) ----------------- ------------------- close("/dev/hda2") [...] __sync_single_inode() /* copy the bdev's ->i_mapping */ mapping = inode->i_mapping; generic_forget_inode() bdev_clear_inode() /* restre the fs's ->i_mapping */ inode->i_mapping = &inode->i_data; /* bdev's inode was freed */ destroy_inode(inode); if (wait) { /* dereference a freed bdev's mapping->host */ filemap_fdatawait(mapping); /* Oops */ Since __sync_single_inode() is only taking a ref-count of fs's inode, the another process can be close() and freeing the bdev's inode while writing fs's inode. So, __sync_signle_inode() accesses the freed ->i_mapping, oops. This patch takes a ref-count on the bdev's inode for the fs's inode before setting a ->i_mapping, and the clear_inode() of the fs's inode does iput() on the bdev's inode. So if the fs's inode is still living, bdev's inode shouldn't be freed. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22[PATCH] NTFS: Critical bug fix (affects MIPS and possibly others)Anton Altaparmakov
Many thanks to Pauline Ng for the detailed bug report and analysis! Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-21Merge git://oss.sgi.com:8090/xfs-2.6Linus Torvalds
* git://oss.sgi.com:8090/xfs-2.6: (43 commits) [XFS] Remove files from the build that are now unused. [XFS] Fix a Makefile issue related to exports.o handling. [XFS] Remove version 1 directory code. Never functioned on Linux, just [XFS] Map EFSCORRUPTED to an actual error code, not just a made up one [XFS] Kill direct access to ->count in valusema(); all we ever use it for [XFS] Remove unneeded conditional code on NFS export interface related [XFS] Remove an incorrect use of unlikely() on a relatively likely code [XFS] Push some common code out of write path into core XFS code for [XFS] Remove unnecessary local from open_exec dmapi path. [XFS] Minor XFS documentation updates. [XFS] Fix broken const use inside local suffix_strtoul routine. [XFS] Fix nused counter. It's currently getting set to -1 rather than [XFS] Fix mismerge of the fs_writable cleanup patch causing a freeze/thaw [XFS] Fix up debug code so that bulkstat wont generate thousands of [XFS] Remove unused parameter from di2xflags routine. [XFS] Cleanup a missed porting conversion, and freezing. [XFS] Resolve a namespace collision on remaining vtypes for FreeBSD [XFS] Resolve a namespace collision on vnode/vnodeops for FreeBSD porters. [XFS] Resolve a namespace collision on vfs/vfsops for FreeBSD porters. [XFS] statvfs component of directory/project quota support, code ...
2006-06-21[PATCH] Driver core: add generic "subsystem" link to all devicesKay Sievers
Like the SUBSYTEM= key we find in the environment of the uevent, this creates a generic "subsystem" link in sysfs for every device. Userspace usually doesn't care at all if its a "class" or a "bus" device. This provides an unified way to determine the subsytem of a device, regardless of the way the driver core has created it. Signed-off-by: Kay Sievers <kay.sievers@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-20Merge branch 'audit.b21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b21' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (25 commits) [PATCH] make set_loginuid obey audit_enabled [PATCH] log more info for directory entry change events [PATCH] fix AUDIT_FILTER_PREPEND handling [PATCH] validate rule fields' types [PATCH] audit: path-based rules [PATCH] Audit of POSIX Message Queue Syscalls v.2 [PATCH] fix se_sen audit filter [PATCH] deprecate AUDIT_POSSBILE [PATCH] inline more audit helpers [PATCH] proc_loginuid_write() uses simple_strtoul() on non-terminated array [PATCH] update of IPC audit record cleanup [PATCH] minor audit updates [PATCH] fix audit_krule_to_{rule,data} return values [PATCH] add filtering by ppid [PATCH] log ppid [PATCH] collect sid of those who send signals to auditd [PATCH] execve argument logging [PATCH] fix deadlocks in AUDIT_LIST/AUDIT_LIST_RULES [PATCH] audit_panic() is audit-internal [PATCH] inotify (5/5): update kernel documentation ... Manual fixup of conflict in unclude/linux/inotify.h
2006-06-20Merge git://git.infradead.org/~dwmw2/rbtree-2.6Linus Torvalds
* git://git.infradead.org/~dwmw2/rbtree-2.6: [RBTREE] Switch rb_colour() et al to en_US spelling of 'color' for consistency Update UML kernel/physmem.c to use rb_parent() accessor macro [RBTREE] Update hrtimers to use rb_parent() accessor macro. [RBTREE] Add explicit alignment to sizeof(long) for struct rb_node. [RBTREE] Merge colour and parent fields of struct rb_node. [RBTREE] Remove dead code in rb_erase() [RBTREE] Update JFFS2 to use rb_parent() accessor macro. [RBTREE] Update eventpoll.c to use rb_parent() accessor macro. [RBTREE] Update key.c to use rb_parent() accessor macro. [RBTREE] Update ext3 to use rb_parent() accessor macro. [RBTREE] Change rbtree off-tree marking in I/O schedulers. [RBTREE] Add accessor macros for colour and parent fields of rb_node