aboutsummaryrefslogtreecommitdiff
path: root/fs/dcache.c
AgeCommit message (Collapse)Author
2006-06-23[PATCH] VFS: Permit filesystem to override root dentry on mountDavid Howells
Extend the get_sb() filesystem operation to take an extra argument that permits the VFS to pass in the target vfsmount that defines the mountpoint. The filesystem is then required to manually set the superblock and root dentry pointers. For most filesystems, this should be done with simple_set_mnt() which will set the superblock pointer and then set the root dentry to the superblock's s_root (as per the old default behaviour). The get_sb() op now returns an integer as there's now no need to return the superblock pointer. This patch permits a superblock to be implicitly shared amongst several mount points, such as can be done with NFS to avoid potential inode aliasing. In such a case, simple_set_mnt() would not be called, and instead the mnt_root and mnt_sb would be set directly. The patch also makes the following changes: (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount pointer argument and return an integer, so most filesystems have to change very little. (*) If one of the convenience function is not used, then get_sb() should normally call simple_set_mnt() to instantiate the vfsmount. This will always return 0, and so can be tail-called from get_sb(). (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the dcache upon superblock destruction rather than shrink_dcache_anon(). This is required because the superblock may now have multiple trees that aren't actually bound to s_root, but that still need to be cleaned up. The currently called functions assume that the whole tree is rooted at s_root, and that anonymous dentries are not the roots of trees which results in dentries being left unculled. However, with the way NFS superblock sharing are currently set to be implemented, these assumptions are violated: the root of the filesystem is simply a dummy dentry and inode (the real inode for '/' may well be inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries with child trees. [*] Anonymous until discovered from another tree. (*) The documentation has been adjusted, including the additional bit of changing ext2_* into foo_* in the documentation. [akpm@osdl.org: convert ipath_fs, do other stuff] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22[PATCH] prune_one_dentry() tweaksAndrew Morton
- Add description of d_lock handling to comments over prune_one_dentry(). - It has three callsites - uninline it, saving 200 bytes of text. Cc: Jan Blunck <jblunck@suse.de> Cc: Kirill Korotaev <dev@openvz.org> Cc: Olaf Hering <olh@suse.de> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-22[PATCH] Fix dcache race during umountNeilBrown
The race is that the shrink_dcache_memory shrinker could get called while a filesystem is being unmounted, and could try to prune a dentry belonging to that filesystem. If it does, then it will call in to iput on the inode while the dentry is no longer able to be found by the umounting process. If iput takes a while, generic_shutdown_super could get all the way though shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without ever waiting on this particular inode. Eventually the superblock gets freed anyway and if the iput tried to touch it (which some filesystems certainly do), it will lose. The promised "Self-destruct in 5 seconds" doesn't lead to a nice day. The race is closed by holding s_umount while calling prune_one_dentry on someone else's dentry. As a down_read_trylock is used, shrink_dcache_memory will no longer try to prune the dentry of a filesystem that is being unmounted, and unmount will not be able to start until any such active prune_one_dentry completes. This requires that prune_dcache *knows* which filesystem (if any) it is doing the prune on behalf of so that it can be careful of other filesystems. shrink_dcache_memory isn't called it on behalf of any filesystem, and so is careful of everything. shrink_dcache_anon is now passed a super_block rather than the s_anon list out of the superblock, so it can get the s_anon list itself, and can pass the superblock down to prune_dcache. If prune_dcache finds a dentry that it cannot free, it leaves it where it is (at the tail of the list) and exits, on the assumption that some other thread will be removing that dentry soon. To try to make sure that some work gets done, a limited number of dnetries which are untouchable are skipped over while choosing the dentry to work on. I believe this race was first found by Kirill Korotaev. Cc: Jan Blunck <jblunck@suse.de> Acked-by: Kirill Korotaev <dev@openvz.org> Cc: Olaf Hering <olh@suse.de> Acked-by: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31[PATCH] dcache: Add helper d_hash_and_lookupEric W. Biederman
It is very common to hash a dentry and then to call lookup. If we take fs specific hash functions into account the full hash logic can get ugly. Further full_name_hash as an inline function is almost 100 bytes on x86 so having a non-inline choice in some cases can measurably decrease code size. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31[PATCH] inotify: IN_DELETE events missingAmy Griffis
IN_DELETE events are no longer generated for the removal of a file from a watched directory. This seems to be a result of clearing DCACHE_INOTIFY_PARENT_WATCHED in d_delete() directly before calling fsnotify_nameremove(). Assuming the flag doesn't need to be cleared before dentry_iput(), this should do the trick. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Acked-by: Robert Love <rml@novell.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: drivers/char/ftape/lowlevel/fdc-io.c: Correct a comment Kconfig help: MTD_JEDECPROBE already supports Intel Remove ugly debugging stuff do_mounts.c: Minor ROOT_DEV comment cleanup BUG_ON() Conversion in drivers/s390/block/dasd_devmap.c BUG_ON() Conversion in mm/mempool.c BUG_ON() Conversion in mm/memory.c BUG_ON() Conversion in kernel/fork.c BUG_ON() Conversion in ipc/sem.c BUG_ON() Conversion in fs/ext2/ BUG_ON() Conversion in fs/hfs/ BUG_ON() Conversion in fs/dcache.c BUG_ON() Conversion in fs/buffer.c BUG_ON() Conversion in input/serio/hp_sdc_mlc.c BUG_ON() Conversion in md/dm-table.c BUG_ON() Conversion in md/dm-path-selector.c BUG_ON() Conversion in drivers/isdn BUG_ON() Conversion in drivers/char BUG_ON() Conversion in drivers/mtd/
2006-03-26Remove ugly debugging stuffArtem B. Bityuckiy
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-26[PATCH] Use __read_mostly on some hot fs variablesEric Dumazet
I discovered on oprofile hunting on a SMP platform that dentry lookups were slowed down because d_hash_mask, d_hash_shift and dentry_hashtable were in a cache line that contained inodes_stat. So each time inodes_stats is changed by a cpu, other cpus have to refill their cache line. This patch moves some variables to the __read_mostly section, in order to avoid false sharing. RCU dentry lookups can go full speed. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26BUG_ON() Conversion in fs/dcache.cEric Sesterhenn
this changes if() BUG(); constructs to BUG_ON() which is cleaner and can better optimized away Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-25[PATCH] Reduce sched latency in shrink_dcache_sb()Kirill Korotaev
This patch reduces scheduling latency in shrink_dcache_sb() noticed during remounting of big partitions with many cached dentries. The same latency fix was applied to select_parent() long ago. Signed-off-by: Denis Lunev <den@sw.ru> Signed-off-by: Pavel Emelianov <xemul@sw.ru> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] inotify: lock avoidance with parent watch status in dentryNick Piggin
Previous inotify work avoidance is good when inotify is completely unused, but it breaks down if even a single watch is in place anywhere in the system. Robin Holt notices that udev is one such culprit - it slows down a 512-thread application on a 512 CPU system from 6 seconds to 22 minutes. Solve this by adding a flag in the dentry that tells inotify whether or not its parent inode has a watch on it. Event queueing to parent will skip taking locks if this flag is cleared. Setting and clearing of this flag on all child dentries versus event delivery: this is no in terms of race cases, and that was shown to be equivalent to always performing the check. The essential behaviour is that activity occuring _after_ a watch has been added and _before_ it has been removed, will generate events. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Robert Love <rml@novell.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] Optimise d_find_alias()David Howells
The attached patch optimises d_find_alias() to only take the spinlock if there's anything in the the inode's alias list. If there isn't, it returns NULL immediately. With respect to the superblock sharing patch, this should reduce by one the number of times the dcache_lock is taken by nfs_lookup() for ordinary directory lookups. Only in the case where there's already a dentry for particular directory inode (such as might happen when another mountpoint is rooted at that dentry) will the lock then be taken the extra time. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24[PATCH] cpuset memory spread slab cache hooksPaul Jackson
Change the kmem_cache_create calls for certain slab caches to support cpuset memory spreading. See the previous patches, cpuset_mem_spread, for an explanation of cpuset memory spreading, and cpuset_mem_spread_slab_cache for the slab cache support for memory spreading. The slab caches marked for now are: dentry_cache, inode_cache, some xfs slab caches, and buffer_head. This list may change over time. In particular, other file system types that are used extensively on large NUMA systems may want to allow for spreading their directory and inode slab cache entries. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08[PATCH] fix file countingDipankar Sarma
I have benchmarked this on an x86_64 NUMA system and see no significant performance difference on kernbench. Tested on both x86_64 and powerpc. The way we do file struct accounting is not very suitable for batched freeing. For scalability reasons, file accounting was constructor/destructor based. This meant that nr_files was decremented only when the object was removed from the slab cache. This is susceptible to slab fragmentation. With RCU based file structure, consequent batched freeing and a test program like Serge's, we just speed this up and end up with a very fragmented slab - llm22:~ # cat /proc/sys/fs/file-nr 587730 0 758844 At the same time, I see only a 2000+ objects in filp cache. The following patch I fixes this problem. This patch changes the file counting by removing the filp_count_lock. Instead we use a separate percpu counter, nr_files, for now and all accesses to it are through get_nr_files() api. In the sysctl handler for nr_files, we populate files_stat.nr_files before returning to user. Counting files as an when they are created and destroyed (as opposed to inside slab) allows us to correctly count open files with RCU. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-03[PATCH] make "struct d_cookie" depend on CONFIG_PROFILINGMarcelo Tosatti
Shrinks "struct dentry" from 128 bytes to 124 on x86, allowing 31 objects per slab instead of 30. Cc: John Levon <levon@movementarian.org> Cc: Philippe Elie <phil.el@wanadoo.fr> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-14[PATCH] Unlinline a bunch of other functionsArjan van de Ven
Remove the "inline" keyword from a bunch of big functions in the kernel with the goal of shrinking it by 30kb to 40kb Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10[PATCH] d_instantiate_unique / NFS inode leakageOleg Drokin
If we have found aliased dentry that we return, inode reference is not dropped and inode is not attached anywhere, so it seems the reference to inode is leaked in that case. Cc: Trond Myklebust <trond.myklebust@fys.uio.no>, Cc: <viro@parcelfarce.linux.theplanet.co.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08[PATCH] shrink dentry structEric Dumazet
Some long time ago, dentry struct was carefully tuned so that on 32 bits UP, sizeof(struct dentry) was exactly 128, ie a power of 2, and a multiple of memory cache lines. Then RCU was added and dentry struct enlarged by two pointers, with nice results for SMP, but not so good on UP, because breaking the above tuning (128 + 8 = 136 bytes) This patch reverts this unwanted side effect, by using an union (d_u), where d_rcu and d_child are placed so that these two fields can share their memory needs. At the time d_free() is called (and d_rcu is really used), d_child is known to be empty and not touched by the dentry freeing. Lockless lookups only access d_name, d_parent, d_lock, d_op, d_flags (so the previous content of d_child is not needed if said dentry was unhashed but still accessed by a CPU because of RCU constraints) As dentry cache easily contains millions of entries, a size reduction is worth the extra complexity of the ugly C union. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Maneesh Soni <maneesh@in.ibm.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Ian Kent <raven@themaw.net> Cc: Paul Jackson <pj@sgi.com> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@epoch.ncsc.mil> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07[PATCH] Remove hlist_for_each_rcu() API, convert existing use to ↵Paul E. McKenney
hlist_for_each_entry_rcu Remove the hlist_for_each_rcu() API, which is used only in one place, and is trivially converted to hlist_for_each_entry_rcu(), making the code shorter and more readable. Any out-of-tree uses may be similarly converted. Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28[PATCH] gfp_t: fs/*Al Viro
- ->releasepage() annotated (s/int/gfp_t), instances updated - missing gfp_t in fs/* added - fixed misannotation from the original sweep caught by bitwise checks: XFS used __nocast both for gfp_t and for flags used by XFS allocator. The latter left with unsigned int __nocast; we might want to add a different type for those but for now let's leave them alone. That, BTW, is a case when __nocast use had been actively confusing - it had been used in the same code for two different and similar types, with no way to catch misuses. Switch of gfp_t to bitwise had caught that immediately... One tricky bit is left alone to be dealt with later - mapping->flags is a mix of gfp_t and error indications. Left alone for now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-19Make fsnotify possibly work better for the inode removal caseLinus Torvalds
Checking i_nlink is dubious, but the alternatives look even less appetizing. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10[PATCH] janitor: fs/dcache.c: list_for_each*Domen Puncer
First one is list_for_each_entry (thanks maks), second 2 list_for_each_safe. Signed-off-by: Maximilian Attems <janitor@sternwelten.at> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-08[PATCH] fsnotify_name/inoderemoveJohn McCutchan
The patch below unhooks fsnotify from vfs_unlink & vfs_rmdir. It introduces two new fsnotify calls, that are hooked in at the dcache level. This not only more closely matches how the VFS layer works, it also avoids the problem with locking and inode lifetimes. The two functions are - fsnotify_nameremove -- called when a directory entry is going away. It notifies the PARENT of the deletion. This is called from d_delete(). - inoderemove -- called when the files inode itself is going away. It notifies the inode that is being deleted. This is called from dentry_iput(). Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05[PATCH] make some things staticAdrian Bunk
This patch makes some needlessly global identifiers static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Arjan van de Ven <arjanv@infradead.org> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16Linux-2.6.12-rc2Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!