aboutsummaryrefslogtreecommitdiff
path: root/Documentation/filesystems
AgeCommit message (Collapse)Author
2008-11-12DOC: update xip method infoMarco Stornelli
xip documentation updated: - change "get_xip_page" to "get_xip_mem"; - explain changed function parameters Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06fat: Fix ATTR_RO for directoryOGAWA Hirofumi
FAT has the ATTR_RO (read-only) attribute. But on Windows, the ATTR_RO of the directory will be just ignored actually, and is used by only applications as flag. E.g. it's setted for the customized folder by Explorer. http://msdn2.microsoft.com/en-us/library/aa969337.aspx This adds "rodir" option. If user specified it, ATTR_RO is used as read-only flag even if it's the directory. Otherwise, inode->i_mode is not used to hold ATTR_RO (i.e. fat_mode_can_save_ro() returns 0). Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06fat: document additional vfat mount optionsBart Trojanowski
While debugging a sync mount regression on vfat I noticed that there were mount options parsed by the driver that were not documented. [hirofumi@mail.parknet.co.jp: fix some parts] Signed-off-by: Bart Trojanowski <bart@jukie.net> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30fs: remove prepare_write/commit_writeNick Piggin
Nothing uses prepare_write or commit_write. Remove them from the tree completely. [akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6Linus Torvalds
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (25 commits) UBIFS: fix ubifs_compress commentary UBIFS: amend printk UBIFS: do not read unnecessary bytes when unpacking bits UBIFS: check buffer length when scanning for LPT nodes UBIFS: correct condition to eliminate unecessary assignment UBIFS: add more debugging messages for LPT UBIFS: fix bulk-read handling uptodate pages UBIFS: improve garbage collection UBIFS: allow for sync_fs when read-only UBIFS: commit on sync_fs UBIFS: correct comment for commit_on_unmount UBIFS: update dbg_dump_inode UBIFS: fix commentary UBIFS: fix races in bit-fields UBIFS: ensure data read beyond i_size is zeroed out correctly UBIFS: correct key comparison UBIFS: use bit-fields when possible UBIFS: check data CRC when in error state UBIFS: improve znode splitting rules UBIFS: add no_chk_data_crc mount option ...
2008-10-20ext3: add an option to control error handling on file dataHidehiro Kawai
If the journal doesn't abort when it gets an IO error in file data blocks, the file data corruption will spread silently. Because most of applications and commands do buffered writes without fsync(), they don't notice the IO error. It's scary for mission critical systems. On the other hand, if the journal aborts whenever it gets an IO error in file data blocks, the system will easily become inoperable. So this patch introduces a filesystem option to determine whether it aborts the journal or just call printk() when it gets an IO error in file data. If you mount a ext3 fs with data_err=abort option, it aborts on file data write error. If you mount it with data_err=ignore, it doesn't abort, just call printk(). data_err=ignore is the default. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20documentation: clarify dirty_ratio and dirty_background_ratio descriptionAndrea Righi
The current documentation of dirty_ratio and dirty_background_ratio is a bit misleading. In the documentation we say that they are "a percentage of total system memory", but the current page writeback policy, intead, is to apply the percentages to the dirtyable memory, that means free pages + reclaimable pages. Better to be more explicit to clarify this concept. Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20coredump_filter: add hugepage dumpingKOSAKI Motohiro
Presently hugepage's vma has a VM_RESERVED flag in order not to be swapped. But a VM_RESERVED vma isn't core dumped because this flag is often used for some kernel vmas (e.g. vmalloc, sound related). Thus hugepages are never dumped and it can't be debugged easily. Many developers want hugepages to be included into core-dump. However, We can't read generic VM_RESERVED area because this area is often IO mapping area. then these area reading may change device state. it is definitly undesiable side-effect. So adding a hugepage specific bit to the coredump filter is better. It will be able to hugepage core dumping and doesn't cause any side-effect to any i/o devices. In additional, libhugetlb use hugetlb private mapping pages as anonymous page. Then, hugepage private mapping pages should be core dumped by default. Then, /proc/[pid]/core_dump_filter has two new bits. - bit 5 mean hugetlb private mapping pages are dumped or not. (default: yes) - bit 6 mean hugetlb shared mapping pages are dumped or not. (default: no) I tested by following method. % ulimit -c unlimited % ./crash_hugepage 50 % ./crash_hugepage 50 -p % ls -lh % gdb ./crash_hugepage core % % echo 0x43 > /proc/self/coredump_filter % ./crash_hugepage 50 % ./crash_hugepage 50 -p % ls -lh % gdb ./crash_hugepage core #include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <sys/mman.h> #include <string.h> #include "hugetlbfs.h" int main(int argc, char** argv){ char* p; int ch; int mmap_flags = MAP_SHARED; int fd; int nr_pages; while((ch = getopt(argc, argv, "p")) != -1) { switch (ch) { case 'p': mmap_flags &= ~MAP_SHARED; mmap_flags |= MAP_PRIVATE; break; default: /* nothing*/ break; } } argc -= optind; argv += optind; if (argc == 0){ printf("need # of pages\n"); exit(1); } nr_pages = atoi(argv[0]); if (nr_pages < 2) { printf("nr_pages must >2\n"); exit(1); } fd = hugetlbfs_unlinked_fd(); p = mmap(NULL, nr_pages * gethugepagesize(), PROT_READ|PROT_WRITE, mmap_flags, fd, 0); sleep(2); *(p + gethugepagesize()) = 1; /* COW */ sleep(2); /* crash! */ *(int*)0 = 1; return 0; } Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: William Irwin <wli@holomorphy.com> Cc: Adam Litke <agl@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-17ext4: Update Documentation/filesystems/ext4.txtDiego Calleja
Since Ext4 is supposed to be stable in 2.6.28-rc, ext4's documentation file should be updated. [ More updates also added by Theodore Ts'o. ] Signed-off-by: Diego Calleja <diegocg@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-16autofs4: device node ioctl documentationIan Kent
Add documentation for the miscellaneous device module of autofs4. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16Document panic_on_unrecovered_nmi sysctlBernhard Walle
This adds "panic_on_unrecovered_nmi" sysctl to Documentation/filesystems/proc.txt. The text is mainly taken from http://readlist.com/lists/vger.kernel.org/linux-kernel/43/217998.html. Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16Remove Andrew Morton's http://www.zip.com.au/~akpm/FD Cami
Remove Andrew Morton's http://www.zip.com.au/~akpm/ urls, update to new ones when necessary, delete references otherwise. There are still instances of that living in: Documentation/zh_CN/HOWTO Documentation/zh_CN/SubmittingPatches Documentation/ko_KR/HOWTO Documentation/ja_JP/SubmittingPatches Signed-off-by: Francois Cami <francois.cami@free.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16doc: typo in Documentation/filesystems/nfsroot.txtShane McDonald
Add a missing word to the explanation of the purpose of the zdisk and bzdisk make targets. Signed-off-by: Shane McDonald <mcdonald.shane@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16Fix Documentation/filesystems/ramfs-rootfs-initramfs.txtfrans
First a file hello.c is created, then the file hello2.c is compiled. Change this to hello.c Signed-off-by: Frans Meulenbroeks <fransmeulenbroeks@gmail.com> Signed-off-by: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-13ocfs2: Documentation update for user_xattr / nouser_xattr mount optionsMark Fasheh
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13ocfs2: Add the 'inode64' mount option.Joel Becker
Now that ocfs2 limits inode numbers to 32bits, add a mount option to disable the limit. This parallels XFS. 64bit systems can handle the larger inode numbers. [ Added description of inode64 mount option in ocfs2.txt. --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13Merge branch 'proc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc * 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc: proc: remove kernel.maps_protect proc: remove now unneeded ADDBUF macro [PATCH] proc: show personality via /proc/pid/personality [PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock() proc: move PROC_PAGE_MONITOR to fs/proc/Kconfig proc: make grab_header() static proc: remove unused get_dma_list() proc: remove dummy vmcore_open() proc: proc_sys_root tweak proc: fix return value of proc_reg_open() in "too late" case Fixed up trivial conflict in removed file arch/sparc/include/asm/dma_32.h
2008-10-10ext4: add an option to control error handling on file dataHidehiro Kawai
If the journal doesn't abort when it gets an IO error in file data blocks, the file data corruption will spread silently. Because most of applications and commands do buffered writes without fsync(), they don't notice the IO error. It's scary for mission critical systems. On the other hand, if the journal aborts whenever it gets an IO error in file data blocks, the system will easily become inoperable. So this patch introduces a filesystem option to determine whether it aborts the journal or just call printk() when it gets an IO error in file data. If you mount an ext4 fs with data_err=abort option, it aborts on file data write error. If you mount it with data_err=ignore, it doesn't abort, just call printk(). data_err=ignore is the default. Here is the corresponding patch of the ext3 version: http://kerneltrap.org/mailarchive/linux-kernel/2008/9/9/3239374 Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-10-10ext4: Rename ext4dev to ext4Theodore Ts'o
The ext4 filesystem is getting stable enough that it's time to drop the "dev" prefix. Also remove the requirement for the TEST_FILESYS flag. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-10proc: remove kernel.maps_protectAlexey Dobriyan
After commit 831830b5a2b5d413407adf380ef62fe17d6fcbf2 aka "restrict reading from /proc/<pid>/maps to those who share ->mm or can ptrace" sysctl stopped being relevant because commit moved security checks from ->show time to ->start time (mm_for_maps()). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Kees Cook <kees.cook@canonical.com>
2008-10-08vfs: vfs-level fiemap interfaceMark Fasheh
Basic vfs-level fiemap infrastructure, which sets up a new ->fiemap inode operation. Userspace can get extent information on a file via fiemap ioctl. As input, the fiemap ioctl takes a struct fiemap which includes an array of struct fiemap_extent (fm_extents). Size of the extent array is passed as fm_extent_count and number of extents returned will be written into fm_mapped_extents. Offset and length fields on the fiemap structure (fm_start, fm_length) describe a logical range which will be searched for extents. All extents returned will at least partially contain this range. The actual extent offsets and ranges returned will be unmodified from their offset and range on-disk. The fiemap ioctl returns '0' on success. On error, -1 is returned and errno is set. If errno is equal to EBADR, then fm_flags will contain those flags which were passed in which the kernel did not understand. On all other errors, the contents of fm_extents is undefined. As fiemap evolved, there have been many authors of the vfs patch. As far as I can tell, the list includes: Kalpak Shah <kalpak.shah@sun.com> Andreas Dilger <adilger@sun.com> Eric Sandeen <sandeen@redhat.com> Mark Fasheh <mfasheh@suse.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: linux-api@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org
2008-10-09ext4: Use readahead when reading an inode from the inode tableTheodore Ts'o
With modern hard drives, reading 64k takes roughly the same time as reading a 4k block. So request readahead for adjacent inode table blocks to reduce the time it takes when iterating over directories (especially when doing this in htree sort order) in a cold cache case. With this patch, the time it takes to run "git status" on a kernel tree after flushing the caches via "echo 3 > /proc/sys/vm/drop_caches" is reduced by 21%. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-09ext4: Improve the documentation for ext4's /proc tunablesTheodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Alex Tomas <bzzz@sun.com> Cc: Andreas Dilger <adilger@sun.com>
2008-09-30UBIFS: add no_chk_data_crc mount optionAdrian Hunter
UBIFS read performance can be improved by skipping the CRC check when data nodes are read. This option can be used if the underlying media is considered to be highly reliable. Note that CRCs are always checked for metadata. Read speed on Arm platform with OneNAND goes from 19 MiB/s to 27 MiB/s with data CRC checking disabled. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30UBIFS: add bulk-read facilityAdrian Hunter
Some flash media are capable of reading sequentially at faster rates. UBIFS bulk-read facility is designed to take advantage of that, by reading in one go consecutive data nodes that are also located consecutively in the same LEB. Read speed on Arm platform with OneNAND goes from 17 MiB/s to 19 MiB/s. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-13coredump_filter: add description of bit 4Hidehiro Kawai
There is no description of bit 4 of coredump_filter in the documentation. This patch adds it. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Roland McGrath <roland@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-09update Documentation/filesystems/Locking for 2.6.27 changesChristoph Hellwig
In the 2.6.27 circle ->fasync lost the BKL, and the last remaining ->open variant that takes the BKL is also gone. ->get_sb and ->kill_sb didn't have BKL forever, so updated the entries while we're at that. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02ipc: document the new auto_msgmni proc fileNadia Derbey
Update Documentation/filesystems/proc.txt: it describes the file auto_msgmni intoduced to enable/disable msgmni automatic recomputing upon memory add/remove (see thread http://lkml.org/lkml/2008/7/4/27). Also added a description for msgmni (this filex is only listed in Documentation/sysctl/kernel.txt). Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02NTFS: update homepageAdrian Bunk
Update the location of the NTFS homepage in several files. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-22Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Update documentation to remind users to update mke2fs.conf ext4: Fix small file fragmentation ext4: Initialize writeback_index to 0 when allocating a new inode ext4: make sure ext4_has_free_blocks returns 0 for ENOSPC ext4: journal credit fix for the delayed allocation's writepages() function ext4: Rework the ext4_da_writepages() function ext4: journal credits reservation fixes for DIO, fallocate ext4: journal credits reservation fixes for extent file writepage ext4: journal credits calulation cleanup and fix for non-extent writepage ext4: Fix bug where we return ENOSPC even though we have plenty of inodes ext4: don't try to resize if there are no reserved gdt blocks left ext4: Use ext4_discard_reservations instead of mballoc-specific call ext4: Fix ext4_dx_readdir hash collision handling ext4: Fix delalloc release block reservation for truncate ext4: Fix potential truncate BUG due to i_prealloc_list being non-empty ext4: Handle unwritten extent properly with delayed allocation
2008-08-13Documentation: fix typo in ubifs.txtSebastian Siewior
Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-12docsrc: build Documentation/ sourcesRandy Dunlap
Currently source files in the Documentation/ sub-dir can easily bit-rot since they are not generally buildable, either because they are hidden in text files or because there are no Makefile rules for them. This needs to be fixed so that the source files remain usable and good examples of code instead of bad examples. Add the ability to build source files that are in the Documentation/ dir. Add to Kconfig as "BUILD_DOCSRC" config symbol. Use "CONFIG_BUILD_DOCSRC=1 make ..." to build objects from the Documentation/ sources. Or enable BUILD_DOCSRC in the *config system. However, this symbol depends on HEADERS_CHECK since the header files need to be installed (for userspace builds). Built (using cross-tools) for x86-64, i386, alpha, ia64, sparc32, sparc64, powerpc, sh, m68k, & mips. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-12quota: documentation for sending "below quota" messages via netlink and tiny ↵Jan Kara
doc update Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-31[PATCH] configfs: Convenience macros for attribute definition.Joel Becker
Sysfs has the _ATTR() and _ATTR_RO() macros to make defining extended form attributes easier. configfs should have something similiar. - _CONFIGFS_ATTR() and _CONFIGFS_ATTR_RO() are the counterparts to the sysfs macros. - CONFIGFS_ATTR_STRUCT() creates the extended form attribute structure. - CONFIGFS_ATTR_OPS() defines the show_attribute()/store_attribute() operations that call the show()/store() operations of the extended form configfs_attributes. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-07-27ext4: Update documentation to remind users to update mke2fs.confTheodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-26Documentation cleanup: trivial misspelling, punctuation, and grammar ↵Matt LaPlante
corrections. Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26omfs: add filesystem documentationBob Copeland
These patches add the Optimized MPEG Filesystem, a proprietary filesystem used by the embedded devices Rio Karma and ReplayTV, which are no longer manufactured. This filesystem module enables people to access files on these devices. This patch: OMFS is a proprietary filesystem created for the ReplayTV and also used by the Rio Karma. It uses hash tables with unordered, unbounded lists in each bucket for directories, extents for data blocks, 64-bit addressing for blocks, with up to 8K blocks (only 2K of a given block is ever used for metadata, so the FS still works with 4K pages). Document the filesystem usage and structures. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26relay: add buffer-only channels; useful for early loggingEduard - Gabriel Munteanu
Allows one to create and use a channel with no associated files. Files can be initialized later. This is useful in scenarios such as logging in early code, before VFS is up. Therefore, such channels can be created and used as soon as kmem_cache_init() completed. This is needed by kmemtrace to do tracing in early kernel code. [kosaki.motohiro@jp.fujitsu.com: build fix] Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25UTC timestamp option for FAT filesystems fixJoe Peterson
Signed-off-by: Joe Peterson <joe@skyrush.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24vmallocinfo: add NUMA informationEric Dumazet
Christoph recently added /proc/vmallocinfo file to get information about vmalloc allocations. This patch adds NUMA specific information, giving number of pages allocated on each memory node. This should help to check that vmalloc() is able to respect NUMA policies. Example of output on a four nodes machine (one cpu per node) 1) network hash tables are evenly spreaded on four nodes (OK) (Same point for inodes and dentries hash tables) 2) iptables tables (x_tables) are correctly allocated on each cpu node (OK). 3) sys_swapon() allocates its memory from one node only. 4) each loaded module is using memory on one node. Sysadmins could tune their setup to change points 3) and 4) if necessary. grep "pages=" /proc/vmallocinfo 0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204/0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128 0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64 0xffffc2000031a000-0xffffc2000031d000 12288 alloc_large_system_hash+0x204/0x2c0 pages=2 vmalloc N1=1 N2=1 0xffffc2000031f000-0xffffc2000032b000 49152 cramfs_uncompress_init+0x2e/0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3 0xffffc2000033e000-0xffffc20000341000 12288 sys_swapon+0x640/0xac0 pages=2 vmalloc N0=2 0xffffc20000341000-0xffffc20000344000 12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N0=2 0xffffc20000344000-0xffffc20000347000 12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N1=2 0xffffc20000347000-0xffffc2000034a000 12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N2=2 0xffffc2000034a000-0xffffc2000034d000 12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N3=2 0xffffc20004381000-0xffffc20004402000 528384 alloc_large_system_hash+0x204/0x2c0 pages=128 vmalloc N0=32 N1=32 N2=32 N3=32 0xffffc20004402000-0xffffc20004803000 4198400 alloc_large_system_hash+0x204/0x2c0 pages=1024 vmalloc vpages N0=256 N1=256 N2=256 N3=256 0xffffc20004803000-0xffffc20004904000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64 0xffffc20004904000-0xffffc20004bec000 3047424 sys_swapon+0x640/0xac0 pages=743 vmalloc vpages N0=743 0xffffffffa0000000-0xffffffffa000f000 61440 sys_init_module+0xc27/0x1d00 pages=14 vmalloc N1=14 0xffffffffa000f000-0xffffffffa0014000 20480 sys_init_module+0xc27/0x1d00 pages=4 vmalloc N0=4 0xffffffffa0014000-0xffffffffa0017000 12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N0=2 0xffffffffa0017000-0xffffffffa0022000 45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N1=10 0xffffffffa0022000-0xffffffffa0028000 24576 sys_init_module+0xc27/0x1d00 pages=5 vmalloc N3=5 0xffffffffa0028000-0xffffffffa0050000 163840 sys_init_module+0xc27/0x1d00 pages=39 vmalloc N1=39 0xffffffffa0050000-0xffffffffa0052000 8192 sys_init_module+0xc27/0x1d00 pages=1 vmalloc N1=1 0xffffffffa0052000-0xffffffffa0056000 16384 sys_init_module+0xc27/0x1d00 pages=3 vmalloc N1=3 0xffffffffa0056000-0xffffffffa0081000 176128 sys_init_module+0xc27/0x1d00 pages=42 vmalloc N3=42 0xffffffffa0081000-0xffffffffa00ae000 184320 sys_init_module+0xc27/0x1d00 pages=44 vmalloc N3=44 0xffffffffa00ae000-0xffffffffa00b1000 12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2 0xffffffffa00b1000-0xffffffffa00b9000 32768 sys_init_module+0xc27/0x1d00 pages=7 vmalloc N0=7 0xffffffffa00b9000-0xffffffffa00c4000 45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N3=10 0xffffffffa00c6000-0xffffffffa00e0000 106496 sys_init_module+0xc27/0x1d00 pages=25 vmalloc N2=25 0xffffffffa00e0000-0xffffffffa00f1000 69632 sys_init_module+0xc27/0x1d00 pages=16 vmalloc N2=16 0xffffffffa00f1000-0xffffffffa00f4000 12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2 0xffffffffa00f4000-0xffffffffa00f7000 12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2 [akpm@linux-foundation.org: fix comment] Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24access_process_vm device memory infrastructureRik van Riel
In order to be able to debug things like the X server and programs using the PPC Cell SPUs, the debugger needs to be able to access device memory through ptrace and /proc/pid/mem. This patch: Add the generic_access_phys access function and put the hooks in place to allow access_process_vm to access device or PPC Cell SPU memory. [riel@redhat.com: Add documentation for the vm_ops->access function] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Benjamin Herrensmidt <benh@kernel.crashing.org> Cc: Dave Airlie <airlied@linux.ie> Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linusLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: remove CONFIG_KMOD from core kernel code remove CONFIG_KMOD from lib remove CONFIG_KMOD from sparc64 rework try_then_request_module to do less in non-modular kernels remove mention of CONFIG_KMOD from documentation make CONFIG_KMOD invisible modules: Take a shortcut for checking if an address is in a module module: turn longs into ints for module sizes Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs module: reorder struct module to save space on 64 bit builds module: generic each_symbol iterator function module: don't use stop_machine for waiting rmmod
2008-07-22remove mention of CONFIG_KMOD from documentationJohannes Berg
Also includes a few Kconfig files (xtensa, blackfin) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: linux-doc@vger.kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Randy Dunlap <rdunlap@xenotime.net>
2008-07-21sysfs: add /sys/dev/{char,block} to lookup sysfs path by major:minorDan Williams
Why?: There are occasions where userspace would like to access sysfs attributes for a device but it may not know how sysfs has named the device or the path. For example what is the sysfs path for /dev/disk/by-id/ata-ST3160827AS_5MT004CK? With this change a call to stat(2) returns the major:minor then userspace can see that /sys/dev/block/8:32 links to /sys/block/sdc. What are the alternatives?: 1/ Add an ioctl to return the path: Doable, but sysfs is meant to reduce the need to proliferate ioctl interfaces into the kernel, so this seems counter productive. 2/ Use udev to create these symlinks: Also doable, but it adds a udev dependency to utilities that might be running in a limited environment like an initramfs. 3/ Do a full-tree search of sysfs. [kay.sievers@vrfy.org: fix duplicate registrations] [kay.sievers@vrfy.org: cleanup suggestions] Cc: Neil Brown <neilb@suse.de> Cc: Tejun Heo <htejun@gmail.com> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Reviewed-by: SL Baur <steve@xemacs.org> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Mark Lord <lkml@rtr.ca> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-20Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.27' of git://linux-nfs.org/~bfields/linux: (51 commits) nfsd: nfs4xdr.c do-while is not a compound statement nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c lockd: Pass "struct sockaddr *" to new failover-by-IP function lockd: get host reference in nlmsvc_create_block() instead of callers lockd: minor svclock.c style fixes lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_lock lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_testlock lockd: nlm_release_host() checks for NULL, caller needn't file lock: reorder struct file_lock to save space on 64 bit builds nfsd: take file and mnt write in nfs4_upgrade_open nfsd: document open share bit tracking nfsd: tabulate nfs4 xdr encoding functions nfsd: dprint operation names svcrdma: Change WR context get/put to use the kmem cache svcrdma: Create a kmem cache for the WR contexts svcrdma: Add flush_scheduled_work to module exit function svcrdma: Limit ORD based on client's advertised IRD svcrdma: Remove unused wait q from svcrdma_xprt structure svcrdma: Remove unneeded spin locks from __svc_rdma_free svcrdma: Add dma map count and WARN_ON ...
2008-07-17configfs: Allow ->make_item() and ->make_group() to return detailed errors.Joel Becker
The configfs operations ->make_item() and ->make_group() currently return a new item/group. A return of NULL signifies an error. Because of this, -ENOMEM is the only return code bubbled up the stack. Multiple folks have requested the ability to return specific error codes when these operations fail. This patch adds that ability by changing the ->make_item/group() ops to return ERR_PTR() values. These errors are bubbled up appropriately. NULL returns are changed to -ENOMEM for compatibility. Also updated are the in-kernel users of configfs. This is a rework of reverted commit 11c3b79218390a139f2d474ee1e983a672d5839a. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-07-17Revert "configfs: Allow ->make_item() and ->make_group() to return detailed ↵Joel Becker
errors." This reverts commit 11c3b79218390a139f2d474ee1e983a672d5839a. The code will move to PTR_ERR(). Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-07-17Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: [PATCH] ocfs2: fix oops in mmap_truncate testing configfs: call drop_link() to cleanup after create_link() failure configfs: Allow ->make_item() and ->make_group() to return detailed errors. configfs: Fix failing mkdir() making racing rmdir() fail configfs: Fix deadlock with racing rmdir() and rename() configfs: Make configfs_new_dirent() return error code instead of NULL configfs: Protect configfs_dirent s_links list mutations configfs: Introduce configfs_dirent_lock ocfs2: Don't snprintf() without a format. ocfs2: Fix CONFIG_OCFS2_DEBUG_FS #ifdefs ocfs2/net: Silence build warnings on sparc64 ocfs2: Handle error during journal load ocfs2: Silence an error message in ocfs2_file_aio_read() ocfs2: use simple_read_from_buffer() ocfs2: fix printk format warnings with OCFS2_FS_STATS=n [PATCH 2/2] ocfs2: Instrument fs cluster locks [PATCH 1/2] ocfs2: Add CONFIG_OCFS2_FS_STATS config option
2008-07-16Merge branch 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6Linus Torvalds
* 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6: UBIFS: include to compilation UBIFS: add new flash file system UBIFS: add brief documentation MAINTAINERS: add UBIFS section do_mounts: allow UBI root device name VFS: export sync_sb_inodes VFS: move inode_lock into sync_sb_inodes
2008-07-15Merge branch 'genirq' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'genirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: remove extraneous checks in manage.c genirq: Expose default irq affinity mask (take 3)