aboutsummaryrefslogtreecommitdiff
path: root/mm
AgeCommit message (Collapse)Author
2007-05-09Move remote node draining out of slab allocatorsChristoph Lameter
Currently the slab allocators contain callbacks into the page allocator to perform the draining of pagesets on remote nodes. This requires SLUB to have a whole subsystem in order to be compatible with SLAB. Moving node draining out of the slab allocators avoids a section of code in SLUB. Move the node draining so that is is done when the vm statistics are updated. At that point we are already touching all the cachelines with the pagesets of a processor. Add a expire counter there. If we have to update per zone or global vm statistics then assume that the pageset will require subsequent draining. The expire counter will be decremented on each vm stats update pass until it reaches zero. Then we will drain one batch from the pageset. The draining will cause vm counter updates which will then cause another expiration until the pcp is empty. So we will drain a batch every 3 seconds. Note that remote node draining is a somewhat esoteric feature that is required on large NUMA systems because otherwise significant portions of system memory can become trapped in pcp queues. The number of pcp is determined by the number of processors and nodes in a system. A system with 4 processors and 2 nodes has 8 pcps which is okay. But a system with 1024 processors and 512 nodes has 512k pcps with a high potential for large amount of memory being caught in them. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Make vm statistics update interval configurableChristoph Lameter
Make it configurable. Code in mm makes the vm statistics intervals independent from the cache reaper use that opportunity to make it configurable. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09vmstat: use our own timer eventsChristoph Lameter
vmstat is currently using the cache reaper to periodically bring the statistics up to date. The cache reaper does only exists in SLUB as a way to provide compatibility with SLAB. This patch removes the vmstat calls from the slab allocators and provides its own handling. The advantage is also that we can use a different frequency for the updates. Refreshing vm stats is a pretty fast job so we can run this every second and stagger this by only one tick. This will lead to some overlap in large systems. F.e a system running at 250 HZ with 1024 processors will have 4 vm updates occurring at once. However, the vm stats update only accesses per node information. It is only necessary to stagger the vm statistics updates per processor in each node. Vm counter updates occurring on distant nodes will not cause cacheline contention. We could implement an alternate approach that runs the first processor on each node at the second and then each of the other processor on a node on a subsequent tick. That may be useful to keep a large amount of the second free of timer activity. Maybe the timer folks will have some feedback on this one? [jirislaby@gmail.com: add missing break] Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Add suspend-related notifications for CPU hotplugRafael J. Wysocki
Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09fs: convert core functions to zero_user_pageNate Diller
It's very common for file systems to need to zero part or all of a page, the simplist way is just to use kmap_atomic() and memset(). There's actually a library function in include/linux/highmem.h that does exactly that, but it's confusingly named memclear_highpage_flush(), which is descriptive of *how* it does the work rather than what the *purpose* is. So this patchset renames the function to zero_user_page(), and calls it from the various places that currently open code it. This first patch introduces the new function call, and converts all the core kernel callsites, both the open-coded ones and the old memclear_highpage_flush() ones. Following this patch is a series of conversions for each file system individually, per AKPM, and finally a patch deprecating the old call. The diffstat below shows the entire patchset. [akpm@linux-foundation.org: fix a few things] Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09slab: shut down cache_reaper when cpu goes downChristoph Lameter
Shutdown the cache_reaper if the cpu is brought down and set the cache_reap.func to NULL. Otherwise hotplug shuts down the reaper for good. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09slab: use CPU_LOCK_[ACQUIRE|RELEASE]Heiko Carstens
Looks like this was forgotten when CPU_LOCK_[ACQUIRE|RELEASE] was introduced. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Gautham Shenoy <ego@in.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09AFS: export a couple of core functions for AFS write supportDavid Howells
Export a couple of core functions for AFS write support to use: find_get_pages_contig() find_get_pages_tag() Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09pretend cpuset has some form of hugetlb page reservationKen Chen
When cpuset is configured, it breaks the strict hugetlb page reservation as the accounting is done on a global variable. Such reservation is completely rubbish in the presence of cpuset because the reservation is not checked against page availability for the current cpuset. Application can still potentially OOM'ed by kernel with lack of free htlb page in cpuset that the task is in. Attempt to enforce strict accounting with cpuset is almost impossible (or too ugly) because cpuset is too fluid that task or memory node can be dynamically moved between cpusets. The change of semantics for shared hugetlb mapping with cpuset is undesirable. However, in order to preserve some of the semantics, we fall back to check against current free page availability as a best attempt and hopefully to minimize the impact of changing semantics that cpuset has on hugetlb. Signed-off-by: Ken Chen <kenchen@google.com> Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09fix leaky resv_huge_pages when cpuset is in useKen Chen
The internal hugetlb resv_huge_pages variable can permanently leak nonzero value in the error path of hugetlb page fault handler when hugetlb page is used in combination of cpuset. The leaked count can permanently trap N number of hugetlb pages in unusable "reserved" state. Steps to reproduce the bug: (1) create two cpuset, user1 and user2 (2) reserve 50 htlb pages in cpuset user1 (3) attempt to shmget/shmat 50 htlb page inside cpuset user2 (4) kernel oom the user process in step 3 (5) ipcrm the shm segment At this point resv_huge_pages will have a count of 49, even though there are no active hugetlbfs file nor hugetlb shared memory segment in the system. The leak is permanent and there is no recovery method other than system reboot. The leaked count will hold up all future use of that many htlb pages in all cpusets. The culprit is that the error path of alloc_huge_page() did not properly undo the change it made to resv_huge_page, causing inconsistent state. Signed-off-by: Ken Chen <kenchen@google.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Adam Litke <agl@us.ibm.com> Cc: Martin Bligh <mbligh@google.com> Acked-by: David Gibson <dwg@au1.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09krealloc: fix kerneldoc commentsPekka J Enberg
No "blank" (or "*") line is allowed between the function name and lines for it parameter(s). Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: rework slab order determinationChristoph Lameter
In some cases SLUB is creating uselessly slabs that are larger than slub_max_order. Also the layout of some of the slabs was not satisfactory. Go to an iterarive approach. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: include lifetime stats and sets of cpus / nodes in tracking outputChristoph Lameter
We have information about how long an object existed and about the nodes and cpus where the allocations and frees took place. Add that information to the tracking output in /sys/slab/xx/alloc_calls and /sys/slab/free_calls This will then enable slabinfo to output nice reports like this: christoph@qirst:~/slub$ ./slabinfo kmalloc-128 Slabcache: kmalloc-128 Aliases: 0 Order : 0 Sizes (bytes) Slabs Debug Memory ------------------------------------------------------------------------ Object : 128 Total : 12 Sanity Checks : On Total: 49152 SlabObj: 200 Full : 7 Redzoning : On Used : 24832 SlabSiz: 4096 Partial: 4 Poisoning : On Loss : 24320 Loss : 72 CpuSlab: 1 Tracking : On Lalig: 13968 Align : 8 Objects: 20 Tracing : Off Lpadd: 1152 kmalloc-128 has no kmem_cache operations kmalloc-128: Kernel object allocation ----------------------------------------------------------------------- 6 param_sysfs_setup+0x71/0x130 age=284512/284512/284512 pid=1 nodes=0-1,3 11 percpu_populate+0x39/0x80 age=283914/284428/284512 pid=1 nodes=0 21 __register_chrdev_region+0x31/0x170 age=282896/284347/284473 pid=1-1705 nodes=0-2 1 sys_inotify_init+0x76/0x1c0 age=283423 pid=1004 nodes=0 19 as_get_io_context+0x32/0xd0 age=6/247567/283988 pid=1-11782 nodes=0,2 10 ida_pre_get+0x4a/0x80 age=277666/283773/284526 pid=0-2177 nodes=0,2 24 kobject_kset_add_dir+0x37/0xb0 age=282727/283860/284472 pid=1-1723 nodes=0-2 1 acpi_ds_build_internal_buffer_obj+0xd3/0x11d age=284508 pid=1 nodes=0 24 con_insert_unipair+0xd7/0x110 age=284438/284438/284438 pid=1 nodes=0,2 1 uart_open+0x2d2/0x4b0 age=283896 pid=1 nodes=0 26 dma_pool_create+0x73/0x1a0 age=282762/282833/282916 pid=1705-1723 nodes=0 1 neigh_table_init_no_netlink+0xd2/0x210 age=284461 pid=1 nodes=0 2 neigh_parms_alloc+0x2b/0xe0 age=284410/284411/284412 pid=1 nodes=2 2 neigh_resolve_output+0x1e1/0x280 age=276289/276291/276293 pid=0-2443 nodes=0 1 netlink_kernel_create+0x90/0x170 age=284472 pid=1 nodes=0 4 xt_alloc_table_info+0x39/0xf0 age=283958/283958/283959 pid=1 nodes=1 3 fn_hash_insert+0x473/0x720 age=277653/277661/277666 pid=2177-2185 nodes=0 1 get_mtrr_state+0x285/0x2a0 age=284526 pid=0 nodes=0 1 cacheinfo_cpu_callback+0x26d/0x3e0 age=284458 pid=1 nodes=0 29 kernel_param_sysfs_setup+0x25/0x90 age=284511/284511/284512 pid=1 nodes=0-1,3 5 process_zones+0x5e/0x170 age=284546/284546/284546 pid=0 nodes=0 1 drm_core_init+0x48/0x160 age=284421 pid=1 nodes=2 kmalloc-128: Kernel object freeing ------------------------------------------------------------------------ 163 <not-available> age=4295176847 pid=0 nodes=0-3 1 __vunmap+0x6e/0xf0 age=282907 pid=1723 nodes=0 28 free_as_io_context+0x12/0x90 age=9243/262197/283474 pid=42-11754 nodes=0 1 acpi_get_object_info+0x1b7/0x1d4 age=284475 pid=1 nodes=0 1 do_acpi_find_child+0x45/0x4e age=284475 pid=1 nodes=0 NUMA nodes : 0 1 2 3 ------------------------------------------ All slabs 7 2 2 1 Partial slabs 2 2 0 0 Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: add CONFIG_SLUB_DEBUGChristoph Lameter
CONFIG_SLUB_DEBUG can be used to switch off the debugging and sysfs components of SLUB. Thus SLUB will be able to replace SLOB. SLUB can arrange objects in a denser way than SLOB and the code size should be minimal without debugging and sysfs support. Note that CONFIG_SLUB_DEBUG is materially different from CONFIG_SLAB_DEBUG. CONFIG_SLAB_DEBUG is used to enable slab debugging in SLAB. SLUB enables debugging via a boot parameter. SLUB debug code should always be present. CONFIG_SLUB_DEBUG can be modified in the embedded config section. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: move tracking definitions and check_valid_pointer() away from debug codeChristoph Lameter
Move the tracking definitions and the check_valid_pointer() function away from the debugging related functions. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: consolidate trace codeChristoph Lameter
Trace in both slab_alloc and slab_free has a lot of common code. Use a single function for both. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: introduce DebugSlab(page)Christoph Lameter
This replaces the PageError() checking. DebugSlab is clearer and allows for future changes to the page bit used. We also need it to support CONFIG_SLUB_DEBUG. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: move resiliency check into SYSFS sectionChristoph Lameter
Move the resiliency check into the SYSFS section after validate_slab that is used by the resiliency check. This will avoid a forward declaration. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: add macros for scanning objects in a slabChristoph Lameter
Scanning of objects happens in a number of functions. Consolidate that code. DECLARE_BITMAP instead of coding the declaration for bitmaps. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: update commentsChristoph Lameter
Update comments throughout SLUB to reflect the new developments. Fix up various awkward sentences. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: get rid of finish_bootstrapChristoph Lameter
Its only purpose was to bring some sort of symmetry to sysfs usage when dealing with bootstrapping per cpu flushing. Since we do not time out slabs anymore we have no need to run finish_bootstrap even without sysfs. Fold it back into slab_sysfs_init and drop the initcall for the !SYFS case. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: clean up kreallocChristoph Lameter
We really do not need all this gaga there. ksize gives us all the information we need to figure out if the object can cope with the new size. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: use check_valid_pointer in kmem_ptr_validateChristoph Lameter
We needlessly duplicate code. Also make check_valid_pointer inline. Signed-off-by: Christoph LAemter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: after object padding only needed for RedzoningChristoph Lameter
If no redzoning is selected then we do not need padding before the next object. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09SLUB: add support for dynamic cacheline size determinationChristoph Lameter
SLUB currently assumes that the cacheline size is static. However, i386 f.e. supports dynamic cache line size determination. Use cache_line_size() instead of L1_CACHE_BYTES in the allocator. That also explains the purpose of SLAB_HWCACHE_ALIGN. So we will need to keep that one around to allow dynamic aligning of objects depending on boot determination of the cache line size. [akpm@linux-foundation.org: need to define it before we use it] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Remove unused variable in get_unmapped_areaRoland McGrath
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08fbdev: mm: Deferred IO supportJaya Kumar
This implements deferred IO support in fbdev. Deferred IO is a way to delay and repurpose IO. This implementation is done using mm's page_mkwrite and page_mkclean hooks in order to detect, delay and then rewrite IO. This functionality is used by hecubafb. [adaplas] This is useful for graphics hardware with no directly addressable/mappable framebuffer. Implementing this will allow the "framebuffer" to be accesible from user space via mmap(). Signed-off-by: Jaya Kumar <jayakumar.lkml@gmail.com> Signed-off-by: Antonino Daplas <adaplas@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Fix race between cat /proc/slab_allocators and rmmodAlexey Dobriyan
Same story as with cat /proc/*/wchan race vs rmmod race, only /proc/slab_allocators want more info than just symbol name. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Remove do_sync_file_range()Mark Fasheh
Remove do_sync_file_range() and convert callers to just use do_sync_mapping_range(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08move die notifier handling to common codeChristoph Hellwig
This patch moves the die notifier handling to common code. Previous various architectures had exactly the same code for it. Note that the new code is compiled unconditionally, this should be understood as an appel to the other architecture maintainer to implement support for it aswell (aka sprinkling a notify_die or two in the proper place) arm had a notifiy_die that did something totally different, I renamed it to arm_notify_die as part of the patch and made it static to the file it's declared and used at. avr32 used to pass slightly less information through this interface and I brought it into line with the other architectures. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix vmalloc_sync_all bustage] [bryan.wu@analog.com: fix vmalloc_sync_all in nommu] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Factor outstanding I/O error handlingGuillaume Chazarain
Cleanup: setting an outstanding error on a mapping was open coded too many times. Factor it out in mapping_set_error(). Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Add white list into modpost.c for memory hotplug code and ia64's machvec sectionYasunori Goto
This patch is add white list into modpost.c for some functions and ia64's section to fix section mismatchs. sparse_index_alloc() and zone_wait_table_init() calls bootmem allocator at boot time, and kmalloc/vmalloc at hotplug time. If config memory hotplug is on, there are references of bootmem allocater(init text) from them (normal text). This is cause of section mismatch. Bootmem is called by many functions and it must be used only at boot time. I think __init of them should keep for section mismatch check. So, I would like to register sparse_index_alloc() and zone_wait_table_init() into white list. In addition, ia64's .machvec section is function table of some platform dependent code. It is mixture of .init.text and normal text. These reference of __init functions are valid too. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Fix section mismatch of memory hotplug related code.Yasunori Goto
This is to fix many section mismatches of code related to memory hotplug. I checked compile with memory hotplug on/off on ia64 and x86-64 box. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08mm: move common segment checks to separate helper functionDmitriy Monakhov
[akpm@linux-foundation.org: cleanup] Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org> Cc: Christoph Hellwig <hch@lst.de> Acked-by: Anton Altaparmakov <aia21@cam.ac.uk> Acked-by: David Chinner <dgc@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Increase slab redzone to 64bitsDavid Woodhouse
There are two problems with the existing redzone implementation. Firstly, it's causing misalignment of structures which contain a 64-bit integer, such as netfilter's 'struct ipt_entry' -- causing netfilter modules to fail to load because of the misalignment. (In particular, the first check in net/ipv4/netfilter/ip_tables.c::check_entry_size_and_hooks()) On ppc32 and sparc32, amongst others, __alignof__(uint64_t) == 8. With slab debugging, we use 32-bit redzones. And allocated slab objects aren't sufficiently aligned to hold a structure containing a uint64_t. By _just_ setting ARCH_KMALLOC_MINALIGN to __alignof__(u64) we'd disable redzone checks on those architectures. By using 64-bit redzones we avoid that loss of debugging, and also fix the other problem while we're at it. When investigating this, I noticed that on 64-bit platforms we're using a 32-bit value of RED_ACTIVE/RED_INACTIVE in the 64-bit memory location set aside for the redzone. Which means that the four bytes immediately before or after the allocated object at 0x00,0x00,0x00,0x00 for LE and BE machines, respectively. Which is probably not the most useful choice of poison value. One way to fix both of those at once is just to switch to 64-bit redzones in all cases. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07Fix up SLUB compileLinus Torvalds
The newly merged SLUB allocator patches had been generated before the removal of "struct subsystem", and ended up applying fine, but wouldn't build based on the current tree as a result. Fix up that merge error - not that SLUB is likely really ready for showtime yet, but at least I can fix the trivial stuff. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SERIAL] sunsu: Fix section mismatch warnings. [SPARC64]: pgtable_cache_init() should be __init. [SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/prom.c [SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/pci.c [SPARC64]: Fix section mismatch warnings in arch/sparc64/kernel/console.c [MM]: sparse_init() should be __init. [SPARC64]: Update defconfig. [VIDEO]: Add Sun XVR-2500 framebuffer driver. [VIDEO]: Add Sun XVR-500 framebuffer driver. [SPARC64]: SUN4U PCI-E controller support. [SPARC]: Fix comment typo in smp4m_blackbox_current(). [SCSI] SUNESP: sun_esp.c needs linux/delay.h Fix up conflict in arch/sparc64/mm/init.c manually due to removal of pgtable_cache_init() through the -mm patches (even though that patch was also by David ;) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07freezer: fix racy usage of try_to_freeze in kswapdRafael J. Wysocki
Currently we can miss freeze_process()->signal_wake_up() in kswapd() if it happens between try_to_freeze() and prepare_to_wait(). To prevent this from happening we should check freezing(current) before calling schedule(). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07swsusp: use inline functions for changing page flagsRafael J. Wysocki
Replace direct invocations of SetPageNosave(), SetPageNosaveFree() etc. with calls to inline functions that can be changed in subsequent patches without modifying the code calling them. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07slob: fix page order calculation on not 4KB pageAkinobu Mita
SLOB doesn't calculate correct page order when page size is not 4KB. This patch fixes it with using get_order() instead of find_order() which is SLOB version of get_order(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Matt Mackall <mpm@selenic.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07Slab allocators: remove useless __GFP_NO_GROW flagChristoph Lameter
There is no user remaining and I have never seen any use of that flag. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07slab allocators: Remove SLAB_CTOR_ATOMICChristoph Lameter
SLAB_CTOR atomic is never used which is no surprise since I cannot imagine that one would want to do something serious in a constructor or destructor. In particular given that the slab allocators run with interrupts disabled. Actions in constructors and destructors are by their nature very limited and usually do not go beyond initializing variables and list operations. (The i386 pgd ctor and dtors do take a spinlock in constructor and destructor..... I think that is the furthest we go at this point.) There is no flag passed to the destructor so removing SLAB_CTOR_ATOMIC also establishes a certain symmetry. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07slab allocators: Remove SLAB_DEBUG_INITIAL flagChristoph Lameter
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB. I think its purpose was to have a callback after an object has been freed to verify that the state is the constructor state again? The callback is performed before each freeing of an object. I would think that it is much easier to check the object state manually before the free. That also places the check near the code object manipulation of the object. Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was compiled with SLAB debugging on. If there would be code in a constructor handling SLAB_DEBUG_INITIAL then it would have to be conditional on SLAB_DEBUG otherwise it would just be dead code. But there is no such code in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real use of, difficult to understand and there are easier ways to accomplish the same effect (i.e. add debug code before kfree). There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be clear in fs inode caches. Remove the pointless checks (they would even be pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors. This is the last slab flag that SLUB did not support. Remove the check for unimplemented flags from SLUB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07get_unmapped_area doesn't need hugetlbfs hacks anymoreBenjamin Herrenschmidt
Remove the hugetlbfs specific hacks in toplevel get_unmapped_area() now that all archs and hugetlbfs itself do the right thing for both cases. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07get_unmapped_area handles MAP_FIXED in generic codeBenjamin Herrenschmidt
generic arch_get_unmapped_area() now handles MAP_FIXED. Now that all implementations have been fixed, change the toplevel get_unmapped_area() to call into arch or drivers for the MAP_FIXED case. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: William Irwin <bill.irwin@oracle.com> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07oom: fix constraint deadlockDavid Rientjes
Fixes a deadlock in the OOM killer for allocations that are not __GFP_HARDWALL. Before the OOM killer checks for the allocation constraint, it takes callback_mutex. constrained_alloc() iterates through each zone in the allocation zonelist and calls cpuset_zone_allowed_softwall() to determine whether an allocation for gfp_mask is possible. If a zone's node is not in the OOM-triggering task's mems_allowed, it is not exiting, and we did not fail on a __GFP_HARDWALL allocation, cpuset_zone_allowed_softwall() attempts to take callback_mutex to check the nearest exclusive ancestor of current's cpuset. This results in deadlock. We now take callback_mutex after iterating through the zonelist since we don't need it yet. Cc: Andi Kleen <ak@suse.de> Cc: Nick Piggin <npiggin@suse.de> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Martin J. Bligh <mbligh@mbligh.org> Signed-off-by: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07mm: fix handling of panic_on_oom when cpusets are in useYasunori Goto
The current panic_on_oom may not work if there is a process using cpusets/mempolicy, because other nodes' memory may remain. But some people want failover by panic ASAP even if they are used. This patch makes new setting for its request. This is tested on my ia64 box which has 3 nodes. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Christoph Lameter <clameter@sgi.com> Cc: Paul Jackson <pj@sgi.com> Cc: Ethan Solomita <solo@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07fault injection: fix failslab with CONFIG_NUMAAkinobu Mita
Currently failslab injects failures into ____cache_alloc(). But with enabling CONFIG_NUMA it's not enough to let actual slab allocator functions (kmalloc, kmem_cache_alloc, ...) return NULL. This patch moves fault injection hook inside of __cache_alloc() and __cache_alloc_node(). These are lower call path than ____cache_alloc() and enable to inject faulures to slab allocators with CONFIG_NUMA. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07slab allocators: Remove obsolete SLAB_MUST_HWCACHE_ALIGNChristoph Lameter
This patch was recently posted to lkml and acked by Pekka. The flag SLAB_MUST_HWCACHE_ALIGN is 1. Never checked by SLAB at all. 2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB 3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB. The only remaining use is in sparc64 and ppc64 and their use there reflects some earlier role that the slab flag once may have had. If its specified then SLAB_HWCACHE_ALIGN is also specified. The flag is confusing, inconsistent and has no purpose. Remove it. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07mm: madvise avoid exclusive mmap_semNick Piggin
Avoid down_write of the mmap_sem in madvise when we can help it. Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>