aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2008-07-24remove include/asm-h8300/keyboard.hAdrian Bunk
This patch removes the unused include/asm-h8300/keyboard.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24gigaset: gigaset_isowbuf_getbytes() may return signed unnoticedTilman Schmidt
ifd->offset is unsigned. gigaset_isowbuf_getbytes() may return signed unnoticed. Revised version of patch originally submitted by Roel Kluin <12o3l@tiscali.nl>. Signed-off-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24gigaset: use dev_ macros for messagesTilman Schmidt
The info() / warn() / err() macros from usb.h for generating kernel messages are considered inferior to dev_info() / dev_warn() / dev_err() from device.h. Replace them where possible. Also correct the severity level and improve the text of one message. Signed-off-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24security: remove unused forwardsHugh Dickins
Why would linux/security.h need forward declarations for nfsctl_arg and swap_info_struct? It's hard to imagine: remove them. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24security: filesystem capabilities no longer experimentalAndrew G. Morgan
Filesystem capabilities have come of age. Remove the experimental tag for configuring filesystem capabilities. Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24security: filesystem capabilities refactor kernel codeAndrew G. Morgan
To date, we've tried hard to confine filesystem support for capabilities to the security modules. This has left a lot of the code in kernel/capability.c in a state where it looks like it supports something that filesystem support for capabilities actually suppresses when the LSM security/commmoncap.c code runs. What is left is a lot of code that uses sub-optimal locking in the main kernel With this change we refactor the main kernel code and make it explicit which locks are needed and that the only remaining kernel races in this area are associated with non-filesystem capability code. Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24security: protect legacy applications from executing with insufficient privilegeAndrew G. Morgan
When cap_bset suppresses some of the forced (fP) capabilities of a file, it is generally only safe to execute the program if it understands how to recognize it doesn't have enough privilege to work correctly. For legacy applications (fE!=0), which have no non-destructive way to determine that they are missing privilege, we fail to execute (EPERM) any executable that requires fP capabilities, but would otherwise get pP' < fP. This is a fail-safe permission check. For some discussion of why it is problematic for (legacy) privileged applications to run with less than the set of capabilities requested for them, see: http://userweb.kernel.org/~morgan/sendmail-capabilities-war-story.html With this iteration of this support, we do not include setuid-0 based privilege protection from the bounding set. That is, the admin can still (ab)use the bounding set to suppress the privileges of a setuid-0 program. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: cleanup] Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm: fix ever-decreasing swap priorityHugh Dickins
Vegard Nossum has noticed the ever-decreasing negative priority in a swapon /swapoff loop, which eventually would misprioritize when int wraps positive. Not worth spending much code on, but probably better fixed. It's easy to handle the swapping on and off of just one area, but there's not much point if a pair or more still misbehave. To handle the general case, swapoff should compact negative priorities, keeping them always from -1 to -MAX_SWAPFILES. That's a change, but should cause no regression, since these negative (unspecified) priorities are disjoint from the the positive specified priorities 0 to 32767. One small functional difference, which seems appropriate: when swapoff fails to free all swap from a negative priority area, that area is now reinserted at lowest priority, rather than at its original priority. In moving down swapon's setting of priority, I notice that an area is visible to /proc/swaps when it has swap_map set, yet that was being set before all the visible fields were properly filled in: corrected. Signed-off-by: Hugh Dickins <hugh@veritas.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm: make CONFIG_MIGRATION available w/o CONFIG_NUMAGerald Schaefer
We'd like to support CONFIG_MEMORY_HOTREMOVE on s390, which depends on CONFIG_MIGRATION. So far, CONFIG_MIGRATION is only available with NUMA support. This patch makes CONFIG_MIGRATION selectable for architectures that define ARCH_ENABLE_MEMORY_HOTREMOVE. When MIGRATION is enabled w/o NUMA, the kernel won't compile because migrate_vmas() does not know about vm_ops->migrate() and vma_migratable() does not know about policy_zone. To fix this, those two functions can be restricted to '#ifdef CONFIG_NUMA' because they are not being used w/o NUMA. vma_migratable() is moved over from migrate.h to mempolicy.h. [kosaki.motohiro@jp.fujitsu.com: build fix] Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: KOSAKI Motorhiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24kcalloc: remove runtime divisionMilton Miller
While in all cases in the kernel we know the size of the elements to be created, we don't always know the count of elements. By commuting the size and count in the overflow check, the compiler can reduce the runtime division of size_t with a compare to a (unique) constant in these cases. Signed-off-by: Milton Miller <miltonm@bga.com> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24memory-hotplug: add sysfs removable attribute for hotplug memory removeBadari Pulavarty
Memory may be hot-removed on a per-memory-block basis, particularly on POWER where the SPARSEMEM section size often matches the memory-block size. A user-level agent must be able to identify which sections of memory are likely to be removable before attempting the potentially expensive operation. This patch adds a file called "removable" to the memory directory in sysfs to help such an agent. In this patch, a memory block is considered removable if; o It contains only MOVABLE pageblocks o It contains only pageblocks with free pages regardless of pageblock type On the other hand, a memory block starting with a PageReserved() page will never be considered removable. Without this patch, the user-agent is forced to choose a memory block to remove randomly. Sample output of the sysfs files: ./memory/memory0/removable: 0 ./memory/memory1/removable: 0 ./memory/memory2/removable: 0 ./memory/memory3/removable: 0 ./memory/memory4/removable: 0 ./memory/memory5/removable: 0 ./memory/memory6/removable: 0 ./memory/memory7/removable: 1 ./memory/memory8/removable: 0 ./memory/memory9/removable: 0 ./memory/memory10/removable: 0 ./memory/memory11/removable: 0 ./memory/memory12/removable: 0 ./memory/memory13/removable: 0 ./memory/memory14/removable: 0 ./memory/memory15/removable: 0 ./memory/memory16/removable: 0 ./memory/memory17/removable: 1 ./memory/memory18/removable: 1 ./memory/memory19/removable: 1 ./memory/memory20/removable: 1 ./memory/memory21/removable: 1 ./memory/memory22/removable: 1 Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24memory-hotplug: don't calculate vm_total_pages twice when rebuilding ↵Kent Liu
zonelists in online_pages() If zonelist is required to be rebuilt in online_pages(), there is no need to recalculate vm_total_pages in that function, as it has been updated in the call build_all_zonelists(). Signed-off-by: Kent Liu <kent.liu@linux.intel.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24memory hotplug: small fixes to bootmem freeing for memory hotremoveYasunori Goto
- Change some naming * Magic -> types * MIX_INFO -> MIX_SECTION_INFO * Change definition of bootmem type from direct hex value - __free_pages_bootmem() becomes __meminit. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24memory hotplug: allocate usemap on the section with pgdatYasunori Goto
Usemaps are allocated on the section which has pgdat by this. Because usemap size is very small, many other sections usemaps are allocated on only one page. If a section has usemap, it can't be removed until removing other sections. This dependency is not desirable for memory removing. Pgdat has similar feature. When a section has pgdat area, it must be the last section for removing on the node. So, if section A has pgdat and section B has usemap for section A, Both sections can't be removed due to dependency each other. To solve this issue, this patch collects usemap on same section with pgdat as much as possible. If other sections doesn't have any dependency, this section will be able to be removed finally. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: David Miller <davem@davemloft.net> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com> Cc: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm: remove initialization of static per-cpu variablesVegard Nossum
This was required by some old, no-longer-used gcc on sparc. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architecturesAndrea Righi
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit boundary. For example: u64 val = PAGE_ALIGN(size); always returns a value < 4GB even if size is greater than 4GB. The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for example): #define PAGE_SHIFT 12 #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1)) ... #define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK) The "~" is performed on a 32-bit value, so everything in "and" with PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary. Using the ALIGN() macro seems to be the right way, because it uses typeof(addr) for the mask. Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in include/linux/mm.h. See also lkml discussion: http://lkml.org/lkml/2008/6/11/237 [akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c] [akpm@linux-foundation.org: fix v850] [akpm@linux-foundation.org: fix powerpc] [akpm@linux-foundation.org: fix arm] [akpm@linux-foundation.org: fix mips] [akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c] [akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c] [akpm@linux-foundation.org: fix powerpc] Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm: make register_page_bootmem_info_section() staticAdrian Bunk
Make the needlessly global register_page_bootmem_info_section() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm/page_alloc.c: cleanupsAdrian Bunk
This patch contains the following cleanups: - make the following needlessly global variables static: - required_kernelcore - zone_movable_pfn[] - make the following needlessly global functions static: - move_freepages() - move_freepages_block() - setup_pageset() - find_usable_zone_for_movable() - adjust_zone_range_for_zone_movable() - __absent_pages_in_range() - find_min_pfn_for_node() - find_zone_movable_pfns_for_nodes() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm: add alloc_pages_exact() and free_pages_exact()Timur Tabi
alloc_pages_exact() is similar to alloc_pages(), except that it allocates the minimum number of pages to fulfill the request. This is useful if you want to allocate a very large buffer that is slightly larger than an even power-of-two number of pages. In that case, alloc_pages() will waste a lot of memory. I have a video driver that wants to allocate a 5MB buffer. alloc_pages() wiill waste 3MB of physically-contiguous memory. Signed-off-by: Timur Tabi <timur@freescale.com> Cc: Andi Kleen <andi@firstfloor.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: replace node_boot_start in struct bootmem_dataJohannes Weiner
Almost all users of this field need a PFN instead of a physical address, so replace node_boot_start with node_min_pfn. [Lee.Schermerhorn@hp.com: fix spurious BUG_ON() in mark_bootmem()] Signed-off-by: Johannes Weiner <hannes@saeureba.de> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: revisit alloc_bootmem_sectionJohannes Weiner
Since alloc_bootmem_core does no goal-fallback anymore and just returns NULL if the allocation fails, we might now use it in alloc_bootmem_section without all the fixup code for a misplaced allocation. Also, the limit can be the first PFN of the next section as the semantics is that the limit is _above_ the allocated region, not within. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: Make __alloc_bootmem_low_node fall back to other nodesJohannes Weiner
__alloc_bootmem_node already does this, make the interface consistent. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: respect goal more likelyJohannes Weiner
The old node-agnostic code tried allocating on all nodes starting from the one with the lowest range. alloc_bootmem_core retried without the goal if it could not satisfy it and so the goal was only respected at all when it happened to be on the first (lowest page numbers) node (or theoretically if allocations failed on all nodes before to the one holding the goal). Introduce a non-panicking helper that starts allocating from the node holding the goal and falls back only after all thes tries failed, thus moving the goal fallback code out of alloc_bootmem_core. Make all other allocation functions benefit from this new helper. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: factor out the marking of a PFN rangeJohannes Weiner
Introduce new helpers that mark a range that resides completely on a node or node-agnostic ranges that might also span node boundaries. The free/reserve API functions will then directly use these helpers. Note that the free/reserve semantics become more strict: while the prior code took basically arbitrary range arguments and marked the PFNs that happen to fall into that range, the new code requires node-specific ranges to be completely on the node. The node-agnostic requests might span node boundaries as long as the nodes are contiguous. Passing ranges that do not satisfy these criteria is a bug. [akpm@linux-foundation.org: fix printk warnings] Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: free/reserve helpersJohannes Weiner
Factor out the common operation of marking a range on the bitmap. [akpm@linux-foundation.org: fix various warnings] Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: clean up alloc_bootmem_coreJohannes Weiner
alloc_bootmem_core has become quite nasty to read over time. This is a clean rewrite that keeps the semantics. bdata->last_pos has been dropped. bdata->last_success has been renamed to hint_idx and it is now an index relative to the node's range. Since further block searching might start at this index, it is now set to the end of a succeeded allocation rather than its beginning. bdata->last_offset has been renamed to last_end_off to be more clear that it represents the ending address of the last allocation relative to the node. [y-goto@jp.fujitsu.com: fix new alloc_bootmem_core()] Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: clean up free_all_bootmem_coreJohannes Weiner
Rewrite the code in a more concise way using less variables. [akpm@linux-foundation.org: fix printk warnings] Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: revisit bootmem descriptor list handlingJohannes Weiner
link_bootmem handles an insertion of a new descriptor into the sorted list in more or less three explicit branches; empty list, insert in between and append. These cases can be expressed implicite. Also mark the sorted list as initdata as it can be thrown away after boot as well. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: revisit bitmap size calculationsJohannes Weiner
Reincarnate get_mapsize as bootmap_bytes and implement bootmem_bootmap_pages on top of it. Adjust users of these helpers and make free_all_bootmem_core use bootmem_bootmap_pages instead of open-coding it. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: add debugging frameworkJohannes Weiner
Introduce the bootmem_debug kernel parameter that enables very verbose diagnostics regarding all range operations of bootmem as well as the initialization and release of nodes. [akpm@linux-foundation.org: fix printk warnings] Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: add documentation to API functionsJohannes Weiner
Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: clean up bootmem.c file headerJohannes Weiner
Change the description, move a misplaced comment about the allocator itself and add me to the list of copyright holders. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24bootmem: reorder code to match new bootmem structureJohannes Weiner
This only reorders functions so that further patches will be easier to read. No code changed. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24hugetlb: quota is not freed for unused reserved private huge pagesAdam Litke
With shared reservations (and now also with private reservations), we reserve huge pages at mmap time. We also account for the mapping against fs quota to prevent a reservation from being preempted by quota exhaustion. When testing with the libhugetlbfs test suite, I found a problem with quota accounting. FS quota for allocated pages is handled correctly but we are not releasing quota for private pages that were reserved but never allocated. Do this in hugetlb_vm_op_close() at the same time as unused page reservations are released. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@saeurebad.de> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Hugh Dickins <hugh@veritas.com> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24hugetlb: fix a hugepage reservation check for MAP_SHAREDMel Gorman
When removing a huge page from the hugepage pool for a fault the system checks to see if the mapping requires additional pages to be reserved, and if it does whether there are any unreserved pages remaining. If not, the allocation fails without even attempting to get a page. In order to determine whether to apply this check we call vma_has_private_reserves() which tells us if this vma is MAP_PRIVATE and is the owner. This incorrectly triggers the remaining reservation test for MAP_SHARED mappings which prevents allocation of the final page in the pool even though it is reserved for this mapping. In reality we only want to check this for MAP_PRIVATE mappings where the process is not the original mapper. Replace vma_has_private_reserves() with vma_has_reserves() which indicates whether further reserves are required, and update the caller. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Adam Litke <agl@us.ibm.com> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24powerpc: support multiple hugepage sizesJon Tollefson
Instead of using the variable mmu_huge_psize to keep track of the huge page size we use an array of MMU_PAGE_* values. For each supported huge page size we need to know the hugepte_shift value and have a pgtable_cache. The hstate or an mmu_huge_psizes index is passed to functions so that they know which huge page size they should use. The hugepage sizes 16M and 64K are setup(if available on the hardware) so that they don't have to be set on the boot cmd line in order to use them. The number of 16G pages have to be specified at boot-time though (e.g. hugepagesz=16G hugepages=5). Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24fs: check for statfs overflowJon Tollefson
Adds a check for an overflow in the filesystem size so if someone is checking with statfs() on a 16G blocksize hugetlbfs in a 32bit binary that it will report back EOVERFLOW instead of a size of 0. Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24powerpc: define support for 16G hugepagesJon Tollefson
The huge page size is defined for 16G pages. If a hugepagesz of 16G is specified at boot-time then it becomes the huge page size instead of the default 16M. The change in pgtable-64K.h is to the macro pte_iterate_hashed_subpages to make the increment to va (the 1 being shifted) be a long so that it is not shifted to 0. Otherwise it would create an infinite loop when the shift value is for a 16G page (when base page size is 64K). Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24powerpc: scan device tree for gigantic pagesJon Tollefson
The 16G huge pages have to be reserved in the HMC prior to boot. The location of the pages are placed in the device tree. This patch adds code to scan the device tree during very early boot and save these page locations until hugetlbfs is ready for them. Acked-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24powerpc: function to allocate gigantic hugepagesJon Tollefson
The 16G page locations have been saved during early boot in an array. The alloc_bootmem_huge_page() function adds a page from here to the huge_boot_pages list. Acked-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24hugetlb: allow arch overridden hugepage allocationJon Tollefson
Allow alloc_bootmem_huge_page() to be overridden by architectures that can't always use bootmem. This requires huge_boot_pages to be available for use by this function. This is required for powerpc 16G pages, which have to be reserved prior to boot-time. The location of these pages are indicated in the device tree. Acked-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24hugetlb: override default huge page sizeNick Piggin
Allow configurations with the default huge page size which is different to the traditional HPAGE_SIZE size. The default huge page size is the one represented in the legacy /proc ABIs, SHM, and which is defaulted to when mounting hugetlbfs filesystems. This is implemented with a new kernel option default_hugepagesz=, which defaults to HPAGE_SIZE if not specified. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24x86: add hugepagesz option on 64-bitAndi Kleen
Add an hugepagesz=... option similar to IA64, PPC etc. to x86-64. This finally allows to select GB pages for hugetlbfs in x86 now that all the infrastructure is in place. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24x86: support GB hugepages on 64-bitAndi Kleen
Acked-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24hugetlb: introduce pud_hugeAndi Kleen
Straight forward extensions for huge pages located in the PUD instead of PMDs. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24hugetlb: printk cleanupAndi Kleen
- Reword sentence to clarify meaning with multiple options - Add support for using GB prefixes for the page size - Add extra printk to delayed > MAX_ORDER allocation code Acked-by: Adam Litke <agl@us.ibm.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24hugetlb: support boot allocate different sizesAndi Kleen
Make some infrastructure changes to allow boot-time allocation of different hugepage page sizes. - move all basic hstate initialisation into hugetlb_add_hstate - create a new function hugetlb_hstate_alloc_pages() to do the actual initial page allocations. Call this function early in order to allocate giant pages from bootmem. - Check for multiple hugepages= parameters Acked-by: Adam Litke <agl@us.ibm.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: Andrew Hastings <abh@cray.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24hugetlb: support larger than MAX_ORDERAndi Kleen
This is needed on x86-64 to handle GB pages in hugetlbfs, because it is not practical to enlarge MAX_ORDER to 1GB. Instead the 1GB pages are only allocated at boot using the bootmem allocator using the hugepages=... option. These 1G bootmem pages are never freed. In theory it would be possible to implement that with some complications, but since it would be a one-way street (>= MAX_ORDER pages cannot be allocated later) I decided not to currently. The >= MAX_ORDER code is not ifdef'ed per architecture. It is not very big and the ifdef uglyness seemed not be worth it. Known problems: /proc/meminfo and "free" do not display the memory allocated for gb pages in "Total". This is a little confusing for the user. Acked-by: Andrew Hastings <abh@cray.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm: export prep_compound_page to mmAndi Kleen
hugetlb will need to get compound pages from bootmem to handle the case of them being greater than or equal to MAX_ORDER. Export the constructor function needed for this. Acked-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm: introduce non panic alloc_bootmemAndi Kleen
Straight forward variant of the existing __alloc_bootmem_node, only subsequent patch when allocating giant hugepages at boot -- don't want to panic if we can't allocate as many as the user asked for. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>