aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc/mm/slb.c
AgeCommit message (Collapse)Author
2009-08-20Merge commit 'paulus-perf/master' into nextBenjamin Herrenschmidt
2009-08-20powerpc: Preload application text segment instead of TASK_UNMAPPED_BASEAnton Blanchard
TASK_UNMAPPED_BASE is not used with the new top down mmap layout. We can reuse this preload slot by loading in the segment at 0x10000000, where almost all PowerPC binaries are linked at. On a microbenchmark that bounces a token between two 64bit processes over pipes and calls gettimeofday each iteration (to access the VDSO), both the 32bit and 64bit context switch rate improves (tested on a 4GHz POWER6): 32bit: 273k/sec -> 283k/sec 64bit: 277k/sec -> 284k/sec Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc: Rearrange SLB preload codeAnton Blanchard
With the new top down layout it is likely that the pc and stack will be in the same segment, because the pc is most likely in a library allocated via a top down mmap. Right now we bail out early if these segments match. Rearrange the SLB preload code to sanity check all SLB preload addresses are not in the kernel, then check all addresses for conflicts. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-18powerpc: Allow perf_counters to access user memory at interrupt timePaul Mackerras
This provides a mechanism to allow the perf_counters code to access user memory in a PMU interrupt routine. Such an access can cause various kinds of interrupt: SLB miss, MMU hash table miss, segment table miss, or TLB miss, depending on the processor. This commit only deals with 64-bit classic/server processors, which use an MMU hash table. 32-bit processors are already able to access user memory at interrupt time. Since we don't soft-disable on 32-bit, we avoid the possibility of reentering hash_page or the TLB miss handlers, since they run with interrupts disabled. On 64-bit processors, an SLB miss interrupt on a user address will update the slb_cache and slb_cache_ptr fields in the paca. This is OK except in the case where a PMU interrupt occurs in switch_slb, which also accesses those fields. To prevent this, we hard-disable interrupts in switch_slb. Interrupts are already soft-disabled at this point, and will get hard-enabled when they get soft-enabled later. This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice, and to make sure that it clears the slb_cache_ptr when called from other callers than switch_slb, the existing routine is renamed to __slb_flush_and_rebolt, which is called by switch_slb and the new version of slb_flush_and_rebolt. Similarly, switch_stab (used on POWER3 and RS64 processors) gets a hard_irq_disable() to protect the per-cpu variables used there and in ste_allocate. If a MMU hashtable miss interrupt occurs, normally we would call hash_page to look up the Linux PTE for the address and create a HPTE. However, hash_page is fairly complex and takes some locks, so to avoid the possibility of deadlock, we check the preemption count to see if we are in a (pseudo-)NMI handler, and if so, we don't call hash_page but instead treat it like a bad access that will get reported up through the exception table mechanism. An interrupt whose handler runs even though the interrupt occurred when soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI handler, which should use nmi_enter()/nmi_exit() rather than irq_enter()/irq_exit(). Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-07-08powerpc: Cleanup & use pr_devel() in arch/powerpc/mm/slb.cMichael Ellerman
pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 3261 416 4 3681 e61 arch/powerpc/mm/slb.o size after: text data bss dec hex filename 2861 248 4 3113 c29 arch/powerpc/mm/slb.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-12trivial: spelling fix in ppc code commentsSankar P
Fixes a trivial spelling error in powerpc code comments. Signed-off-by: Sankar P <sankar.curiosity@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2008-05-15[POWERPC] vmemmap fixes to use smaller pagesBenjamin Herrenschmidt
This changes vmemmap to use a different region (region 0xf) of the address space, and to configure the page size of that region dynamically at boot. The problem with the current approach of always using 16M pages is that it's not well suited to machines that have small amounts of memory such as small partitions on pseries, or PS3's. In fact, on the PS3, failure to allocate the 16M page backing vmmemmap tends to prevent hotplugging the HV's "additional" memory, thus limiting the available memory even more, from my experience down to something like 80M total, which makes it really not very useable. The logic used by my match to choose the vmemmap page size is: - If 16M pages are available and there's 1G or more RAM at boot, use that size. - Else if 64K pages are available, use that - Else use 4K pages I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages) and it seems to work fine. Note that I intend to change the way we organize the kernel regions & SLBs so the actual region will change from 0xf back to something else at one point, as I simplify the SLB miss handler, but that will be for a later patch. Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-02[POWERPC] Bolt in SLB entry for kernel stack on secondary cpusPaul Mackerras
This fixes a regression reported by Kamalesh Bulabel where a POWER4 machine would crash because of an SLB miss at a point where the SLB miss exception was unrecoverable. This regression is tracked at: http://bugzilla.kernel.org/show_bug.cgi?id=10082 SLB misses at such points shouldn't happen because the kernel stack is the only memory accessed other than things in the first segment of the linear mapping (which is mapped at all times by entry 0 of the SLB). The context switch code ensures that SLB entry 2 covers the kernel stack, if it is not already covered by entry 0. None of entries 0 to 2 are ever replaced by the SLB miss handler. Where this went wrong is that the context switch code assumes it doesn't have to write to SLB entry 2 if the new kernel stack is in the same segment as the old kernel stack, since entry 2 should already be correct. However, when we start up a secondary cpu, it calls slb_initialize, which doesn't set up entry 2. This is correct for the boot cpu, where we will be using a stack in the kernel BSS at this point (i.e. init_thread_union), but not necessarily for secondary cpus, whose initial stack can be allocated anywhere. This doesn't cause any immediate problem since the SLB miss handler will just create an SLB entry somewhere else to cover the initial stack. In fact it's possible for the cpu to go quite a long time without SLB entry 2 being valid. Eventually, though, the entry created by the SLB miss handler will get overwritten by some other entry, and if the next access to the stack is at an unrecoverable point, we get the crash. This fixes the problem by making slb_initialize create a suitable entry for the kernel stack, if we are on a secondary cpu and the stack isn't covered by SLB entry 0. This requires initializing the get_paca()->kstack field earlier, so I do that in smp_create_idle where the current field is initialized. This also abstracts a bit of the computation that mk_esid_data in slb.c does so that it can be used in slb_initialize. Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-02[POWERPC] Fix slb.c compile warningsGeoff Levand
Arrange for a syntax check to always be done on the powerpc/mm/slb.c DBG() macro by defining it to pr_debug() for non-debug builds. Also, fix these related compile warnings: slb.c:273: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int slb.c:274: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int' Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-03-20[POWERPC] Fix PMU + soft interrupt disable bugAnton Blanchard
Since the PMU is an NMI now, it can come at any time we are only soft disabled. We must hard disable around the two places we allow the kernel stack SLB and r1 to go out of sync. Otherwise the PMU exception can force a kernel stack SLB into another slot, which can lead to it getting evicted, which can lead to a nasty unrecoverable SLB miss in the exception entry code. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-24Merge branch 'linux-2.6'Paul Mackerras
2008-01-15[POWERPC] Fix boot failure on POWER6Paul Mackerras
Commit 473980a99316c0e788bca50996375a2815124ce1 added a call to clear the SLB shadow buffer before registering it. Unfortunately this means that we clear out the entries that slb_initialize has previously set in there. On POWER6, the hypervisor uses the SLB shadow buffer when doing partition switches, and that means that after the next partition switch, each non-boot CPU has no SLB entries to map the kernel text and data, which causes it to crash. This fixes it by reverting most of 473980a9 and instead clearing the 3rd entry explicitly in slb_initialize. This fixes the problem that 473980a9 was trying to solve, but without breaking POWER6. Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-11[POWERPC] Fix CPU hotplug when using the SLB shadow bufferMichael Neuling
Before we register the SLB shadow buffer, we need to invalidate the entries in the buffer, otherwise we can end up stale entries from when we previously offlined the CPU. This does this invalidate as well as unregistering the buffer with PHYP before we offline the cpu. Tested and fixes crashes seen on 970MP (thanks to tonyb) and POWER5. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-12-11[POWERPC] Use SLB size from the device treeMichael Neuling
Currently we hardwire the number of SLBs to 64, but PAPR says we should use the ibm,slb-size property to obtain the number of SLB entries. This uses this property instead of assuming 64. If no property is found, we assume 64 entries as before. This soft patches the SLB handler, so it shouldn't change performance at all. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08[POWERPC] Fix switch_slb handling of 1T ESID valueswill schmidt
Now that we have 1TB segment size support, we need to be using the GET_ESID_1T macro when comparing ESID values for pc, stack, and unmapped_base within switch_slb(). A new helper function called esids_match() contains the logic for deciding when to call GET_ESID and GET_ESID_1T. This fixes a duplicate-slb-entry inspired machine-check exception I was seeing when trying to run java on a power6 partition. Tested on power6 and power5. Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08[POWERPC] Include udbg.h when using udbg_printfwill schmidt
This fixes the error error: implicit declaration of function "udbg_printf" We have a few spots where we reference udbg_printf() without #including udbg.h. These are within #ifdef DEBUG blocks, so unnoticed until we do a #define DEBUG or #define DEBUG_LOW nearby. Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-17[POWERPC] Add 1TB workaround for PA6TOlof Johansson
PA6T has a bug where the slbie instruction does not honor the large segment bit. As a result, we have to always use slbia when switching context. We don't have to worry about changing the slbie's during fault processing, since they should never be replacing one VSID with another using the same ESID. I.e. there's no risk for inserting duplicate entries due to a failed slbie of the old entry. So as long as we clear it out on context switch we should be fine. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-12[POWERPC] Use 1TB segmentsPaul Mackerras
This makes the kernel use 1TB segments for all kernel mappings and for user addresses of 1TB and above, on machines which support them (currently POWER5+, POWER6 and PA6T). We detect that the machine supports 1TB segments by looking at the ibm,processor-segment-sizes property in the device tree. We don't currently use 1TB segments for user addresses < 1T, since that would effectively prevent 32-bit processes from using huge pages unless we also had a way to revert to using 256MB segments. That would be possible but would involve extra complications (such as keeping track of which segment size was used when HPTEs were inserted) and is not addressed here. Parts of this patch were originally written by Ben Herrenschmidt. Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-09-19[POWERPC] Remove barriers from the SLB shadow buffer updateMichael Neuling
After talking to an IBM POWER hypervisor (PHYP) design and development guy, there seems to be no need for memory barriers when updating the SLB shadow buffer provided we only update it from the current CPU, which we do. Also, these guys see no need in the future for these barriers. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-25[POWERPC] Fix SLB initialization at boot timePaul Mackerras
This partially reverts edd0622bd2e8f755c960827e15aa6908c3c5aa94. It turns out that the part of that commit that aimed to ensure that we created an SLB entry for the kernel stack on secondary CPUs when starting the CPU didn't achieve its aim, and in fact caused a regression, because get_paca()->kstack is not initialized at the point where slb_initialize is called. This therefore just reverts that part of that commit, while keeping the change to slb_flush_and_rebolt, which is correct and necessary. Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-10[POWERPC] Fix potential duplicate entry in SLB shadow bufferPaul Mackerras
We were getting a duplicate entry in the SLB shadow buffer in slb_flush_and_rebolt() if the kernel stack was in the same segment as PAGE_OFFSET, which on POWER6 causes the hypervisor to terminate the partition with an error. This fixes it. Also we were not creating an SLB entry (or an SLB shadow buffer entry) for the kernel stack on secondary CPUs when starting the CPU. This isn't a major problem, since an appropriate entry will be created on demand, but this fixes that also for consistency. Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-08-03[POWERPC] Fixes for the SLB shadow buffer codeMichael Neuling
On a machine with hardware 64kB pages and a kernel configured for a 64kB base page size, we need to change the vmalloc segment from 64kB pages to 4kB pages if some driver creates a non-cacheable mapping in the vmalloc area. However, we never updated with SLB shadow buffer. This fixes it. Thanks to paulus for finding this. Also added some write barriers to ensure the shadow buffer contents are always consistent. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09[POWERPC] Introduce address space "slices"Benjamin Herrenschmidt
The basic issue is to be able to do what hugetlbfs does but with different page sizes for some other special filesystems; more specifically, my need is: - Huge pages - SPE local store mappings using 64K pages on a 4K base page size kernel on Cell - Some special 4K segments in 64K-page kernels for mapping a dodgy type of powerpc-specific infiniband hardware that requires 4K MMU mappings for various reasons I won't explain here. The main issues are: - To maintain/keep track of the page size per "segment" (as we can only have one page size per segment on powerpc, which are 256MB divisions of the address space). - To make sure special mappings stay within their allotted "segments" (including MAP_FIXED crap) - To make sure everybody else doesn't mmap/brk/grow_stack into a "segment" that is used for a special mapping Some of the necessary mechanisms to handle that were present in the hugetlbfs code, but mostly in ways not suitable for anything else. The patch relies on some changes to the generic get_unmapped_area() that just got merged. It still hijacks hugetlb callbacks here or there as the generic code hasn't been entirely cleaned up yet but that shouldn't be a problem. So what is a slice ? Well, I re-used the mechanism used formerly by our hugetlbfs implementation which divides the address space in "meta-segments" which I called "slices". The division is done using 256MB slices below 4G, and 1T slices above. Thus the address space is divided currently into 16 "low" slices and 16 "high" slices. (Special case: high slice 0 is the area between 4G and 1T). Doing so simplifies significantly the tracking of segments and avoids having to keep track of all the 256MB segments in the address space. While I used the "concepts" of hugetlbfs, I mostly re-implemented everything in a more generic way and "ported" hugetlbfs to it. Slices can have an associated page size, which is encoded in the mmu context and used by the SLB miss handler to set the segment sizes. The hash code currently doesn't care, it has a specific check for hugepages, though I might add a mechanism to provide per-slice hash mapping functions in the future. The slice code provide a pair of "generic" get_unmapped_area() (bottomup and topdown) functions that should work with any slice size. There is some trickiness here so I would appreciate people to have a look at the implementation of these and let me know if I got something wrong. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-04[POWERPC] iSeries: fix slb.c for combined buildStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-08-08[POWERPC] Implement SLB shadow bufferMichael Neuling
This adds a shadow buffer for the SLBs and regsiters it with PHYP. Only the bolted SLB entries (top 3) are shadowed. The SLB shadow buffer tells the hypervisor what the kernel needs to have in the SLB for the kernel to be able to function. The hypervisor can use this information to speed up partition context switches. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-30Remove obsolete #include <linux/config.h>Jörn Engel
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-15powerpc: Use 64k pages without needing cache-inhibited large pagesPaul Mackerras
Some POWER5+ machines can do 64k hardware pages for normal memory but not for cache-inhibited pages. This patch lets us use 64k hardware pages for most user processes on such machines (assuming the kernel has been configured with CONFIG_PPC_64K_PAGES=y). User processes start out using 64k pages and get switched to 4k pages if they use any non-cacheable mappings. With this, we use 64k pages for the vmalloc region and 4k pages for the imalloc region. If anything creates a non-cacheable mapping in the vmalloc region, the vmalloc region will get switched to 4k pages. I don't know of any driver other than the DRM that would do this, though, and these machines don't have AGP. When a region gets switched from 64k pages to 4k pages, we do not have to clear out all the 64k HPTEs from the hash table immediately. We use the _PAGE_COMBO bit in the Linux PTE to indicate whether the page was hashed in as a 64k page or a set of 4k pages. If hash_page is trying to insert a 4k page for a Linux PTE and it sees that it has already been inserted as a 64k page, it first invalidates the 64k HPTE before inserting the 4k HPTE. The hash invalidation routines also use the _PAGE_COMBO bit, to determine whether to look for a 64k HPTE or a set of 4k HPTEs to remove. With those two changes, we can tolerate a mix of 4k and 64k HPTEs in the hash table, and they will all get removed when the address space is torn down. Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-12powerpc: Remove unused paca->pgdir fieldPaul Mackerras
The pgdir field in the paca was a leftover from the dynamic VSIDs patch, and is not used in the current kernel code. This removes it. Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Replace VMALLOCBASE with VMALLOC_STARTDavid Gibson
On ppc64, we independently define VMALLOCBASE and VMALLOC_START to be the same thing: the start of the vmalloc() area at 0xd000000000000000. VMALLOC_START is used much more widely, including in generic code, so this patch gets rid of the extraneous VMALLOCBASE. This does require moving the definitions of region IDs from page_64.h to pgtable.h, but they don't clearly belong in the former rather than the latter, anyway. While we're moving them, clean up the definitions of the REGION_IDs: - Abolish REGION_SIZE, it was only used once, to define REGION_MASK anyway - Define the specific region ids in terms of the REGION_ID() macro. - Define KERNEL_REGION_ID in terms of PAGE_OFFSET rather than KERNELBASE. It amounts to the same thing, but conceptually this is about the region of the linear mapping (which starts at PAGE_OFFSET) rather than of the kernel text itself (which is at KERNELBASE). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Separate usage of KERNELBASE and PAGE_OFFSETMichael Ellerman
This patch separates usage of KERNELBASE and PAGE_OFFSET. I haven't looked at any of the PPC32 code, if we ever want to support Kdump on PPC we'll have to do another audit, ditto for iSeries. This patch makes PAGE_OFFSET the constant, it'll always be 0xC * 1 gazillion for 64-bit. To get a physical address from a virtual one you subtract PAGE_OFFSET, _not_ KERNELBASE. KERNELBASE is the virtual address of the start of the kernel, it's often the same as PAGE_OFFSET, but _might not be_. If you want to know something's offset from the start of the kernel you should subtract KERNELBASE. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09[PATCH] powerpc: Add a is_kernel_addr() macroMichael Ellerman
There's a bunch of code that compares an address with KERNELBASE to see if it's a "kernel address", ie. >= KERNELBASE. The proper test is actually to compare with PAGE_OFFSET, since we're going to change KERNELBASE soon. So replace all of them with an is_kernel_addr() macro that does that. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-06[PATCH] ppc64: support 64k pagesBenjamin Herrenschmidt
Adds a new CONFIG_PPC_64K_PAGES which, when enabled, changes the kernel base page size to 64K. The resulting kernel still boots on any hardware. On current machines with 4K pages support only, the kernel will maintain 16 "subpages" for each 64K page transparently. Note that while real 64K capable HW has been tested, the current patch will not enable it yet as such hardware is not released yet, and I'm still verifying with the firmware architects the proper to get the information from the newer hypervisors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-10powerpc: Merge arch/ppc64/mm to arch/powerpc/mmPaul Mackerras
This moves the remaining files in arch/ppc64/mm to arch/powerpc/mm, and arranges that we use them when compiling with ARCH=ppc64. Signed-off-by: Paul Mackerras <paulus@samba.org>