aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2007-05-02[PATCH] x86-64: configurable fake numa node sizesDavid Rientjes
Extends the numa=fake x86_64 command-line option to allow for configurable node sizes. These nodes can be used in conjunction with cpusets for coarse memory resource management. The old command-line option is still supported: numa=fake=32 gives 32 fake NUMA nodes, ignoring the NUMA setup of the actual machine. But now you may configure your system for the node sizes of your choice: numa=fake=2*512,1024,2*256 gives two 512M nodes, one 1024M node, two 256M nodes, and the rest of system memory to a sixth node. The existing hash function is maintained to support the various node sizes that are possible with this implementation. Each node of the same size receives roughly the same amount of available pages, regardless of any reserved memory with its address range. The total available pages on the system is calculated and divided by the number of equal nodes to allocate. These nodes are then dynamically allocated and their borders extended until such time as their number of available pages reaches the required size. Configurable node sizes are recommended when used in conjunction with cpusets for memory control because it eliminates the overhead associated with scanning the zonelists of many smaller full nodes on page_alloc(). Cc: Andi Kleen <ak@suse.de> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02[PATCH] x86: Log reason why TSC was marked unstablejohn stultz
Change mark_tsc_unstable() so it takes a string argument, which holds the reason the TSC was marked unstable. This is then displayed the first time mark_tsc_unstable is called. This should help us better debug why the TSC was marked unstable on certain systems and allow us to make sure we're not being overly paranoid when throwing out this troublesome clocksource. Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] i386: modpost apic related warning fixesVivek Goyal
o Modpost generates warnings for i386 if compiled with CONFIG_RELOCATABLE=y WARNING: vmlinux - Section mismatch: reference to .init.text:find_unisys_acpi_oem_table from .text between 'acpi_madt_oem_check' (at offset 0xc0101eda) and 'enable_apic_mode' WARNING: vmlinux - Section mismatch: reference to .init.text:acpi_get_table_header_early from .text between 'acpi_madt_oem_check' (at offset 0xc0101ef0) and 'enable_apic_mode' WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'acpi_madt_oem_check' (at offset 0xc0101f2e) and 'enable_apic_mode' WARNING: vmlinux - Section mismatch: reference to .init.text:setup_unisys from .text between 'acpi_madt_oem_check' (at offset 0xc0101f37) and 'enable_apic_mode'WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'mps_oem_check' (at offset 0xc0101ec7) and 'acpi_madt_oem_check' WARNING: vmlinux - Section mismatch: reference to .init.text:es7000_sw_apic from .text between 'enable_apic_mode' (at offset 0xc0101f48) and 'check_apicid_present' o Some functions which are inline (acpi_madt_oem_check) are not inlined by compiler as these functions are accessed using function pointer. These functions are put in .text section and they in-turn access __init type functions hence modpost generates warnings. o Do not iniline acpi_madt_oem_check, instead make it __init. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02[PATCH] x86-64: Set HASHDIST_DEFAULT to 1 for x86_64 NUMARavikiran G Thirumalai
Enable system hashtable memory to be distributed among nodes on x86_64 NUMA Forcing the kernel to use node interleaved vmalloc instead of bootmem for the system hashtable memory (alloc_large_system_hash) reduces the memory imbalance on node 0 by around 40MB on a 8 node x86_64 NUMA box: Before the following patch, on bootup of a 8 node box: Node 0 MemTotal: 3407488 kB Node 0 MemFree: 3206296 kB Node 0 MemUsed: 201192 kB Node 0 Active: 7012 kB Node 0 Inactive: 512 kB Node 0 Dirty: 0 kB Node 0 Writeback: 0 kB Node 0 FilePages: 1912 kB Node 0 Mapped: 420 kB Node 0 AnonPages: 5612 kB Node 0 PageTables: 468 kB Node 0 NFS_Unstable: 0 kB Node 0 Bounce: 0 kB Node 0 Slab: 5408 kB Node 0 SReclaimable: 644 kB Node 0 SUnreclaim: 4764 kB After the patch (or using hashdist=1 on the kernel command line): Node 0 MemTotal: 3407488 kB Node 0 MemFree: 3247608 kB Node 0 MemUsed: 159880 kB Node 0 Active: 3012 kB Node 0 Inactive: 616 kB Node 0 Dirty: 0 kB Node 0 Writeback: 0 kB Node 0 FilePages: 2424 kB Node 0 Mapped: 380 kB Node 0 AnonPages: 1200 kB Node 0 PageTables: 396 kB Node 0 NFS_Unstable: 0 kB Node 0 Bounce: 0 kB Node 0 Slab: 6304 kB Node 0 SReclaimable: 1596 kB Node 0 SUnreclaim: 4708 kB I guess it is a good idea to keep HASHDIST_DEFAULT "on" for x86_64 NUMA since x86_64 has no dearth of vmalloc space? Or maybe enable hash distribution for all 64bit NUMA arches? The following patch does it only for x86_64. I ran a HPC MPI benchmark -- 'Ansys wingsolid', which takes up quite a bit of memory and uses up tlb entries. This was on a 4 way, 2 socket Tyan AMD box (non vsmp), with 8G total memory (4G pernode). The results with and without hash distribution are: 1. Vanilla - runtime of 1188.000s 2. With hashdist=1 runtime of 1154.000s Oprofile output for the duration of run is: 1. Vanilla: PU: AMD64 processors, speed 2411.16 MHz (estimated) Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask of 0x00 (No unit mask) count 500 samples % app name symbol name 163054 6.5513 libansys1.so MultiFront::decompose(int, int, Elemset *, int *, int, int, int) 162061 6.5114 libansys3.so blockSaxpy6L_fd 162042 6.5107 libansys3.so blockInnerProduct6L_fd 156286 6.2794 libansys3.so maxb33_ 87879 3.5309 libansys1.so elmatrixmultpcg_ 84857 3.4095 libansys4.so saxpy_pcg 58637 2.3560 libansys4.so .st4560 46612 1.8728 libansys4.so .st4282 43043 1.7294 vmlinux-t copy_user_generic_string 41326 1.6604 libansys3.so blockSaxpyBackSolve6L_fd 41288 1.6589 libansys3.so blockInnerProductBackSolve6L_fd 2. With hashdist=1 CPU: AMD64 processors, speed 2411.13 MHz (estimated) Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask of 0x00 (No unit mask) count 500 samples % app name symbol name 162993 6.9814 libansys1.so MultiFront::decompose(int, int, Elemset *, int *, int, int, int) 160799 6.8874 libansys3.so blockInnerProduct6L_fd 160459 6.8729 libansys3.so blockSaxpy6L_fd 156018 6.6826 libansys3.so maxb33_ 84700 3.6279 libansys4.so saxpy_pcg 83434 3.5737 libansys1.so elmatrixmultpcg_ 58074 2.4875 libansys4.so .st4560 46000 1.9703 libansys4.so .st4282 41166 1.7632 libansys3.so blockSaxpyBackSolve6L_fd 41033 1.7575 libansys3.so blockInnerProductBackSolve6L_fd 35762 1.5318 libansys1.so inner_product_sub 35591 1.5245 libansys1.so inner_product_sub2 28259 1.2104 libansys4.so addVectors Signed-off-by: Pravin B. Shelar <pravin.shelar@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Christoph Lameter <clameter@engr.sgi.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02[PATCH] x86-64: fix x86_64-mm-sched-clock-shareAndrew Morton
Fix for the following patch. Provide dummy cpufreq functions when CPUFREQ is not compiled in. Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86-64: build-time checkingVivek Goyal
o X86_64 kernel should run from 2MB aligned address for two reasons. - Performance. - For relocatable kernels, page tables are updated based on difference between compile time address and load time physical address. This difference should be multiple of 2MB as kernel text and data is mapped using 2MB pages and PMD should be pointing to a 2MB aligned address. Life is simpler if both compile time and load time kernel addresses are 2MB aligned. o Flag the error at compile time if one is trying to build a kernel which does not meet alignment restrictions. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02[PATCH] x86-64: Relocatable Kernel SupportVivek Goyal
This patch modifies the x86_64 kernel so that it can be loaded and run at any 2M aligned address, below 512G. The technique used is to compile the decompressor with -fPIC and modify it so the decompressor is fully relocatable. For the main kernel the page tables are modified so the kernel remains at the same virtual address. In addition a variable phys_base is kept that holds the physical address the kernel is loaded at. __pa_symbol is modified to add that when we take the address of a kernel symbol. When loaded with a normal bootloader the decompressor will decompress the kernel to 2M and it will run there. This both ensures the relocation code is always working, and makes it easier to use 2M pages for the kernel and the cpu. AK: changed to not make RELOCATABLE default in Kconfig Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86: __pa and __pa_symbol address space separationVivek Goyal
Currently __pa_symbol is for use with symbols in the kernel address map and __pa is for use with pointers into the physical memory map. But the code is implemented so you can usually interchange the two. __pa which is much more common can be implemented much more cheaply if it is it doesn't have to worry about any other kernel address spaces. This is especially true with a relocatable kernel as __pa_symbol needs to peform an extra variable read to resolve the address. There is a third macro that is added for the vsyscall data __pa_vsymbol for finding the physical addesses of vsyscall pages. Most of this patch is simply sorting through the references to __pa or __pa_symbol and using the proper one. A little of it is continuing to use a physical address when we have it instead of recalculating it several times. swapper_pgd is now NULL. leave_mm now uses init_mm.pgd and init_mm.pgd is initialized at boot (instead of compile time) to the physmem virtual mapping of init_level4_pgd. The physical address changed. Except for the for EMPTY_ZERO page all of the remaining references to __pa_symbol appear to be during kernel initialization. So this should reduce the cost of __pa in the common case, even on a relocated kernel. As this is technically a semantic change we need to be on the lookout for anything I missed. But it works for me (tm). Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86-64: Remove the identity mapping as early as possibleVivek Goyal
With the rewrite of the SMP trampoline and the early page allocator there is nothing that needs identity mapped pages, once we start executing C code. So add zap_identity_mappings into head64.c and remove zap_low_mappings() from much later in the code. The functions are subtly different thus the name change. This also kills boot_level4_pgt which was from an earlier attempt to move the identity mappings as early as possible, and is now no longer needed. Essentially I have replaced boot_level4_pgt with trampoline_level4_pgt in trampoline.S Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86-64: wakeup.S rename registers to reflect right namesVivek Goyal
o Use appropriate names for 64bit regsiters. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86-64: Add EFER to the register set saved by save_processor_stateVivek Goyal
EFER varies like %cr4 depending on the cpu capabilities, and which cpu capabilities we want to make use of. So save/restore it make certain we have the same EFER value when we are done. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86-64: cleanup segmentsVivek Goyal
Move __KERNEL32_CS up into the unused gdt entry. __KERNEL32_CS is used when entering the kernel so putting it first is useful when trying to keep boot gdt sizes to a minimum. Set the accessed bit on all gdt entries. We don't care so there is no need for the cpu to burn the extra cycles, and it potentially allows the pages to be immutable. Plus it is confusing when debugging and your gdt entries mysteriously change. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86-64: Clean up the early boot page tableVivek Goyal
- Merge physmem_pgt and ident_pgt, removing physmem_pgt. The merge is broken as soon as mm/init.c:init_memory_mapping is run. - As physmem_pgt is gone don't export it in pgtable.h. - Use defines from pgtable.h for page permissions. - Fix the physical memory identity mapping so it is at the correct address. - Remove the physical memory mapping from wakeup_level4_pgt it is at the wrong address so we can't possibly be usinging it. - Simply NEXT_PAGE the work to calculate the phys_ alias of the labels was very cool. Unfortuantely it was a brittle special purpose hack that makes maitenance more difficult. Instead just use label - __START_KERNEL_map like we do everywhere else in assembly. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86-64: Assembly safe page.h and pgtable.hVivek Goyal
This patch makes pgtable.h and page.h safe to include in assembly files like head.S. Allowing us to use symbolic constants instead of hard coded numbers when refering to the page tables. This patch copies asm-sparc64/const.h to asm-x86_64 to get a definition of _AC() a very convinient macro that allows us to force the type when we are compiling the code in C and to drop all of the type information when we are using the constant in assembly. Previously this was done with multiple definition of the same constant. const.h was modified slightly so that it works when given CONFIG options as arguments. This patch adds #ifndef __ASSEMBLY__ ... #endif and _AC(1,UL) where appropriate so the assembler won't choke on the header files. Otherwise nothing should have changed. AK: added const.h to exported headers to fix headers_check Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86-64: dma_ops as constStephen Hemminger
The dma_ops structure can be const since it never changes after boot. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86-64: fix cpu MHz reporting on constant_tsc cpusJoerg Roedel
This patch fixes the reporting of cpu_mhz in /proc/cpuinfo on CPUs with a constant TSC rate and a kernel with disabled cpufreq. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Andi Kleen <ak@suse.de> arch/x86_64/kernel/apic.c | 2 - arch/x86_64/kernel/time.c | 58 +++++++++++++++++++++++++++++++++++++++--- arch/x86_64/kernel/tsc.c | 12 +++++--- arch/x86_64/kernel/tsc_sync.c | 2 - include/asm-x86_64/proto.h | 1 5 files changed, 65 insertions(+), 10 deletions(-)
2007-05-02[PATCH] x86-64: Remove duplicated code for reading control registersGlauber de Oliveira Costa
On Tue, Mar 13, 2007 at 05:33:09AM -0700, Randy.Dunlap wrote: > On Tue, 13 Mar 2007, Glauber de Oliveira Costa wrote: > > > Tiny cleanup: > > > > In x86_64, the same functions for reading cr3 and writing cr{3,4} are > > defined in tlbflush.h and system.h, whith just a name change. > > The only difference is the clobbering of memory, which seems a safe, and > > even needed change for the write_cr4. This patch removes the duplicate. > > write_cr3() is moved to system.h for consistency. > > missing patch..... > thanks. Attached now -- Glauber de Oliveira Costa Red Hat Inc. "Free as in Freedom" Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86-64: Remove unused set_seg_baseRusty Russell
The set_seg_base function isn't used anywhere (2.6.21-rc3-git1) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] i386: clean up mach_reboot_fixupsJeremy Fitzhardinge
The reboot_fixups stuff seems to be a bit of a mess, specifically the header is in linux/ when its a purely i386-specific piece of code. I'm not sure why it has its config option; its only currently needed for "geode-gx1/cs5530a", so perhaps whatever config option controls that hardware should enable this? Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] i386: Update __copy_to_user_inatomic linuxdoc descriptionAneesh Kumar K.V
Explicity specify that the caller should pin the user memory otherwise the function will sleep Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] i386: Change sysenter_setup to __cpuinit & improve __INIT, __INITDATAPrarit Bhargava
Change sysenter_setup to __cpuinit. Change __INIT & __INITDATA to be cpu hotplug aware. Resolve MODPOST warnings similar to: WARNING: vmlinux - Section mismatch: reference to .init.text:sysenter_setup from .text between 'identify_cpu' (at offset 0xc040a380) and 'detect_ht' and WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_int80_end from .text between 'sysenter_setup' (at offset 0xc041a269) and 'enable_sep_cpu' WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_int80_start from .text between 'sysenter_setup' (at offset 0xc041a26e) and 'enable_sep_cpu' WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_sysenter_end from .text between 'sysenter_setup' (at offset 0xc041a275) and 'enable_sep_cpu' WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_sysenter_start from .text between 'sysenter_setup' (at offset 0xc041a27a) and 'enable_sep_cpu' Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86-64: Some cleanup in time.cAndi Kleen
Move prototypes into header files Remove unneeded includes. Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] i386: Add an option for the VIA C7 which sets appropriate L1 cacheSimon Arlott
The VIA C7 is a 686 (with TSC) that supports MMX, SSE and SSE2, it also has a cache line length of 64 according to http://www.digit-life.com/articles2/cpu/rmma-via-c7.html. This patch sets gcc to -march=686 and select s the correct cache shift. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02[PATCH] i386: pit_latch_buggy has no effecttakada
Eliminated the arch/i386/kernel/timers in 2.6.18, use clocksoures instead. pit_latch_buggy was referred in timers/timer_tsc.c, and currently removed. Therefore nobody refer it. Until 2.6.17, MediaGX's TSC works correctly. after 2.6.18, warned "TSC appears to be running slowly. Marking it as unstable". So marked unstable TSC when CS55x0. Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] i386: No need to use -traditional for processing asm in i386/kernel/Jeremy Fitzhardinge
No need to use -traditional for processing asm in i386/kernel/ Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86: consolidate smp_send_stop()Jan Beulich
Synchronize i386's smp_send_stop() with x86-64's in only try-locking the call lock to prevent deadlocks when called from panic(). In both version, disable interrupts before clearing the CPU off the online map to eliminate races with IRQ handlers inspecting this map. Also in both versions, save/restore interrupts rather than disabling/ enabling them. On x86-64, eliminate one function used here by folding it into its single caller, convert to static, and rename for consistency with i386 (lkcd may like this). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86-64: adjust inclusion of asm/vsyscall32.hJan Beulich
Avoid including asm/vsyscall32.h in virtually every source file. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86: adjust inclusion of asm/fixmap.hJan Beulich
Move inclusion of asm/fixmap.h to where it is really used rather than where it may have been used long ago (requires a few other adjustments to includes due to previous implicit dependencies). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86: default to physical mode on hotplug CPU kernelsIngo Molnar
Default to physical mode on hotplug CPU kernels. Furher simplify and clean up the APIC initialization code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-05-02[PATCH] x86-64: always use physical delivery mode on > 8 CPUsIngo Molnar
Remove clustered APIC mode. There's little point in the use of clustered APIC mode, broadcasting is limited to within the cluster only, and chipsets have bugs in this area as well. So default to physical APIC mode when the CPU count is large, and default to logical APIC mode when the CPU count is 8 or smaller. (this patch only removes the use of genapic_cluster and cleans up the resulting genapic.c file - removal of all remaining traces of clustered mode will be done by another patch.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org>
2007-05-02[PATCH] x86: revert x86_64-mm-fix-the-irqbalance-quirk-for-e7320-e7520-e7525Andrew Morton
Obsoleted by Ingo's genapic stuff. Cc: Ingo Molnar <mingo@elte.hu> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02[PATCH] x86-64: revert x86_64-mm-add-genapic_forceAndrew Morton
This is obsoleted by new Ingo genapic patches. Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-30pm: include EIO from errno-base.hDavid Rientjes
For backwards compatibility, call_platform_enable_wakeup() can return 0 instead of -EIO since we aren't guaranteed to have errno defined. Cc: David Brownell <david-b@pacbell.net> Signed-off-by: David Rientjes <rientjes@google.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30Add kvasprintf()Jeremy Fitzhardinge
Add a kvasprintf() function to complement kasprintf(). No in-tree users yet, but I have some coming up. [akpm@linux-foundation.org: EXPORT it] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Keir Fraser <keir@xensource.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30power management: force pm_ops.valid callback to be assignedJohannes Berg
This patch changes the docs and behaviour from "all states valid" to "no states valid" if no .valid callback is assigned. Users of pm_ops that only need mem sleep can assign pm_valid_only_mem without any overhead, others will require more elaborate callbacks. Now that all users of pm_ops have a .valid callback this is a safe thing to do and prevents things from getting messy again as they were before. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Pavel Machek <pavel@ucw.cz> Looks-okay-to: Rafael J. Wysocki <rjw@sisk.pl> Cc: <linux-pm@lists.linux-foundation.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30power management: implement pm_ops.valid for everybodyJohannes Berg
Almost all users of pm_ops only support mem sleep, don't check in .valid and don't reject any others in .prepare so users can be confused if they check /sys/power/state, especially when new states are added (these would then result in s-t-r although they're supposed to be something different). This patch implements a generic pm_valid_only_mem function that is then exported for users and puts it to use in almost all existing pm_ops. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: David Brownell <david-b@pacbell.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: linux-pm@lists.linux-foundation.org Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30power management: remove firmware disk modeJohannes Berg
This patch removes the firmware disk suspend mode which is the wrong approach, it is supposed to be used for implementing firmware-based disk suspend but cannot actually be used for that. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <linux-pm@lists.linux-foundation.org> Cc: David Brownell <david-b@pacbell.net> Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30rework pm_ops pm_disk_mode, kill misuseJohannes Berg
This patch series cleans up some misconceptions about pm_ops. Some users of the pm_ops structure attempt to use it to stop the user from entering suspend to disk, this, however, is not possible since the user can always use "shutdown" in /sys/power/disk and then the pm_ops are never invoked. Also, platforms that don't support suspend to disk simply should not allow configuring SOFTWARE_SUSPEND (read the help text on it, it only selects suspend to disk and nothing else, all the other stuff depends on PM). The pm_ops structure is actually intended to provide a way to enter platform-defined sleep states (currently supported states are "standby" and "mem" (suspend to ram)) and additionally (if SOFTWARE_SUSPEND is configured) allows a platform to support a platform specific way to enter low-power mode once everything has been saved to disk. This is currently only used by ACPI (S4). This patch: The pm_ops.pm_disk_mode is used in totally bogus ways since nobody really seems to understand what it actually does. This patch clarifies the pm_disk_mode description. It also removes all the arm and sh users that think they can veto suspend to disk via pm_ops; not so since the user can always do echo shutdown > /sys/power/disk, they need to find a better way involving Kconfig or such. ACPI is the only user left with a non-zero pm_disk_mode. The patch also sets the default mode to shutdown again, but when a new pm_ops is registered its pm_disk_mode is selected as default, that way the default stays for ACPI where it is apparently required. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: David Brownell <david-b@pacbell.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <linux-pm@lists.linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30Extend print_symbol capabilityRobert Peterson
Today's print_symbol function dumps a kernel symbol with printk. This patch extends the functionality of kallsyms.c so that the symbol lookup function may be used without the printk. This is useful for modules that want to dump symbols elsewhere, for example, to debugfs. I intend to use the new function call in the GFS2 file system (which will be a separate patch). [akpm@linux-foundation.org: build fix] [clameter@sgi.com: sprint_symbol should return length of string like sprintf] Signed-off-by: Robert Peterson <rpeterso@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30Merge branch 'for-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/jikos/hid * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jikos/hid: (21 commits) USB HID: don't warn on idVendor == 0 USB HID: add 'quirks' module parameter USB HID: add support for dynamically-created quirks USB HID: clarify static quirk handling as squirks USB HID: encapsulate quirk handling into hid-quirks.c USB HID: EMS USBII device needs HID_QUIRK_MULTI_INPUT HID: update copyright and authorship macro HID: introduce proper zeroing of unused bits in output reports USB HID: add support for WiseGroup MP-8800 Quad Joypad USB HID: add FF support for Logitech Force 3D Pro Joystick USB HID: numlock quirk for dell W7658 keyboard USB HID: Logitech MX3000 keyboard needs report descriptor quirk USB HID: extend quirk for Logitech S510 keyboard USB HID: usbkbd/usbmouse - handle errors when registering devices USB HID: add QUIRK_HIDDEV for Belkin Flip KVM HID: enable dead keys on a belkin wireless keyboard USB HID: Thustmaster firestorm dual power v1 support USB HID: specify explicit size for hid_blacklist.quirks USB HID: fix retry & reset logic USB HID: consolidate vendor/product ids ...
2007-04-30Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits) [IPV4] SNMP: Support OutMcastPkts and OutBcastPkts [IPV4] SNMP: Support InMcastPkts and InBcastPkts [IPV4] SNMP: Support InTruncatedPkts [IPV4] SNMP: Support InNoRoutes [SNMP]: Add definitions for {In,Out}BcastPkts [TCP] FRTO: RFC4138 allows Nagle override when new data must be sent [TCP] FRTO: Delay skb available check until it's mandatory [XFRM]: Restrict upper layer information by bundle. [TCP]: Catch skb with S+L bugs earlier [PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo [L2TP]: Add the ability to autoload a pppox protocol module. [SKB]: Introduce skb_queue_walk_safe() [AF_IUCV/IUCV]: smp_call_function deadlock [IPV6]: Fix slab corruption running ip6sic [TCP]: Update references in two old comments [XFRM]: Export SPD info [IPV6]: Track device renames in snmp6. [SCTP]: Fix sctp_getsockopt_local_addrs_old() to use local storage. [NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats. [NETPOLL]: Remove CONFIG_NETPOLL_RX ...
2007-04-30Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block: [PATCH] elevator: elv_list_lock does not need irq disabling [BLOCK] Don't pin lots of memory in mempools cfq-iosched: speedup cic rb lookup ll_rw_blk: add io_context private pointer cfq-iosched: get rid of cfqq hash cfq-iosched: tighten queue request overlap condition cfq-iosched: improve sync vs async workloads cfq-iosched: never allow an async queue idling cfq-iosched: get rid of ->dispatch_slice cfq-iosched: don't pass unused preemption variable around cfq-iosched: get rid of ->cur_rr and ->cfq_list cfq-iosched: slice offset should take ioprio into account [PATCH] cfq-iosched: style cleanups and comments cfq-iosched: sort IDLE queues into the rbtree cfq-iosched: sort RT queues into the rbtree [PATCH] cfq-iosched: speed up rbtree handling cfq-iosched: rework the whole round-robin list concept cfq-iosched: minor updates cfq-iosched: development update cfq-iosched: improve preemption for cooperating tasks
2007-04-30Merge branch 'for-2.6.22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'for-2.6.22' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (255 commits) [POWERPC] Remove dev_dbg redefinition in drivers/ps3/vuart.c [POWERPC] remove kernel module option for booke wdt [POWERPC] Avoid putting cpu node twice [POWERPC] Spinlock initializer cleanup [POWERPC] ppc4xx_sgdma needs dma-mapping.h [POWERPC] arch/powerpc/sysdev/timer.c build fix [POWERPC] get_property cleanups [POWERPC] Remove the unused HTDMSOUND driver [POWERPC] cell: cbe_cpufreq cleanup and crash fix [POWERPC] Declare enable_kernel_spe in a header [POWERPC] Add dt_xlate_addr() to bootwrapper [POWERPC] bootwrapper: CONFIG_ -> CONFIG_DEVICE_TREE [POWERPC] Don't define a custom bd_t for Xilixn Virtex based boards. [POWERPC] Add sane defaults for Xilinx EDK generated xparameters files [POWERPC] Add uartlite boot console driver for the zImage wrapper [POWERPC] Stop using ppc_sys for Xilinx Virtex boards [POWERPC] New registration for common Xilinx Virtex ppc405 platform devices [POWERPC] Merge common virtex header files [POWERPC] Rework Kconfig dependancies for Xilinx Virtex ppc405 platform [POWERPC] Clean up cpufreq Kconfig dependencies ...
2007-04-30[SNMP]: Add definitions for {In,Out}BcastPktsMitsuru Chinen
The updated IP-MIB RFC (RFC4293) specifys new objects, InBcastPkts and OutBcastPkts. This adds definitions for them. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[TCP] FRTO: RFC4138 allows Nagle override when new data must be sentIlpo Järvinen
This is a corner case where less than MSS sized new data thingie is awaiting in the send queue. For F-RTO to work correctly, a new data segment must be sent at certain point or F-RTO cannot be used at all. RFC4138 allows overriding of Nagle at that point. Implementation uses frto_counter states 2 and 3 to distinguish when Nagle override is needed. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[XFRM]: Restrict upper layer information by bundle.Masahide NAKAMURA
On MIPv6 usage, XFRM sub policy is enabled. When main (IPsec) and sub (MIPv6) policy selectors have the same address set but different upper layer information (i.e. protocol number and its ports or type/code), multiple bundle should be created. However, currently we have issue to use the same bundle created for the first time with all flows covered by the case. It is useful for the bundle to have the upper layer information to be restructured correctly if it does not match with the flow. 1. Bundle was created by two policies Selector from another policy is added to xfrm_dst. If the flow does not match the selector, it goes to slow path to restructure new bundle by single policy. 2. Bundle was created by one policy Flow cache is added to xfrm_dst as originated one. If the flow does not match the cache, it goes to slow path to try searching another policy. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[TCP]: Catch skb with S+L bugs earlierIlpo Järvinen
SACKED_ACKED and LOST are mutually exclusive with SACK, thus having their sum larger than packets_out is bug with SACK. Eventually these bugs trigger traps in the tcp_clean_rtx_queue with SACK but it's much more informative to do this here. Non-SACK TCP, however, could get more than packets_out duplicate ACKs which each increment sacked_out, so it makes sense to do this kind of limitting for non-SACK TCP but not for SACK enabled one. Perhaps the author had the opposite in mind but did the logic accidently wrong way around? Anyway, the sacked_out incrementer code for non-SACK already deals this issue before calling sync_left_out so this trapping can be done unconditionally. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30Merge branch 'cfq' into for-linusJens Axboe
2007-04-30[BLOCK] Don't pin lots of memory in mempoolsJens Axboe
Currently we scale the mempool sizes depending on memory installed in the machine, except for the bio pool itself which sits at a fixed 256 entry pre-allocation. There's really no point in "optimizing" this OOM path, we just need enough preallocated to make progress. A single unit is enough, lets scale it down to 2 just to be on the safe side. This patch saves ~150kb of pinned kernel memory on a 32-bit box. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30[SKB]: Introduce skb_queue_walk_safe()James Chapman
This patch provides a method for walking skb lists while inserting or removing skbs from the list. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>