aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2010-01-18x86, irq: Use 0x20 for the IRQ_MOVE_CLEANUP_VECTOR instead of 0x1fSuresh Siddha
After talking to some more folks inside intel (Peter Anvin, Asit Mallick), the safest option (for future compatibility etc) seen was to use vector 0x20 for IRQ_MOVE_CLEANUP_VECTOR instead of using vector 0x1f (which is documented as reserved vector in the Intel IA32 manuals). Also we don't need to reserve the entire privilege level (all 16 vectors in the priority bucket that IRQ_MOVE_CLEANUP_VECTOR falls into), as the x86 architecture (section 10.9.3 in SDM Vol3a) specifies that with in the priority level, the higher the vector number the higher the priority. And hence we don't need to reserve the complete priority level 0x20-0x2f for the IRQ migration cleanup logic. So change the IRQ_MOVE_CLEANUP_VECTOR to 0x20 and allow 0x21-0x2f to be used for device interrupts. 0x30-0x3f will be used for ISA interrupts (these also can be migrated in the context of IOAPIC and hence need to be at a higher priority level than IRQ_MOVE_CLEANUP_VECTOR). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100114002118.521826763@sbs-t61.sc.intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-18x86, vmi: Fix vmi_get_timer_vector() to use IRQ0_VECTORSuresh Siddha
FIRST_DEVICE_VECTOR is going away and it looks like a bad hack to steal FIRST_DEVICE_VECTOR / FIRST_EXTERNAL_VECTOR, when it looks like it needs IRQ0_VECTOR. Fix vmi_get_timer_vector() to use IRQ0_VECTOR. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100114002118.436172066@sbs-t61.sc.intel.com> Cc: Alok N Kataria <akataria@vmware.com> Cc: Zach Amsden <zach@vmware.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-04x86, apic: Don't waste a vector to improve vector spreadH. Peter Anvin
We want to use a vector-assignment sequence that avoids stumbling onto 0x80 earlier in the sequence, in order to improve the spread of vectors across priority levels on machines with a small number of interrupt sources. Right now, this is done by simply making the first vector (0x31 or 0x41) completely unusable. This is unnecessary; all we need is to start assignment at a +1 offset, we don't actually need to prohibit the usage of this vector once we have wrapped around. Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <4B426550.6000209@kernel.org>
2010-01-04x86, apic: Reclaim IDT vectors 0x20-0x2fH. Peter Anvin
Reclaim 16 IDT vectors and make them available for general allocation. Reclaim vectors 0x20-0x2f by reallocating the IRQ_MOVE_CLEANUP_VECTOR to vector 0x1f. This is in the range of vector numbers that is officially reserved for the CPU (for exceptions), however, the use of the APIC to generate any vector 0x10 or above is documented, and the CPU internally can receive any vector number (the legacy BIOS uses INT 0x08-0x0f for interrupts, as messed up as that is.) Since IRQ_MOVE_CLEANUP_VECTOR has to be alone in the lowest-numbered priority level (block of 16), this effectively enables us to reclaim an otherwise-unusable APIC priority level and put it to use. Since this is a transient kernel-only allocation we can change it at any time, and if/when there is an exception at vector 0x1f this assignment needs to be changed as part of OS enabling that new feature. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B4284C6.9030107@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-30x86: Increase NR_IRQS and nr_irqsYinghai Lu
I have a system with lots of igb and ixgbe, when iov/vf are enabled for them, we hit the limit of 3064. when system has 20 pcie installed, and one card has 2 functions, and one function needs 64 msi-x, may need 20 * 2 * 64 = 2560 for msi-x but if iov and vf are enabled may need 20 * 2 * 64 * 3 = 7680 for msi-x assume system with 5 ioapic, nr_irqs_gsi will be 120. NR_CPUS = 512, and nr_cpu_ids = 128 will have NR_IRQS = 256 + 512 * 64 = 33024 will have nr_irqs = 120 + 8 * 128 + 120 * 64 = 8824 When SPARSE_IRQ is not set, there is no increase with kernel data size. when NR_CPUS=128, and SPARSE_IRQ is set: text data bss dec hex filename 21837444 4216564 12480736 38534744 24bfe58 vmlinux.before 21837442 4216580 12480736 38534758 24bfe66 vmlinux.after when NR_CPUS=4096, and SPARSE_IRQ is set text data bss dec hex filename 21878619 5610244 13415392 40904255 270263f vmlinux.before 21878617 5610244 13415392 40904253 270263d vmlinux.after Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4B398ECD.1080506@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-24Merge branch 'misc-2.6.33' into releaseLen Brown
2009-12-24Merge branch 'pdc' into releaseLen Brown
2009-12-23Revert "x86, ucode-amd: Ensure ucode update on suspend/resume after CPU ↵Linus Torvalds
off/online cycle" This reverts commit 9f15226e75583547aaf542c6be4bdac1060dd425. It's just wrong, and broke resume for Rafael even on a non-AMD CPU. As Rafael says: "... it causes microcode_init_cpu() to be called during resume even for CPUs for which there's no microcode to apply. That, in turn, results in executing request_firmware() (on Intel CPUs at least) which doesn't work at this stage of resume (we have device interrupts disabled, I/O devices are still suspended and so on). If I'm not mistaken, the "if (uci->valid)" logic means "if that CPU is known to us" , so before commit 9f15226e755 microcode_resume_cpu() was called for all CPUs already in the system during suspend, which was the right thing to do. The commit changed it so that the CPUs without microcode to apply are now treated as "unknown", which is not quite right. The problem this commit attempted to solve has to be handled differently." Bisected-and -requested-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-22arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c: avoid cross-CPU interrupts by ↵Andrew Morton
using smp_call_function_any() Presently acpi-cpufreq will perform the MSR read on the first CPU in the mask. That's inefficient if that CPU differs from the current CPU. Because we have to perform a cross-CPU call, but we could have run the rdmsr on the current CPU. So switch to using the new smp_call_function_any(), which will perform the call on the current CPU if that CPU is present in the mask (it is). Cc: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Cc: Dave Jones <davej@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-22ACPI: processor: unify arch_acpi_processor_cleanup_pdcAlex Chiang
The x86 and ia64 implementations of the function in $subject are exactly the same. Also, since the arch-specific implementations of setting _PDC have been completely hollowed out, remove the empty shells. Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-22ACPI: processor: finish unifying arch_acpi_processor_init_pdc()Alex Chiang
The only thing arch-specific about calling _PDC is what bits get set in the input obj_list buffer. There's no need for several levels of indirection to twiddle those bits. Additionally, since we're just messing around with a buffer, we can simplify the interface; no need to pass around the entire struct acpi_processor * just to get at the buffer. Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-22ACPI: processor: factor out common _PDC settingsAlex Chiang
Both x86 and ia64 initialize _PDC with mostly common bit settings. Factor out the common settings and leave the arch-specific ones alone. Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-22ACPI: processor: unify arch_acpi_processor_init_pdcAlex Chiang
The x86 and ia64 implementations of arch_acpi_processor_init_pdc() are almost exactly the same. The only difference is in what bits they set in obj_list buffer. Combine the boilerplate memory management code, and leave the arch-specific bit twiddling in separate implementations. Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-22ACPI: processor: introduce arch_has_acpi_pdcAlex Chiang
arch dependent helper function that tells us if we should attempt to evaluate _PDC on this machine or not. The x86 implementation assumes that the CPUs in the machine must be homogeneous, and that you cannot mix CPUs of different vendors. Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-19Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf session: Make events_stats u64 to avoid overflow on 32-bit arches hw-breakpoints: Fix hardware breakpoints -> perf events dependency perf events: Dont report side-band events on each cpu for per-task-per-cpu events perf events, x86/stacktrace: Fix performance/softlockup by providing a special frame pointer-only stack walker perf events, x86/stacktrace: Make stack walking optional perf events: Remove unused perf_counter.h header file perf probe: Check new event name kprobe-tracer: Check new event/group name perf probe: Check whether debugfs path is correct perf probe: Fix libdwarf include path for Debian
2009-12-19Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system Makefile: Unexport LC_ALL instead of clearing it x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk x86: Reenable TSC sync check at boot, even with NONSTOP_TSC x86: Don't use POSIX character classes in gen-insn-attr-x86.awk Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMA x86: Fix checking of SRAT when node 0 ram is not from 0 x86, cpuid: Add "volatile" to asm in native_cpuid() x86, msr: msrs_alloc/free for CONFIG_SMP=n x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space x86: Add IA32_TSC_AUX MSR and use it x86, msr/cpuid: Register enough minors for the MSR and CPUID drivers initramfs: add missing decompressor error check bzip2: Add missing checks for malloc returning NULL bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure
2009-12-18hw-breakpoints: Fix hardware breakpoints -> perf events dependencyFrederic Weisbecker
The kbuild's select command doesn't propagate through the config dependencies. Hence the current rules of hardware breakpoint's config can't ensure perf can never be disabled under us. We have: config X86 selects HAVE_HW_BREAKPOINTS config HAVE_HW_BREAKPOINTS select PERF_EVENTS config PERF_EVENTS [...] x86 will select the breakpoints but that won't propagate to perf events. The user can still disable the latter, but it is necessary for the breakpoints. What we need is: - x86 selects HAVE_HW_BREAKPOINTS and PERF_EVENTS - HAVE_HW_BREAKPOINTS depends on PERF_EVENTS so that we ensure PERF_EVENTS is enabled and frozen for x86. This fixes the following kind of build errors: In file included from arch/x86/kernel/hw_breakpoint.c:31: include/linux/hw_breakpoint.h: In function 'hw_breakpoint_addr': include/linux/hw_breakpoint.h:39: error: 'struct perf_event' has no member named 'attr' v2: Select also ANON_INODES from x86, required for perf Reported-by: Cyrill Gorcunov <gorcunov@gmail.com> Reported-by: Michal Marek <mmarek@suse.cz> Reported-by: Andrew Randrianasulu <randrik_a@yahoo.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: K.Prasad <prasad@linux.vnet.ibm.com> LKML-Reference: <1261010034-7786-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-17x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu systemSuresh Siddha
John Blackwood reported: > on an older Dell PowerEdge 6650 system with 8 cpus (4 are hyper-threaded), > and 32 bit (x86) kernel, once you change the irq smp_affinity of an irq > to be less than all cpus in the system, you can never change really the > irq smp_affinity back to be all cpus in the system (0xff) again, > even though no error status is returned on the "/bin/echo ff > > /proc/irq/[n]/smp_affinity" operation. > > This is due to that fact that BAD_APICID has the same value as > all cpus (0xff) on 32bit kernels, and thus the value returned from > set_desc_affinity() via the cpu_mask_to_apicid_and() function is treated > as a failure in set_ioapic_affinity_irq_desc(), and no affinity changes > are made. set_desc_affinity() is already checking if the incoming cpu mask intersects with the cpu online mask or not. So there is no need for the apic op cpu_mask_to_apicid_and() to check again and return BAD_APICID. Remove the BAD_APICID return value from cpu_mask_to_apicid_and() and also fix set_desc_affinity() to return -1 instead of using BAD_APICID to represent error conditions (as cpu_mask_to_apicid_and() can return logical or physical apicid values and BAD_APICID is really to represent bad physical apic id). Reported-by: John Blackwood <john.blackwood@ccur.com> Root-caused-by: John Blackwood <john.blackwood@ccur.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1261103386.2535.409.camel@sbs-t61> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-17Merge branch 'cpumask-cleanups' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * 'cpumask-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: cpumask: rename tsk_cpumask to tsk_cpus_allowed cpumask: don't recommend set_cpus_allowed hack in Documentation/cpu-hotplug.txt cpumask: avoid dereferencing struct cpumask cpumask: convert drivers/idle/i7300_idle.c to cpumask_var_t cpumask: use modern cpumask style in drivers/scsi/fcoe/fcoe.c cpumask: avoid deprecated function in mm/slab.c cpumask: use cpu_online in kernel/perf_event.c
2009-12-17x86: Fix objdump version check in arch/x86/tools/chkobjdump.awkakpm@linux-foundation.org
It says Warning: objdump version is older than 2.19 Warning: Skipping posttest. because it used the wrong field from `objdump -v': akpm:/usr/src/25> /opt/crosstool/gcc-4.0.2-glibc-2.3.6/x86_64-unknown-linux-gnu/bin/x86_64-unknown-linux-gnu-objdump -v GNU objdump 2.16.1 Copyright 2005 Free Software Foundation, Inc. This program is free software; you may redistribute it under the terms of the GNU General Public License. This program has absolutely no warranty. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <200912172326.nBHNQaQl024796@imap1.linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Masami Hiramatsu <mhiramat@redhat.com>
2009-12-17x86: Reenable TSC sync check at boot, even with NONSTOP_TSCPallipadi, Venkatesh
Commit 83ce4009 did the following change If the TSC is constant and non-stop, also set it reliable. But, there seems to be few systems that will end up with TSC warp across sockets, depending on how the cpus come out of reset. Skipping TSC sync test on such systems may result in time inconsistency later. So, reenable TSC sync test even on constant and non-stop TSC systems. Set, sched_clock_stable to 1 by default and reset it in mark_tsc_unstable, if TSC sync fails. This change still gives perf benefit mentioned in 83ce4009 for systems where TSC is reliable. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20091217202702.GA18015@linux-os.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-17Merge branch 'for-33' of git://repo.or.cz/linux-kbuildLinus Torvalds
* 'for-33' of git://repo.or.cz/linux-kbuild: (29 commits) net: fix for utsrelease.h moving to generated gen_init_cpio: fixed fwrite warning kbuild: fix make clean after mismerge kbuild: generate modules.builtin genksyms: properly consider EXPORT_UNUSED_SYMBOL{,_GPL}() score: add asm/asm-offsets.h wrapper unifdef: update to upstream revision 1.190 kbuild: specify absolute paths for cscope kbuild: create include/generated in silentoldconfig scripts/package: deb-pkg: use fakeroot if available scripts/package: add KBUILD_PKG_ROOTCMD variable scripts/package: tar-pkg: use tar --owner=root Kbuild: clean up marker net: add net_tstamp.h to headers_install kbuild: move utsrelease.h to include/generated kbuild: move autoconf.h to include/generated drop explicit include of autoconf.h kbuild: move compile.h to include/generated kbuild: drop include/asm kbuild: do not check for include/asm-$ARCH ... Fixed non-conflicting clean merge of modpost.c as per comments from Stephen Rothwell (modpost.c had grown an include of linux/autoconf.h that needed to be changed to generated/autoconf.h)
2009-12-17x86/ptrace: make genregs[32]_get/set more robustLinus Torvalds
The loop condition is fragile: we compare an unsigned value to zero, and then decrement it by something larger than one in the loop. All the callers should be passing in appropriately aligned buffer lengths, but it's better to just not rely on it, and have some appropriate defensive loop limits. Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-17x86: Don't use POSIX character classes in gen-insn-attr-x86.awkRoland Dreier
Not all awk implementations (including the default awk in Ubuntu 9.10) support POSIX character classes. Since x86-opcode-map.txt is plain ASCII, we can just use explicit ranges for lower case, alphabetic, and alphanumeric characters instead. Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <adabphy750b.fsf@roland-alpha.cisco.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-17perf events, x86/stacktrace: Fix performance/softlockup by providing a ↵Frederic Weisbecker
special frame pointer-only stack walker It's just wasteful for stacktrace users like perf to walk through every entries on the stack whereas these only accept reliable ones, ie: that the frame pointer validates. Since perf requires pure reliable stacktraces, it needs a stack walker based on frame pointers-only to optimize the stacktrace processing. This might solve some near-lockup scenarios that can be triggered by call-graph tracing timer events. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1261024834-5336-2-git-send-regression-fweisbec@gmail.com> [ v2: fix for modular builds and small detail tidyup ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-17perf events, x86/stacktrace: Make stack walking optionalFrederic Weisbecker
The current print_context_stack helper that does the stack walking job is good for usual stacktraces as it walks through all the stack and reports even addresses that look unreliable, which is nice when we don't have frame pointers for example. But we have users like perf that only require reliable stacktraces, and those may want a more adapted stack walker, so lets make this function a callback in stacktrace_ops that users can tune for their needs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1261024834-5336-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-17cpumask: rename tsk_cpumask to tsk_cpus_allowedRusty Russell
Noone uses this wrapper yet, and Ingo asked that it be kept consistent with current task_struct usage. (One user crept in via linux-next: fixed) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au. Cc: Ingo Molnar <mingo@elte.hu> Cc: Tejun Heo <tj@kernel.org>
2009-12-16x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMAYinghai Lu
Due to recent changes wakeup and mptable, we run out of early reservations on 32-bit NUMA. Thus, adjust the available number. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B22D754.2020706@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-16x86: Fix checking of SRAT when node 0 ram is not from 0Yinghai Lu
Found one system that boot from socket1 instead of socket0, SRAT get rejected... [ 0.000000] SRAT: Node 1 PXM 0 0-a0000 [ 0.000000] SRAT: Node 1 PXM 0 100000-80000000 [ 0.000000] SRAT: Node 1 PXM 0 100000000-2080000000 [ 0.000000] SRAT: Node 0 PXM 1 2080000000-4080000000 [ 0.000000] SRAT: Node 2 PXM 2 4080000000-6080000000 [ 0.000000] SRAT: Node 3 PXM 3 6080000000-8080000000 [ 0.000000] SRAT: Node 4 PXM 4 8080000000-a080000000 [ 0.000000] SRAT: Node 5 PXM 5 a080000000-c080000000 [ 0.000000] SRAT: Node 6 PXM 6 c080000000-e080000000 [ 0.000000] SRAT: Node 7 PXM 7 e080000000-10080000000 ... [ 0.000000] NUMA: Allocated memnodemap from 500000 - 701040 [ 0.000000] NUMA: Using 20 for the hash shift. [ 0.000000] Adding active range (0, 0x2080000, 0x4080000) 0 entries of 3200 used [ 0.000000] Adding active range (1, 0x0, 0x96) 1 entries of 3200 used [ 0.000000] Adding active range (1, 0x100, 0x7f750) 2 entries of 3200 used [ 0.000000] Adding active range (1, 0x100000, 0x2080000) 3 entries of 3200 used [ 0.000000] Adding active range (2, 0x4080000, 0x6080000) 4 entries of 3200 used [ 0.000000] Adding active range (3, 0x6080000, 0x8080000) 5 entries of 3200 used [ 0.000000] Adding active range (4, 0x8080000, 0xa080000) 6 entries of 3200 used [ 0.000000] Adding active range (5, 0xa080000, 0xc080000) 7 entries of 3200 used [ 0.000000] Adding active range (6, 0xc080000, 0xe080000) 8 entries of 3200 used [ 0.000000] Adding active range (7, 0xe080000, 0x10080000) 9 entries of 3200 used [ 0.000000] SRAT: PXMs only cover 917504MB of your 1048566MB e820 RAM. Not used. [ 0.000000] SRAT: SRAT not used. the early_node_map is not sorted because node0 with non zero start come first. so try to sort it right away after all regions are registered. also fixs refression by 8716273c (x86: Export srat physical topology) -v2: make it more solid to handle cross node case like node0 [0,4g), [8,12g) and node1 [4g, 8g), [12g, 16g) -v3: update comments. Reported-and-tested-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B2579D2.3010201@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-16x86, cpuid: Add "volatile" to asm in native_cpuid()Suresh Siddha
xsave_cntxt_init() does something like: cpuid(0xd, ..); // find out what features FP/SSE/.. etc are supported xsetbv(); // enable the features known to OS cpuid(0xd, ..); // find out the size of the context for features enabled Depending on what features get enabled in xsetbv(), value of the cpuid.eax=0xd.ecx=0.ebx changes correspondingly (representing the size of the context that is enabled). As we don't have volatile keyword for native_cpuid(), gcc 4.1.2 optimizes away the second cpuid and the kernel continues to use the cpuid information obtained before xsetbv(), ultimately leading to kernel crash on processors supporting more state than the legacy FP/SSE. Add "volatile" for native_cpuid(). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1261009542.2745.55.camel@sbs-t61.sc.intel.com> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-16x86, msr: msrs_alloc/free for CONFIG_SMP=nBorislav Petkov
Randy Dunlap reported the following build error: "When CONFIG_SMP=n, CONFIG_X86_MSR=m: ERROR: "msrs_free" [drivers/edac/amd64_edac_mod.ko] undefined! ERROR: "msrs_alloc" [drivers/edac/amd64_edac_mod.ko] undefined!" This is due to the fact that <arch/x86/lib/msr.c> is conditioned on CONFIG_SMP and in the UP case we have only the stubs in the header. Fork off SMP functionality into a new file (msr-smp.c) and build msrs_{alloc,free} unconditionally. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <petkovbb@gmail.com> LKML-Reference: <20091216231625.GD27228@liondog.tnic> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-16x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config spaceAndreas Herrmann
Use NodeId MSR to get NodeId and number of nodes per processor. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20091216144355.GB28798@alberich.amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-16Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (117 commits) ACPI processor: Fix section mismatch for processor_add() ACPI: Add platform-wide _OSC support. ACPI: cleanup pci_root _OSC code. ACPI: Add a generic API for _OSC -v2 msi-wmi: depend on backlight and fix corner-cases problems msi-wmi: switch to using input sparse keymap library msi-wmi: replace one-condition switch-case with if statement msi-wmi: remove unused field 'instance' in key_entry structure msi-wmi: remove custom runtime debug implementation msi-wmi: rework init msi-wmi: remove useless includes X86 drivers: Introduce msi-wmi driver Toshiba Bluetooth Enabling driver (RFKill handler v3) ACPI: fix for lapic_timer_propagate_broadcast() acpi_pad: squish warning ACPI: dock: minor whitespace and style cleanups ACPI: dock: add struct dock_station * directly to platform device data ACPI: dock: dock_add - hoist up platform_device_register_simple() ACPI: dock: remove global 'dock_device_name' ACPI: dock: combine add|alloc_dock_dependent_device (v2) ...
2009-12-16Merge branch 'master' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits) direct I/O fallback sync simplification ocfs: stop using do_sync_mapping_range cleanup blockdev_direct_IO locking make generic_acl slightly more generic sanitize xattr handler prototypes libfs: move EXPORT_SYMBOL for d_alloc_name vfs: force reval of target when following LAST_BIND symlinks (try #7) ima: limit imbalance msg Untangling ima mess, part 3: kill dead code in ima Untangling ima mess, part 2: deal with counters Untangling ima mess, part 1: alloc_file() O_TRUNC open shouldn't fail after file truncation ima: call ima_inode_free ima_inode_free IMA: clean up the IMA counts updating code ima: only insert at inode creation time ima: valid return code from ima_inode_alloc fs: move get_empty_filp() deffinition to internal.h Sanitize exec_permission_lite() Kill cached_lookup() and real_lookup() Kill path_lookup_open() ... Trivial conflicts in fs/direct-io.c
2009-12-16Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix kprobes build with non-gawk awk x86: Split swiotlb initialization into two stages x86: Regex support and known-movable symbols for relocs, fix _end x86, msr: Remove incorrect, duplicated code in the MSR driver x86: Merge kernel_thread() x86: Sync 32/64-bit kernel_thread x86, 32-bit: Use same regs as 64-bit for kernel_thread_helper x86, 64-bit: Use user_mode() to determine new stack pointer in copy_thread() x86, 64-bit: Move kernel_thread to C x86-64, paravirt: Call set_iopl_mask() on 64 bits x86-32: Avoid pipeline serialization in PTREGSCALL1 and 2 x86: Merge sys_clone x86, 32-bit: Convert sys_vm86 & sys_vm86old x86: Merge sys_sigaltstack x86: Merge sys_execve x86: Merge sys_iopl x86-32: Add new pt_regs stubs cpumask: Use modern cpumask style in arch/x86/kernel/cpu/mcheck/mce-inject.c
2009-12-16Merge branch 'module' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * 'module' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: modpost: fix segfault with short symbol names module: handle ppc64 relocating kcrctabs when CONFIG_RELOCATABLE=y Kbuild: clear marker out of modpost module: make MODULE_SYMBOL_PREFIX into a CONFIG option ARM: unexport symbols used to implement floating point emulation ARM: use unified discard definition in linker script x86: don't export inline function sparc64: don't export static inline pci_ functions
2009-12-16sanitize do_pipe_flags() callers in archAl Viro
* hpux_pipe() - no need to take BKL * sys32_pipe() in arch/x86/ia32 and xtensa_pipe() in arch/xtensa - no need at all, since both functions are open-coded sys_pipe() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16iommu-helper: use bitmap libraryAkinobu Mita
Use bitmap library and kill some unused iommu helper functions. 1. s/iommu_area_free/bitmap_clear/ 2. s/iommu_area_reserve/bitmap_set/ 3. Use bitmap_find_next_zero_area instead of find_next_zero_area This cannot be simple substitution because find_next_zero_area doesn't check the last bit of the limit in bitmap 4. Remove iommu_area_free, iommu_area_reserve, and find_next_zero_area Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16gru: function to generate chipset IPI valuesJack Steiner
Create a function to generate the value that is written to the UV hub MMR to cause an IPI interrupt to be sent. The function will be used in the GRU message queue error recovery code that sends IPIs to nodes in remote partitions. Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16x86: uv: update XPC to handle updated BIOS interfaceRobin Holt
The UV BIOS has moved the location of some of their pointers to the "partition reserved page" from memory into a uv hub MMR. The GRU does not support bcopy operations from MMR space so we need to special case the MMR addresses using VLOAD operations. Additionally, the BIOS call for registering a message queue watchlist has removed the 'blade' value and eliminated the structure that was being passed in. This is also reflected in this patch. Signed-off-by: Robin Holt <holt@sgi.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16x86: uv: introduce uv_gpa_is_mmrRobin Holt
Provide a mechanism for determining if a global physical address is pointing to a UV hub MMR. Signed-off-by: Robin Holt <holt@sgi.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16x86: uv: introduce a means to translate from gpa -> socket_paddrRobin Holt
The UV BIOS has been updated to implement some of our interface functionality differently than originally expected. These patches update the kernel to the bios implementation and include a few minor bug fixes which prevent us from doing significant testing on real hardware. This patch: For SGI UV systems, translate from a global physical address back to a socket physical address. This does nothing to ensure the socket physical address is actually addressable by the kernel. That is the responsibility of the user of the function. Signed-off-by: Robin Holt <holt@sgi.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16dma-mapping: fix off-by-one error in dma_capable()Jan Beulich
dma_mask is, when interpreted as address, the last valid byte, and hence comparison msut also be done using the last valid of the buffer in question. Also fix the open-coded instances in lib/swiotlb.c. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Becky Bruce <beckyb@kernel.crashing.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16elf: kill USE_ELF_CORE_DUMPChristoph Hellwig
Currently all architectures but microblaze unconditionally define USE_ELF_CORE_DUMP. The microblaze omission seems like an error to me, so let's kill this ifdef and make sure we are the same everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: <linux-arch@vger.kernel.org> Cc: Michal Simek <michal.simek@petalogix.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16ptrace: x86: change syscall_trace_leave() to rely on tracehook when steppingOleg Nesterov
Suggested by Roland. Unlike powepc, x86 always calls tracehook_report_syscall_exit(step) with step = 0, and sends the trap by hand. This results in unnecessary SIGTRAP when PTRACE_SINGLESTEP follows the syscall-exit stop. Change syscall_trace_leave() to pass the correct "step" argument to tracehook and remove the send_sigtrap() logic. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: <linux-arch@vger.kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16ptrace: x86: implement user_single_step_siginfo()Oleg Nesterov
Suggested by Roland. Implement user_single_step_siginfo() for x86. Extract this code from send_sigtrap(). Since x86 calls tracehook_report_syscall_exit(step => 0) the new helper is not used yet. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: <linux-arch@vger.kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16x86: Add IA32_TSC_AUX MSR and use itSheng Yang
Clean up write_tsc() and write_tscp_aux() by replacing hardcoded values. No change in functionality. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> LKML-Reference: <1260942485-19156-4-git-send-email-sheng@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-15Merge branch 'bugzilla-14700' into releaseLen Brown
2009-12-15x86, msr/cpuid: Register enough minors for the MSR and CPUID driversH. Peter Anvin
register_chrdev() hardcodes registering 256 minors, presumably to avoid breaking old drivers. However, we need to register enough minors so that we have all possible CPUs. checkpatch warns on this patch, but the patch is correct: NR_CPUS here is a static *upper bound* on the *maximum CPU index* (not *number of CPUs!*) and that is what we want. Reported-and-tested-by: Russ Anderson <rja@sgi.com> Cc: Tejun Heo <tj@kernel.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Takashi Iwai <tiwai@suse.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <tip-*@git.kernel.org>
2009-12-15x86: Fix kprobes build with non-gawk awkJonathan Nieder
The instruction attribute table generator fails when run by mawk or original-awk: $ mawk -f arch/x86/tools/gen-insn-attr-x86.awk \ arch/x86/lib/x86-opcode-map.txt > /dev/null Semantic error at 240: Second IMM error $ echo $? 1 Line 240 contains "c8: ENTER Iw,Ib", which indicates that this instruction has two immediate operands, the second of which is one byte. The script loops through the immediate operands using a for loop. Unfortunately, there is no guarantee in awk that a for (variable in array) loop will return the indices in increasing order. Internally, both original-awk and mawk iterate over a hash table for this purpose, and both implementations happen to produce the index 2 before 1. The supposed second immediate operand is more than one byte wide, producing the error. So loop over the indices in increasing order instead. As a side-effect, with mawk this means the silly two-entry hash table never has to be built. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Acked-by Masami Hiramatsu <mhiramat@redhat.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091213220437.GA27718@progeny.tock> Signed-off-by: Ingo Molnar <mingo@elte.hu>