aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/include/asm/mce.h
AgeCommit message (Collapse)Author
2009-04-22x86, mce: fix boot logging logicAndi Kleen
The earlier patch to change the poller to a separate function subtly broke the boot logging logic. This could lead to machine checks getting logged at boot even when disabled or defaulting to off on some systems. Fix that. [ Impact: bug fix - avoid spurious MCE in log ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-24x86, mce, cmci: add CMCI supportAndi Kleen
Impact: Major new feature Intel CMCI (Corrected Machine Check Interrupt) is a new feature on Nehalem CPUs. It allows the CPU to trigger interrupts on corrected events, which allows faster reaction to them instead of with the traditional polling timer. Also use CMCI to discover shared banks. Machine check banks can be shared by CPU threads or even cores. Using the CMCI enable bit it is possible to detect the fact that another CPU already saw a specific bank. Use this to assign shared banks only to one CPU to avoid reporting duplicated events. On CPU hot unplug bank sharing is re discovered. This is done using a thread that cycles through all the CPUs. To avoid races between the poller and CMCI we only poll for banks that are not CMCI capable and only check CMCI owned banks on a interrupt. The shared banks ownership information is currently only used for CMCI interrupts, not polled banks. The sharing discovery code follows the algorithm recommended in the IA32 SDM Vol3a 14.5.2.1 The CMCI interrupt handler just calls the machine check poller to pick up the machine check event that caused the interrupt. I decided not to implement a separate threshold event like the AMD version has, because the threshold is always one currently and adding another event didn't seem to add any value. Some code inspired by Yunhong Jiang's Xen implementation, which was in term inspired by a earlier CMCI implementation by me. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24x86, mce, cmci: define MSR names and fields for new CMCI registersAndi Kleen
Impact: New register definitions only CMCI means support for raising an interrupt on a corrected machine check event instead of having to poll for it. It's a new feature in Intel Nehalem CPUs available on some machine check banks. For details see the IA32 SDM Vol3a 14.5 Define the registers for it as a preparation for further patches. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24x86, mce, cmci: use polled banks bitmap in machine check pollerAndi Kleen
Define a per cpu bitmap that contains the banks polled by the machine check poller. This is needed for the CMCI code in the next patches to be able to disable polling on specific banks. The bank by default contains all banks, so there is no behaviour change. Only future code will remove some banks from the polling set. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24x86, mce, cmci: factor out threshold interrupt handlerAndi Kleen
Impact: cleanup; preparation for feature The mce_amd_64 code has an own private MC threshold vector with an own interrupt handler. Since Intel needs a similar handler it makes sense to share the vector because both can not be active at the same time. I factored the common APIC handler code into a separate file which can be used by both the Intel or AMD MC code. This is needed for the next patch which adds an Intel specific CMCI handler. This patch should be a nop for AMD, it just moves some code around. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24x86, mce, cmci: export MAX_NR_BANKSAndi Kleen
Impact: Cleanup (code movement) Move MAX_NR_BANKS into mce.h because it's needed there for followup patches. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-19x86, mce: separate correct machine check poller and fatal exception handlerAndi Kleen
Impact: cleanup, performance enhancement The machine check poller is diverging more and more from the fatal exception handler. Instead of adding more special cases separate the code paths completely. The corrected poll path is actually quite simple, and this doesn't result in much code duplication. This makes both handlers much easier to read and results in cleaner code flow. The exception handler now only needs to care about uncorrected errors, which also simplifies the handling of multiple errors. The corrected poller also now always runs in standard interrupt context and does not need to do anything special to handle NMI context. Minor behaviour changes: - MCG status is now not cleared on polling. - Only the banks which had corrected errors get cleared on polling - The exception handler only clears banks with errors now v2: Forward port to new patch order. Add "uc" argument. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-19x86, mce: factor out duplicated struct mce setup into one functionAndi Kleen
Impact: cleanup This merely factors out duplicated code to set up the initial struct mce state into a single function. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-17x86, mce: don't disable machine checks during code patchingAndi Kleen
Impact: low priority bug fix This removes part of a a patch I added myself some time ago. After some consideration the patch was a bad idea. In particular it stopped machine check exceptions during code patching. To quote the comment: * MCEs only happen when something got corrupted and in this * case we must do something about the corruption. * Ignoring it is worse than a unlikely patching race. * Also machine checks tend to be broadcast and if one CPU * goes into machine check the others follow quickly, so we don't * expect a machine check to cause undue problems during to code * patching. So undo the machine check related parts of 8f4e956b313dcccbc7be6f10808952345e3b638c NMIs are still disabled. This only removes code, the only additions are a new comment. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-01-31headers_check fix: x86, mce.hJaswinder Singh Rajput
fix the following 'make headers_check' warnings: usr/include/asm/mce.h:7: include of <linux/types.h> is preferred over <asm/types.h> usr/include/asm/mce.h:29: found __[us]{8,16,32,64} type without #include <linux/types.h> Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
2008-10-22x86: Fix ASM_X86__ header guardsH. Peter Anvin
Change header guards named "ASM_X86__*" to "_ASM_X86_*" since: a. the double underscore is ugly and pointless. b. no leading underscore violates namespace constraints. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-10-22x86, um: ... and asm-x86 moveAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: H. Peter Anvin <hpa@zytor.com>