aboutsummaryrefslogtreecommitdiff
path: root/arch/i386
AgeCommit message (Collapse)Author
2006-01-04[PATCH] i386: "invalid operand" -> "invalid opcode"Chuck Ebbert
According to the manual, INT 6 is "invalid opcode", not "invalid operand". Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-04Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreqLinus Torvalds
2005-12-31[PATCH] x86: teach dump_task_regs() about the -8 offset.Stas Sergeev
This should fix multi-threaded core-files Signed-off-by: stsp@aknet.ru Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-20[PATCH] Fix build with CONFIG_PCI_MMCONFIGAndi Kleen
Now needs to include the type 1 functions ("direct") too. Reported by Pavel Roskin <proski@gnu.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-16[PATCH] PCI: Fix dumb bug in mmconfig fixAndi Kleen
Use correct address when referencing mmconfig aperture while checking for broken MCFG. This was a typo when porting the code from 64bit to 32bit. It caused oopses at boot on some ThinkPads. Should definitely go into 2.6.15. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-15[PATCH] i386,amd64: ioremap.c __iomem annotationsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-15[PATCH] i386,amd64: mmconfig __iomem annotationsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] i386/x86-64 Correct for broken MCFG tables on K8 systemsAndi Kleen
They report all busses as MMCONFIG capable, but it never works for the internal devices in the CPU's builtin northbridge. It just probes all func 0 devices on bus 0 (the internal northbridge is currently always on bus 0) and if they are not accessible using MCFG they are put into a special fallback bitmap. On systems where it isn't we assume the BIOS vendor supplied correct MCFG. Requires the earlier patch for mmconfig type1 fallback Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] i386/x86-64 Fall back to type 1 access when no entry foundAndi Kleen
When there is no entry for a bus in MCFG fall back to type1. This is especially important on K8 systems where always some devices can't be accessed using mmconfig (in particular the builtin northbridge doesn't support it for its own devices) Cc: <gregkh@suse.de> Cc: <jgarzik@pobox.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] i386/x86-64: Don't call change_page_attr with a spinlock heldAndi Kleen
It's illegal because it can sleep. Use a two step lookup scheme instead. First look up the vm_struct, then change the direct mapping, then finally unmap it. That's ok because nobody can change the particular virtual address range as long as the vm_struct is still in the global list. Also added some LinuxDoc documentation to iounmap. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] i386/x86-64 disable LAPIC completely for offline CPUShaohua Li
Disabling LAPIC timer isn't sufficient. In some situations, such as we enabled NMI watchdog, there is still unexpected interrupt (such as NMI) invoked in offline CPU. This also avoids offline CPU receives spurious interrupt and anything similar. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: "Seth, Rohit" <rohit.seth@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] kprobes: increment kprobe missed count for multiprobesKeshavamurthy Anil S
When multiple probes are registered at the same address and if due to some recursion (probe getting triggered within a probe handler), we skip calling pre_handlers and just increment nmissed field. The below patch make sure it walks the list for multiple probes case. Without the below patch we get incorrect results of nmissed count for multiple probe case. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12[PATCH] x86: fix NMI with CPU hotplugShaohua Li
With CPU hotplug enabled, NMI watchdog stoped working. It appears the violation is the cpu_online check in nmi handler. local ACPI based NMI watchdog is initialized before we set CPU online for APs. It's quite possible a NMI is fired before we set CPU online, and that's what happens here. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-06[CPUFREQ] CPU frequency display in /proc/cpuinfoVenkatesh Pallipadi
What is the value shown in "cpu MHz" of /proc/cpuinfo when CPUs are capable of changing frequency? Today the answer is: It depends. On i386: SMP kernel - It is always the boot frequency UP kernel - Scales with the frequency change and shows that was last set. On x86_64: There is one single variable cpu_khz that gets written by all the CPUs. So, the frequency set by last CPU will be seen on /proc/cpuinfo of all the CPUs in the system. What you see also depends on whether you have constant_tsc capable CPU or not. On ia64: It is always boot time frequency of a particular CPU that gets displayed. The patch below changes this to: Show the last known frequency of the particular CPU, when cpufreq is present. If cpu doesnot support changing of frequency through cpufreq, then boot frequency will be shown. The patch affects i386, x86_64 and ia64 architectures. Signed-off-by: Venkatesh Pallipadi<venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
2005-12-06[CPUFREQ] Move PMBASE reading away and do it only once at initialization timeMattia Dongili
This patch moves away PMBASE reading and only performs it at cpufreq_register_driver time by exiting with -ENODEV if unable to read the value. Signed-off-by: Mattia Dongili <malattia@linux.it> Acked-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Dave Jones <davej@redhat.com>
2005-12-06[CPUFREQ] Measure transition latency at driver initializationMattia Dongili
The attached patch introduces runtime latency measurement for ICH[234] based chipsets instead of using CPUFREQ_ETERNAL. It includes some sanity checks in case the measured value is out of range and assigns a safe value of 500uSec that should still be enough on problematics chipsets (current testing report values ~200uSec). The measurement is currently done in speedstep_get_freqs in order to avoid further unnecessary transitions and in the hope it'll come handy for SMI also. Signed-off-by: Mattia Dongili <malattia@linux.it> Acked-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Dave Jones <davej@redhat.com> speedstep-ich.c | 4 ++-- speedstep-lib.c | 32 +++++++++++++++++++++++++++++++- speedstep-lib.h | 1 + speedstep-smi.c | 1 + 4 files changed, 35 insertions(+), 3 deletions(-)
2005-12-06Merge ../linus/Dave Jones
2005-12-06[CPUFREQ] Change loglevels on powernow-k8 bios error printk's.Dave Jones
If a user has booted with 'quiet', some important messages don't get displayed which really should. We've seen at least one case where powernow-k8 stopped working, and the user needed a BIOS update that they didn't know about. Signed-off-by: Dave Jones <davej@redhat.com>
2005-12-01[PATCH] cpufreq-nforce2.c fix u32<0 testGabriel A. Devenyi
Thanks to LinuxICC (http://linuxicc.sf.net), a comparison of a u32 less than 0 was found, this patch changes the variable to a signed int so that comparison is meaningful. Signed-off-by: Gabriel A. Devenyi <ace@staticwave.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Dave Jones <davej@redhat.com>
2005-11-30[ACPI] properly detect pmtimer on ASUS a8v motherboardDavid Shaohua Li
Handle FADT 2.0 xpmtmr address 0 case. http://bugzilla.kernel.org/show_bug.cgi?id=5283 Signed-off-by: Shaohua Li<shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Len Brown <len.brown@intel.com>
2005-11-30[CPUFREQ] Fix indentation in powernow-k8Dave Jones
Signed-off-by: Dave Jones <davej@redhat.com>
2005-11-29[PATCH] setting irq affinity is broken in ia32 with MSI enabledShaohua Li
Setting irq affinity stops working when MSI is enabled. With MSI, move_irq is empty, so we can't change irq affinity. It appears a typo in Ashok's original commit for this issue. X86_64 actually is using move_native_irq. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29[PATCH] fix rebooting on HP nc6120 laptopThierry Vignaud
Anne NICOLAS <anne.nicolas@mandriva.com> and Andres Kaaber <andres.kaaber@rescue.ee> reported their HP laptop didn't reboot smoothly. Signed-off-by: Thierry Vignaud <tvignaud@mandriva.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-29Merge ../linusDave Jones
2005-11-29[PATCH] Support 100 MHz frequency transitionsLangsdorf, Mark
Future versions of the Opteron processor may support frequency transitions of 100 MHz, instead of the=20 current 200 MHz. This patch enables the powernow-k8 driver to transition to an odd FID code, indicating a multiple of 100 MHz frequency. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>
2005-11-23[PATCH] PCI: direct.c: DBGDaniel Marjamäki
The DBG() call where updated with the appropriate KERN_* symbol. Signed-off-by: Daniel Marjamäki <daniel.marjamaki@comhem.se> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23[PATCH] PCI: remove bogus resource collision errorRajesh Shah
When attempting to hotadd a PCI card with a bridge on it, I saw the kernel reporting resource collision errors even when there were really no collisions. The problem is that the code doesn't skip over "invalid" resources with their resource type flag not set. Others have reported similar problems at boot time and for non-bridge PCI card hotplug too, where the code flags a resource collision for disabled ROMs. This patch fixes both problems. Signed-off-by: Rajesh Shah <rajesh.shah@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23[PATCH] PCI: trivial printk updates in common.cDaniel Marjamäkia
Modified common.c so it's using the appropriate KERN_* in printk() calls. Signed-off-by: Daniel Marjamäkia <daniel.marjamaki@comhem.se> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-23[PATCH] kprobes: Fix return probes on sys_execveJim Keniston
Fix a bug in kprobes that can cause an Oops or even a crash when a return probe is installed on one of the following functions: sys_execve, do_execve, load_*_binary, flush_old_exec, or flush_thread. The fix is to remove the call to kprobe_flush_task() in flush_thread(). This fix has been tested on all architectures for which the return-probes feature has been implemented (i386, x86_64, ppc64, ia64). Please apply. BACKGROUND Up to now, we have called kprobe_flush_task() under two situations: when a task exits, and when it execs. Flushing kretprobe_instances on exit is correct because (a) do_exit() doesn't return, and (b) one or more return-probed functions may be active when a task calls do_exit(). Neither is the case for sys_execve() and its callees. Initially, the mistaken call to kprobe_flush_task() on exec was harmless because we put the "real" return address of each active probed function back in the stack, just to be safe, when we recycled its kretprobe_instance. When support for ppc64 and ia64 was added, this safety measure couldn't be employed, and was eventually dropped even for i386 and x86_64. sys_execve() and its callees were informally blacklisted for return probes until this fix was developed. Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-21[CPUFREQ] Improve Error reporting in powernow-k8Jacob Shin
This patch cleans up some error messages in the powernow-k8 driver and makes them more understandable. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>
2005-11-20[PATCH] Register disabled CPUsAshok Raj
Needed to make the earlier use disabled CPUs for CPU hotplug patch actually work. Need to register disabled processors as well, so we can count them towards cpu_possible_map as hot pluggable cpus. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-20[PATCH] i386: Use bigsmp for > 8 core Opteron systemsAndi Kleen
bigsmp is reported to work on large Opteron systems on 32bit too. Enable it by default there. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-15[PATCH] drop "[PATCH] i386 kexec-on-panic: Don't shutdown the apics"Vivek Goyal
A patch by Eric was merged (f2b36db692b7ff6972320ad9839ae656a3b0ee3e) and later on reverted back (1e4c85f97fe26fbd70da12148b3992c0e00361fd). Along with above patch, another patch was posted and has been merged (3d1675b41b02d64bd1185903ea0d25a8c0bb6dea). That patch was dependent on the above patch and now it should also be reverted. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14Merge x86-64 update from AndiLinus Torvalds
2005-11-14[PATCH] x86_64: x86_64/i386 fix Intel cache detection code assumption about ā†µSiddha, Suresh B
threads sharing Fix the Intel cache detection code assumption that number of threads sharing the cache will either be equal to number of HT or core siblings. This also cleans up the code in general a bit. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14[PATCH] x86-64/i386: Intel HT, Multi core detection fixesSiddha, Suresh B
Fields obtained through cpuid vector 0x1(ebx[16:23]) and vector 0x4(eax[14:25], eax[26:31]) indicate the maximum values and might not always be the same as what is available and what OS sees. So make sure "siblings" and "cpu cores" values in /proc/cpuinfo reflect the values as seen by OS instead of what cpuid instruction says. This will also fix the buggy BIOS cases (for example where cpuid on a single core cpu says there are "2" siblings, even when HT is disabled in the BIOS. http://bugzilla.kernel.org/show_bug.cgi?id=4359) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14[PATCH] x86_64: Force correct address space size for MTRR on some 64bit ā†µShaohua Li
Intel Xeons They report 40bit, but only have 36bits of physical address space. This caused problems with setting up the correct masks for MTRR. CPUID workaround for steppings 0F33h(supporting x86) and 0F34h(supporting x86 and EM64T). Detail info can be found at: http://download.intel.com/design/Xeon/specupdt/30240216.pdf http://download.intel.com/design/Pentium4/specupdt/30235221.pdf Signed-off-by: Shaohua Li<shaohua.li@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14[PATCH] x86_64: Unmap NULL during early bootupSiddha, Suresh B
We should zap the low mappings, as soon as possible, so that we can catch kernel bugs more effectively. Previously early boot had NULL mapped and didn't trap on NULL references. This patch introduces boot_level4_pgt, which will always have low identity addresses mapped. Druing boot, all the processors will use this as their level4 pgt. On BP, we will switch to init_level4_pgt as soon as we enter C code and zap the low mappings as soon as we are done with the usage of identity low mapped addresses. On AP's we will zap the low mappings as soon as we jump to C code. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14[PATCH] x86-64/i386: Fix CPU model for family 6Suresh Siddha
According to cpuid instruction in IA32 SDM-Vol2, when computing cpu model, we need to consider extended model ID for family 0x6 also. AK: Also added fixes/simplifcation from Petr Vandrovec Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14[PATCH] i386/x86-64: Share interrupt vectors when there is a large number of ā†µJames Cleverdon
interrupt sources Here's a patch that builds on Natalie Protasevich's IRQ compression patch and tries to work for MPS boots as well as ACPI. It is meant for a 4-node IBM x460 NUMA box, which was dying because it had interrupt pins with GSI numbers > NR_IRQS and thus overflowed irq_desc. The problem is that this system has 270 GSIs (which are 1:1 mapped with I/O APIC RTEs) and an 8-node box would have 540. This is much bigger than NR_IRQS (224 for both i386 and x86_64). Also, there aren't enough vectors to go around. There are about 190 usable vectors, not counting the reserved ones and the unused vectors at 0x20 to 0x2F. So, my patch attempts to compress the GSI range and share vectors by sharing IRQs. Cc: "Protasevich, Natalie" <Natalie.Protasevich@unisys.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-14[PATCH] x86_64: Make i386 compile again with fourth DMA32 zoneAndi Kleen
The code should deal with an additional empty zone, so fix up the #error. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13[PATCH] i386: generic cmpxchgNick Piggin
- Make cmpxchg generally available on the i386 platform. - Provide emulation of cmpxchg suitable for uniprocessor if built and run on 386. From: Christoph Lameter <clameter@sgi.com> - Cut down patch and small style changes. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13[PATCH] x86: fix cpu_khz with clock=pitTim Mann
Fix http://bugzilla.kernel.org/show_bug.cgi?id=5546 The cpu_khz global is not initialized and remains 0 if you boot with clock=pit, even if the processor does have a TSC. This may have bad ramifications since the variable is used in various places scattered around the kernel, though I didn't check them all to see if they can tolerate cpu_khz = 0. You can observe the problem by doing "cat /proc/cpuinfo"; the cpu MHz line says 0.000. The fix is trivial; call init_cpu_khz() from init_pit(), just as it's called from the timers/timer_foo.c:init_foo() for other values of foo. Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13[PATCH] i386: NMI pointer comparison fixJan Beulich
Instruction pointer comparisons for the NMI on debug stack check/fixup were incorrect. From: Jan Beulich <jbeulich@novell.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Zwane Mwaikambo <zwane@holomorphy.com> Acked-by: "Seth, Rohit" <rohit.seth@intel.com> Cc: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13[PATCH] arch/i386/mm/init.c: small cleanupsAdrian Bunk
This patch contains the following cleanups: - make a needlessly global function static - every file should include the headers containing the prototypes for it's global functions Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-13[PATCH] move pm_register/etc. to CONFIG_PM_LEGACY, pm_legacy.hJeff Garzik
Since few people need the support anymore, this moves the legacy pm_xxx functions to CONFIG_PM_LEGACY, and include/linux/pm_legacy.h. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-10[PATCH] PCI: fix for Toshiba ohci1394 quirkJesse Barnes
After much testing and agony, I've discovered that my previous ohci1394 quirk for Toshiba laptops is not 100% reliable. It apparently fails to do the interrupt line change either correctly or in time, since in about 2 out of 5 boots, the kernel's irqdebug code will *still* disable irq 11 when the ohci1394 driver is loaded (at pci_enable_device time I think). This patch switches things around a little in the workaround. First, it removes the mdelay. I didn't see a need for it and my testing has shown that it's not necessary for the quirk to work. Secondly, instead of trying to change the interrupt line to what ACPI tells us it should be, this patch makes the quirk use the value in the PCI_INTERRUPT_LINE register. On this laptop at least, that seems to be the right thing to do, though additional testing on other laptops and/or with actual firewire devices would be appreciated. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-11-09[PATCH] i386: EXPORT_SYMBOL(screen_info) even #ifndef CONFIG_VTAdrian Bunk
The folllowing modules require screen_info but don't depend on CONFIG_VT: - vga16fb.ko - intelfb.ko Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09[PATCH] sched: resched and cpu_idle reworkNick Piggin
Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce confusion, and make their semantics rigid. Improves efficiency of resched_task and some cpu_idle routines. * In resched_task: - TIF_NEED_RESCHED is only cleared with the task's runqueue lock held, and as we hold it during resched_task, then there is no need for an atomic test and set there. The only other time this should be set is when the task's quantum expires, in the timer interrupt - this is protected against because the rq lock is irq-safe. - If TIF_NEED_RESCHED is set, then we don't need to do anything. It won't get unset until the task get's schedule()d off. - If we are running on the same CPU as the task we resched, then set TIF_NEED_RESCHED and no further action is required. - If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set after TIF_NEED_RESCHED has been set, then we need to send an IPI. Using these rules, we are able to remove the test and set operation in resched_task, and make clear the previously vague semantics of POLLING_NRFLAG. * In idle routines: - Enter cpu_idle with preempt disabled. When the need_resched() condition becomes true, explicitly call schedule(). This makes things a bit clearer (IMO), but haven't updated all architectures yet. - Many do a test and clear of TIF_NEED_RESCHED for some reason. According to the resched_task rules, this isn't needed (and actually breaks the assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock held). So remove that. Generally one less locked memory op when switching to the idle thread. - Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner most polling idle loops. The above resched_task semantics allow it to be set until before the last time need_resched() is checked before going into a halt requiring interrupt wakeup. Many idle routines simply never enter such a halt, and so POLLING_NRFLAG can be always left set, completely eliminating resched IPIs when rescheduling the idle task. POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Con Kolivas <kernel@kolivas.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09[PATCH] sched: disable preempt in idle tasksNick Piggin
Run idle threads with preempt disabled. Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()). How did it ever work before? Might fix the CPU hotplugging hang which Nigel Cunningham noted. We think the bug hits if the idle thread is preempted after checking need_resched() and before going to sleep, then the CPU offlined. After calling stop_machine_run, the CPU eventually returns from preemption and into the idle thread and goes to sleep. The CPU will continue executing previous idle and have no chance to call play_dead. By disabling preemption until we are ready to explicitly schedule, this bug is fixed and the idle threads generally become more robust. From: alexs <ashepard@u.washington.edu> PPC build fix From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> MIPS build fix Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>