aboutsummaryrefslogtreecommitdiff
path: root/arch/i386/kernel
AgeCommit message (Collapse)Author
2005-09-05[PATCH] SYSEMU: fix sysaudit / singlestep interactionBodo Stroesser
Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> This is simply an adjustment for "Ptrace - i386: fix Syscall Audit interaction with singlestep" to work on top of SYSEMU patches, too. On this patch, I have some doubts: I wonder why we need to alter that way ptrace_disable(). I left the patch this way because it has been extensively tested, but I don't understand the reason. The current PTRACE_DETACH handling simply clears child->ptrace; actually this is not enough because entry.S just looks at the thread_flags; actually, do_syscall_trace checks current->ptrace but I don't think depending on that is good, at least for performance, so I think the clearing is done elsewhere. For instance, on PTRACE_CONT it's done, but doing PTRACE_DETACH without PTRACE_CONT is possible (and happens when gdb crashes and one kills it manually). Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> CC: Roland McGrath <roland@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] Uml support: add PTRACE_SYSEMU_SINGLESTEP option to i386Bodo Stroesser
This patch implements the new ptrace option PTRACE_SYSEMU_SINGLESTEP, which can be used by UML to singlestep a process: it will receive SINGLESTEP interceptions for normal instructions and syscalls, but syscall execution will be skipped just like with PTRACE_SYSEMU. Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] Uml support: reorganize PTRACE_SYSEMU supportBodo Stroesser
With this patch, we change the way we handle switching from PTRACE_SYSEMU to PTRACE_{SINGLESTEP,SYSCALL}, to free TIF_SYSCALL_EMU from double use as a preparation for PTRACE_SYSEMU_SINGLESTEP extension, without changing the behavior of the host kernel. Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] UML Support - Ptrace: adds the host SYSEMU support, for UML and ↵Laurent Vivier
general usage Jeff Dike <jdike@addtoit.com>, Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>, Bodo Stroesser <bstroesser@fujitsu-siemens.com> Adds a new ptrace(2) mode, called PTRACE_SYSEMU, resembling PTRACE_SYSCALL except that the kernel does not execute the requested syscall; this is useful to improve performance for virtual environments, like UML, which want to run the syscall on their own. In fact, using PTRACE_SYSCALL means stopping child execution twice, on entry and on exit, and each time you also have two context switches; with SYSEMU you avoid the 2nd stop and so save two context switches per syscall. Also, some architectures don't have support in the host for changing the syscall number via ptrace(), which is currently needed to skip syscall execution (UML turns any syscall into getpid() to avoid it being executed on the host). Fixing that is hard, while SYSEMU is easier to implement. * This version of the patch includes some suggestions of Jeff Dike to avoid adding any instructions to the syscall fast path, plus some other little changes, by myself, to make it work even when the syscall is executed with SYSENTER (but I'm unsure about them). It has been widely tested for quite a lot of time. * Various fixed were included to handle the various switches between various states, i.e. when for instance a syscall entry is traced with one of PT_SYSCALL / _SYSEMU / _SINGLESTEP and another one is used on exit. Basically, this is done by remembering which one of them was used even after the call to ptrace_notify(). * We're combining TIF_SYSCALL_EMU with TIF_SYSCALL_TRACE or TIF_SINGLESTEP to make do_syscall_trace() notice that the current syscall was started with SYSEMU on entry, so that no notification ought to be done in the exit path; this is a bit of a hack, so this problem is solved in another way in next patches. * Also, the effects of the patch: "Ptrace - i386: fix Syscall Audit interaction with singlestep" are cancelled; they are restored back in the last patch of this series. Detailed descriptions of the patches doing this kind of processing follow (but I've already summed everything up). * Fix behaviour when changing interception kind #1. In do_syscall_trace(), we check the status of the TIF_SYSCALL_EMU flag only after doing the debugger notification; but the debugger might have changed the status of this flag because he continued execution with PTRACE_SYSCALL, so this is wrong. This patch fixes it by saving the flag status before calling ptrace_notify(). * Fix behaviour when changing interception kind #2: avoid intercepting syscall on return when using SYSCALL again. A guest process switching from using PTRACE_SYSEMU to PTRACE_SYSCALL crashes. The problem is in arch/i386/kernel/entry.S. The current SYSEMU patch inhibits the syscall-handler to be called, but does not prevent do_syscall_trace() to be called after this for syscall completion interception. The appended patch fixes this. It reuses the flag TIF_SYSCALL_EMU to remember "we come from PTRACE_SYSEMU and now are in PTRACE_SYSCALL", since the flag is unused in the depicted situation. * Fix behaviour when changing interception kind #3: avoid intercepting syscall on return when using SINGLESTEP. When testing 2.6.9 and the skas3.v6 patch, with my latest patch and had problems with singlestepping on UML in SKAS with SYSEMU. It looped receiving SIGTRAPs without moving forward. EIP of the traced process was the same for all SIGTRAPs. What's missing is to handle switching from PTRACE_SYSCALL_EMU to PTRACE_SINGLESTEP in a way very similar to what is done for the change from PTRACE_SYSCALL_EMU to PTRACE_SYSCALL_TRACE. I.e., after calling ptrace(PTRACE_SYSEMU), on the return path, the debugger is notified and then wake ups the process; the syscall is executed (or skipped, when do_syscall_trace() returns 0, i.e. when using PTRACE_SYSEMU), and do_syscall_trace() is called again. Since we are on the return path of a SYSEMU'd syscall, if the wake up is performed through ptrace(PTRACE_SYSCALL), we must still avoid notifying the parent of the syscall exit. Now, this behaviour is extended even to resuming with PTRACE_SINGLESTEP. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] Ptrace/i386: fix "syscall audit" interaction with singlestepBodo Stroesser
Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Avoid giving two traps for singlestep instead of one, when syscall auditing is enabled. In fact no singlestep trap is sent on syscall entry, only on syscall exit, as can be seen in entry.S: # Note that in this mask _TIF_SINGLESTEP is not tested !!! <<<<<<<<<<<<<< testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP),TI_flags(%ebp) jnz syscall_trace_entry ... syscall_trace_entry: ... call do_syscall_trace But auditing a SINGLESTEP'ed process causes do_syscall_trace to be called, so the tracer will get one more trap on the syscall entry path, which it shouldn't. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> CC: Roland McGrath <roland@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] add suspend/resume for timerShaohua Li
The timers lack .suspend/.resume methods. Because of this, jiffies got a big compensation after a S3 resume. And then softlockup watchdog reports an oops. This occured with HPET enabled, but it's also possible for other timers. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] swsusp: fix remaining u32 vs. pm_message_t confusionPavel Machek
Fix remaining bits of u32 vs. pm_message confusion. Should not break anything. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] ISA DMA suspend for i386Pierre Ossman
Reset the ISA DMA controller into a known state after a suspend. Primary concern was reenabling the cascading DMA channel (4). Signed-off-by: Pierre Ossman <drzeus@drzeus.cx> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] unify x86/x86-64 semaphore codeBenjamin LaHaise
This patch moves the common code in x86 and x86-64's semaphore.c into a single file in lib/semaphore-sleepers.c. The arch specific asm stubs are left in the arch tree (in semaphore.c for i386 and in the asm for x86-64). There should be no changes in code/functionality with this patch. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386 boottime for_each_cpu brokenZwane Mwaikambo
for_each_cpu walks through all processors in cpu_possible_map, which is defined as cpu_callout_map on i386 and isn't initialised until all processors have been booted. This breaks things which do for_each_cpu iterations early during boot. So, define cpu_possible_map as a bitmap with NR_CPUS bits populated. This was triggered by a patch i'm working on which does alloc_percpu before bringing up secondary processors. From: Alexander Nyberg <alexn@telia.com> i386-boottime-for_each_cpu-broken.patch i386-boottime-for_each_cpu-broken-fix.patch The SMP version of __alloc_percpu checks the cpu_possible_map before allocating memory for a certain cpu. With the above patches the BSP cpuid is never set in cpu_possible_map which breaks CONFIG_SMP on uniprocessor machines (as soon as someone tries to dereference something allocated via __alloc_percpu, which in fact is never allocated since the cpu is not set in cpu_possible_map). Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk> Signed-off-by: Alexander Nyberg <alexn@telia.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: encapsulate copying of pgd entriesZachary Amsden
Add a clone operation for pgd updates. This helps complete the encapsulation of updates to page tables (or pages about to become page tables) into accessor functions rather than using memcpy() to duplicate them. This is both generally good for consistency and also necessary for running in a hypervisor which requires explicit updates to page table entries. The new function is: clone_pgd_range(pgd_t *dst, pgd_t *src, int count); dst - pointer to pgd range anwhere on a pgd page src - "" count - the number of pgds to copy. dst and src can be on the same page, but the range must not overlap and must not cross a page boundary. Note that I ommitted using this call to copy pgd entries into the software suspend page root, since this is not technically a live paging structure, rather it is used on resume from suspend. CC'ing Pavel in case he has any feedback on this. Thanks to Chris Wright for noticing that this could be more optimal in PAE compiles by eliminating the memset. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86 NMI: better support for debuggersGeorge Anzinger
This patch adds a notify to the die_nmi notify that the system is about to be taken down. If the notify is handled with a NOTIFY_STOP return, the system is given a new lease on life. We also change the nmi watchdog to carry on if die_nmi returns. This give debug code a chance to a) catch watchdog timeouts and b) possibly allow the system to continue, realizing that the time out may be due to debugger activities such as single stepping which is usually done with "other" cpus held. Signed-off-by: George Anzinger<george@mvista.com> Cc: Keith Owens <kaos@ocs.com.au> Signed-off-by: George Anzinger <george@mvista.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86: introduce a write acessor for updating the current LDTZachary Amsden
Introduce a write acessor for updating the current LDT. This is required for hypervisors like Xen that do not allow LDT pages to be directly written. Testing - here's a fun little LDT test that can be trivially modified to test limits as well. /* * Copyright (c) 2005, Zachary Amsden (zach@vmware.com) * This is licensed under the GPL. */ #include <stdio.h> #include <signal.h> #include <asm/ldt.h> #include <asm/segment.h> #include <sys/types.h> #include <unistd.h> #include <sys/mman.h> #define __KERNEL__ #include <asm/page.h> void main(void) { struct user_desc desc; char *code; unsigned long long tsc; code = (char *)mmap(0, 8192, PROT_EXEC|PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); desc.entry_number = 0; desc.base_addr = code; desc.limit = 1; desc.seg_32bit = 1; desc.contents = MODIFY_LDT_CONTENTS_CODE; desc.read_exec_only = 0; desc.limit_in_pages = 1; desc.seg_not_present = 0; desc.useable = 1; if (modify_ldt(1, &desc, sizeof(desc)) != 0) { perror("modify_ldt"); } printf("code base is 0x%08x\n", (unsigned)code); code[0x0ffe] = 0x0f; /* rdtsc */ code[0x0fff] = 0x31; code[0x1000] = 0xcb; /* lret */ __asm__ __volatile("lcall $7,$0xffe" : "=A" (tsc)); printf("TSC is 0x%016llx\n", tsc); } Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86: make IOPL explicitZachary Amsden
The pushf/popf in switch_to are ONLY used to switch IOPL. Making this explicit in C code is more clear. This pushf/popf pair was added as a bugfix for leaking IOPL to unprivileged processes when using sysenter/sysexit based system calls (sysexit does not restore flags). When requesting an IOPL change in sys_iopl(), it is just as easy to change the current flags and the flags in the stack image (in case an IRET is required), but there is no reason to force an IRET if we came in from the SYSENTER path. This change is the minimal solution for supporting a paravirtualized Linux kernel that allows user processes to run with I/O privilege. Other solutions require radical rewrites of part of the low level fault / system call handling code, or do not fully support sysenter based system calls. Unfortunately, this added one field to the thread_struct. But as a bonus, on P4, the fastest time measured for switch_to() went from 312 to 260 cycles, a win of about 17% in the fast case through this performance critical path. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86: privilege cleanupZachary Amsden
Privilege checking cleanup. Originally, these diffs were much greater, but recent cleanups in Linux have already done much of the cleanup. I added some explanatory comments in places where the reasoning behind certain tests is rather subtle. Also, in traps.c, we can skip the user_mode check in handle_BUG(). The reason is, there are only two call chains - one via die_if_kernel() and one via do_page_fault(), both entering from die(). Both of these paths already ensure that a kernel mode failure has happened. Also, the original check here, if (user_mode(regs)) was insufficient anyways, since it would not rule out BUG faults from V8086 mode execution. Saving the %ss segment in show_regs() rather than assuming a fixed value also gives better information about the current kernel state in the register dump. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86: more asm cleanupsZachary Amsden
Some more assembler cleanups I noticed along the way. Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: load_tls() fixZachary Amsden
Subtle fix: load_TLS has been moved after saving %fs and %gs segments to avoid creating non-reversible segments. This could conceivably cause a bug if the kernel ever needed to save and restore fs/gs from the NMI handler. It currently does not, but this is the safest approach to avoiding fs/gs corruption. SMIs are safe, since SMI saves the descriptor hidden state. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: inline assembler: cleanup and encapsulate descriptor and task ↵Zachary Amsden
register management i386 inline assembler cleanup. This change encapsulates descriptor and task register management. Also, it is possible to improve assembler generation in two cases; savesegment may store the value in a register instead of a memory location, which allows GCC to optimize stack variables into registers, and MOV MEM, SEG is always a 16-bit write to memory, making the casting in math-emu unnecessary. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: cleanup serialize msrZachary Amsden
i386 arch cleanup. Introduce the serialize macro to serialize processor state. Why the microcode update needs it I am not quite sure, since wrmsr() is already a serializing instruction, but it is a microcode update, so I will keep the semantic the same, since this could be a timing workaround. As far as I can tell, this has always been there since the original microcode update source. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: inline asm cleanupZachary Amsden
i386 Inline asm cleanup. Use cr/dr accessor functions. Also, a potential bugfix. Also, some CR accessors really should be volatile. Reads from CR0 (numeric state may change in an exception handler), writes to CR4 (flipping CR4.TSD) and reads from CR2 (page fault) prevent instruction re-ordering. I did not add memory clobber to CR3 / CR4 / CR0 updates, as it was not there to begin with, and in no case should kernel memory be clobbered, except when doing a TLB flush, which already has memory clobber. I noticed that page invalidation does not have a memory clobber. I can't find a bug as a result, but there is definitely a potential for a bug here: #define __flush_tlb_single(addr) \ __asm__ __volatile__("invlpg %0": :"m" (*(char *) addr)) Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] i386: clean up vDSO alignment paddingRoland McGrath
This makes the vDSO use nops for all its padding around instructions, rather than sometimes zeros, and nop-pads the end of the area containing instructions to a 32-byte cache line, to keep text and data in separate lines. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86: Add the check for all the cores in a package in cache informationVenkatesh Pallipadi
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86: sutomatically enable bigsmp when we have more than 8 CPUsVenkatesh Pallipadi
i386 generic subarchitecture requires explicit dmi strings or command line to enable bigsmp mode. The patch below removes that restriction, and uses bigsmp as soon as it finds more than 8 logical CPUs, Intel processors and xAPIC support. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] kdump: Save parameter segment in protected mode (x86)Vivek Goyal
o With introduction of kexec as boot-loader, the assumption that parameter segment will always be loaded at lower address than kernel and will be addressable by early bootup page tables is no longer valid. In kexec on panic case parameter segment might well be loaded beyond kernel image and might not be addressable by early boot page tables. o This case might hit in the scenario where user has reserved a chunk of memory for second kernel, for example 16MB to 64MB, and has also built second kernel for physical memory location 16MB. In this case kexec has no choice but to load the parameter segment at a higher address than new kernel image at safe location where new kernel does not stomp it. o Though problem should automatically go away once relocatable kernel for i386 is in place and kexec can determine the location of new kernel at run time and load parameter segment at lower address than kernel image. But till then this patch can go in (assuming it does not break something else). o This patch moves up the boot parameter saving code. Now boot parameters are copied out in protected mode before page tables are initialized. This will ensure that parameter segment is always addressable irrespective of its physical location. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] vm86: Honor TF bit when emulating an instructionPetr Tesarik
If the virtual 86 machine reaches an instruction which raises a General Protection Fault (such as CLI or STI), the instruction is emulated (in handle_vm86_fault). However, the emulation ignored the TF bit, so the hardware debug interrupt was not invoked after such an emulated instruction (and the DOS debugger missed it). This patch fixes the problem by emulating the hardware debug interrupt as the last action before control is returned to the VM86 program. Signed-off-by: Petr Tesarik <kernel@tesarici.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86: fix EFI memory map parsingMatt Tolentino
The memory descriptors that comprise the EFI memory map are not fixed in stone such that the size could change in the future. This uses the memory descriptor size obtained from EFI to iterate over the memory map entries during boot. This enables the removal of an x86 specific pad (and ifdef) in the EFI header. I also couldn't stomach the broken up nature of the function to put EFI runtime calls into virtual mode any longer so I fixed that up a bit as well. For reference, this patch only impacts x86. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] hpet: use read_timer_tsc only when CPU has TSCVenkatesh Pallipadi
Only use read_timer_tsc only when CPU has TSC. Thanks to Andrea for pointing this out. Should not be issue on any platforms as all recent systems that has HPET also has CPUs that supports TSC. The patch is still required for correctness. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-29[PATCH] convert signal handling of NODEFER to act like other Unix boxes.Steven Rostedt
It has been reported that the way Linux handles NODEFER for signals is not consistent with the way other Unix boxes handle it. I've written a program to test the behavior of how this flag affects signals and had several reports from people who ran this on various Unix boxes, confirming that Linux seems to be unique on the way this is handled. The way NODEFER affects signals on other Unix boxes is as follows: 1) If NODEFER is set, other signals in sa_mask are still blocked. 2) If NODEFER is set and the signal is in sa_mask, then the signal is still blocked. (Note: this is the behavior of all tested but Linux _and_ NetBSD 2.0 *). The way NODEFER affects signals on Linux: 1) If NODEFER is set, other signals are _not_ blocked regardless of sa_mask (Even NetBSD doesn't do this). 2) If NODEFER is set and the signal is in sa_mask, then the signal being handled is not blocked. The patch converts signal handling in all current Linux architectures to the way most Unix boxes work. Unix boxes that were tested: DU4, AIX 5.2, Irix 6.5, NetBSD 2.0, SFU 3.5 on WinXP, AIX 5.3, Mac OSX, and of course Linux 2.6.13-rcX. * NetBSD was the only other Unix to behave like Linux on point #2. The main concern was brought up by point #1 which even NetBSD isn't like Linux. So with this patch, we leave NetBSD as the lonely one that behaves differently here with #2. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-23[PATCH] i386: fix incorrect FP signal codeChuck Ebbert
i386 floating-point exception handling has a bug that can cause error code 0 to be sent instead of the proper code during signal delivery. This is caused by unconditionally checking the IS and c1 bits from the FPU status word when they are not always relevant. The IS bit tells whether an exception is a stack fault and is only relevant when the exception is IE (invalid operation.) The C1 bit determines whether a stack fault is overflow or underflow and is only relevant when IS and IE are set. Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-19[PATCH] Mobil Pentium 4 HT and the NMISteven Rostedt
I'm trying to get the nmi working with my laptop (IBM ThinkPad G41) and after debugging it a while, I found that the nmi code doesn't want to set it up for this particular CPU. Here I have: $ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Mobile Intel(R) Pentium(R) 4 CPU 3.33GHz stepping : 1 cpu MHz : 3320.084 cache size : 1024 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 1 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 3 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl est tm2 cid xtpr bogomips : 6642.39 processor : 1 vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Mobile Intel(R) Pentium(R) 4 CPU 3.33GHz stepping : 1 cpu MHz : 3320.084 cache size : 1024 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 1 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 3 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl est tm2 cid xtpr bogomips : 6637.46 And the following code shows: $ cat linux-2.6.13-rc6/arch/i386/kernel/nmi.c [...] void setup_apic_nmi_watchdog (void) { switch (boot_cpu_data.x86_vendor) { case X86_VENDOR_AMD: if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15) return; setup_k7_watchdog(); break; case X86_VENDOR_INTEL: switch (boot_cpu_data.x86) { case 6: if (boot_cpu_data.x86_model > 0xd) return; setup_p6_watchdog(); break; case 15: if (boot_cpu_data.x86_model > 0x3) return; Here I get boot_cpu_data.x86_model == 0x4. So I decided to change it and reboot. I now seem to have a working NMI. So, unless there's something know to be bad about this processor and the NMI. I'm submitting the following patch. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk> Acked-by: Mikael Pettersson <mikpe@csd.uu.se> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-18[PATCH] x86: Remove obsolete get_cpu_vendor callAndi Kleen
Since early CPU identify is in this information is already available Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-01[PATCH] transmeta: CONFIG_PROC_FS=n build fixAndrew Morton
Fix bug found by Grant Coady <lkml@dodo.com.au>'s autobuild setup. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-01[PATCH] disable addres space randomization default on transmeta CPUsEric Lammerts
We know that the randomisation slows down some workloads on Transmeta CPUs by quite large amounts. We think it's because the CPU needs to recode the same x86 instructions when they pop up at a different virtual address after a fork+exec. So disable randomization by default on those CPUs. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-01[PATCH] remove sys_set_zone_reclaim()Ingo Molnar
This removes sys_set_zone_reclaim() for now. While i'm sure Martin is trying to solve a real problem, we must not hard-code an incomplete and insufficient approach into a syscall, because syscalls are pretty much for eternity. I am quite strongly convinced that this syscall must not hit v2.6.13 in its current form. Firstly, the syscall lacks basic syscall design: e.g. it allows the global setting of VM policy for unprivileged users. (!) [ Imagine an Oracle installation and a SAP installation on the same NUMA box fighting over the 'optimal' setting for this flag. What will they do? Will they try to set the flag to their own preferred value every second or so? ] Secondly, it was added based on a single datapoint from Martin: http://marc.theaimsgroup.com/?l=linux-mm&m=111763597218177&w=2 where Martin characterizes the numbers the following way: ' Run-to-run variability for "make -j" is huge, so these numbers aren't terribly useful except to see that with reclaim the benchmark still finishes in a reasonable amount of time. ' in other words: the fundamental problem has likely not been solved, only a tendential move into the right direction has been observed, and a handful of numbers were picked out of a set of hugely variable results, without showing the variability data. How much variance is there run-to-run? I'd really suggest to first walk the walk and see what's needed to get stable & predictable kernel compilation numbers on that NUMA box, before adding random syscalls to tune a particular aspect of the VM ... which approach might not even matter once the whole picture has been analyzed and understood! The third, most important point is that the syscall exposes VM tuning internals in a completely unstructured way. What sense does it make to have a _GLOBAL_ per-node setting for 'should we go to another node for reclaim'? If then it might make sense to do this per-app, via numalib or so. The change is minimalistic in that it doesnt remove the syscall and the underlying infrastructure changes, only the user-visible changes. We could perhaps add a CAP_SYS_ADMIN-only sysctl for this hack, a'ka /proc/sys/vm/swappiness, but even that looks quite counterproductive when the generic approach is that we are trying to reduce the number of external factors in the VM balance picture. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-30merge 2.6.13-rc4 with ACPI's to-linus treeLen Brown
2005-07-29/home/lenb/src/to-linus branch 'acpi-2.6.12'Len Brown
2005-07-29[ACPI] Always set P-state on initializationDominik Brodowski
Otherwise a platform that supports ACPI based cpufreq and boots up at lowest possible speed could stay there forever. This because the governor may request max speed, but the code doesn't update if there is no change in speed, and it assumed the initial state of max speed. http://bugzilla.kernel.org/show_bug.cgi?id=4634 Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2005-07-29[PATCH] x86: avoid wasting IRQs patch updateNatalie.Protasevich@unisys.com
The patch addresses a problem with ACPI SCI interrupt entry, which gets re-used, and the IRQ is assigned to another unrelated device. The patch corrects the code such that SCI IRQ is skipped and duplicate entry is avoided. Second issue came up with VIA chipset, the problem was caused by original patch assigning IRQs starting 16 and up. The VIA chipset uses 4-bit IRQ register for internal interrupt routing, and therefore cannot handle IRQ numbers assigned to its devices. The patch corrects this problem by allowing PCI IRQs below 16. Signed-off by: Natalie Protasevich <Natalie.Protasevich@unisys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-29Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreqLinus Torvalds
2005-07-29arch/i386/kernel/cpu/cpufreq/powernow-k8.c: In function ↵Dave Jones
`powernow_k8_cpu_init_acpi': arch/i386/kernel/cpu/cpufreq/powernow-k8.c:740: warning: unused variable `vid' arch/i386/kernel/cpu/cpufreq/powernow-k8.c:739: warning: unused variable `fid' arch/i386/kernel/cpu/cpufreq/powernow-k8.c:743: warning: unused variable `vid' arch/i386/kernel/cpu/cpufreq/powernow-k8.c:742: warning: unused variable `fid' arch/i386/kernel/cpu/cpufreq/powernow-k8.c:746: `fid' undeclared (first use in this function) arch/i386/kernel/cpu/cpufreq/powernow-k8.c:746: `vid' undeclared (first use in this function) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Dave Jones <davej@redhat.com>
2005-07-29[PATCH] i386 machine_kexec: Cleanup inline assemblyEric W. Biederman
For some reason I was telling my inline assembly that the input argument was an output argument. Playing in the trampoline code I have seen a couple of instances where lgdt get the wrong size (because the trampolines run in 16bit mode) so use lgdtl and lidtl to be explicit. Additionally gcc-3.3 and gcc-3.4 want's an lvalue for a memory argument and it doesn't think an array of characters is an lvalue so use a packed structure instead. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-29Fix up powernow-k8 compile. (Missing definitions).Dave Jones
From: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>
2005-07-29Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreqLinus Torvalds
2005-07-28[PATCH] re-disable TSC on NUMAQDave Hansen
Somewhere recently, the TSC got re-enabled for timekeeping on NUMAQ machines. However, the hardware makes these get unsynchronized quite badly. So badly, in fact, that the code to fix up the skew can just hang on boot. This patch re-disables them. It's nicely confined to the numaq.c file. It would be great if this could make it into 2.6.13, I think it counts as a bugfix. Tested on a 16-proc 4-node NUMAQ. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28[PATCH] x86_64: When running cpuid4 need to run on the correct CPUAndi Kleen
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-28powernow-k8.c: In function `query_current_values_with_pending_wait':Dave Jones
powernow-k8.c:110: warning: `hi' may be used uninitialized in this function Signed-off-by: Brian Gerst <bgerst@didntduck.org> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org>
2005-07-28Opteron revision F will support higher frequencies thanDave Jones
can be encoded in the current driver's 4 bit frequency field. This patch updates the driver to support Rev F including 6 bit FIDs and processor ID updates. This should apply cleanly whether or not the dual-core bugfix I sent out last week is applied. I'd prefer that both get applied, of course. Signed-off-by: David Keck <david.keck@amd.com> Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>
2005-07-28powernow-k8 requires that a data structure forDave Jones
each core be created in the _cpu_init function call. The cpufreq infrastructure doesn't call _cpu_init for the second core in each processor. Some systems crashed when _get was called with an odd-numbered core because it tried to dereference a NULL pointer since the data structure had not been created. The attached patch solves the problem by initializing data structures for all shared cores in the _cpu_init function. It should apply to 2.6.12-rc6 and has been tested by AMD and Sun. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>
2005-07-27[PATCH] sys_get_thread_area does not clear the returned argumentBlaisorblade
sys_get_thread_area does not memset to 0 its struct user_desc info before copying it to user space... since sizeof(struct user_desc) is 16 while the actual datas which are filled are only 12 bytes + 9 bits (across the bitfields), there is a (small) information leak. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27[PATCH] Fix warning in powernow-k8.cBrian Gerst
powernow-k8.c: In function `query_current_values_with_pending_wait': powernow-k8.c:110: warning: `hi' may be used uninitialized in this function Signed-off-by: Brian Gerst <bgerst@didntduck.org> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>